Fig. 4. Packet flow and data in the marking algorithm
Fig. 4 shows how the router data stored in a packet is created or changed as the packet, after being marked at router R1, travels through each router and arrives at router R5; it also shows how each packet is discerned by the router data stored in it. In Fig. 4, d0 means the router deletes the previous data loaded on the packet and newly marks its own IP address; d1 means the router XORs its own IP address with the former router's data loaded on the packet; and for values after d2, the router only increments the packet's distance data and forwards the former data unchanged.
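The per-router behavior can be summarized in a short sketch. The following Python fragment is an illustration only (the field names and packet representation are our own assumptions, not the paper's): with probability p the router starts a fresh mark (d0); a router seeing stored distance 0 XORs its address into the stored value (d1); any later router only increments the distance (d2 and beyond).

```python
import random

# Hypothetical packet representation: a dict with a marking field 'frag'
# (an integer-encoded router address) and a 'distance' counter.
def mark(packet, router_addr, p):
    if random.random() < p:            # d0: overwrite previous data, new mark
        packet["frag"] = router_addr
        packet["distance"] = 0
    elif packet["distance"] == 0:      # d1: XOR own address into former data
        packet["frag"] ^= router_addr
        packet["distance"] = 1
    else:                              # d2 and beyond: keep data, increase distance
        packet["distance"] += 1
```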
In Fig. 4, an arrow means forwarding to the next router with probability p applied; for example, when probability p is applied to d0 of R1, that value splits into d0 and d1 of R2. Here the router applying the probability draws a random number x: if x ≤ p, process (2) follows, and if x > p, process (3) follows. This can be described by the following equations.
$R_2[d_0] = R_1[d_0] \times p$    (2)

$R_2[d_1] = R_1[d_0] \times (1 - p)$    (3)
In the current mechanism, the probability p is fixed at every router. We now weight the probability by distance, so that data loss is reduced and the arrival rate is improved. Losing the data a packet carries whenever it passes through a router cannot be avoided entirely, but the time spent finding the attack path should be minimized by having as many packets carrying distant data as possible arrive at the victim system. The value X, indicating the number of packets arriving at each router, is defined below.

Definition. Where the number of hops is n, the number of packets arriving at a router with distance data i is expressed as $X_i^{n+1}$.
Also, to weight by distance data, the function f applied to probability p is defined by the following equation.

$f_i = \frac{1}{(i+1) \times 2}$  (i = distance)    (4)
With the two definitions above, the value at each hop can be expressed as below. Where n = 0,

$X_0^1 = X_0^0 \cdot f_0, \quad X_1^1 = X_0^0 \cdot (1 - f_0)$

Where n = 1,

$X_0^2 = X_0^1 \cdot f_0 + X_1^1 \cdot f_1, \quad X_1^2 = X_0^1 \cdot (1 - f_0), \quad X_2^2 = X_1^1 \cdot (1 - f_1), \quad \ldots$

Combining the above equations, the recurrence can be expressed as follows.

$X_0^{n+1} = X_0^n \cdot f_0 + X_1^n \cdot f_1 + \cdots + X_n^n \cdot f_n$    (5)
$X_i^{n+1} = X_{i-1}^n \cdot (1 - f_{i-1})$  for i = 1, …, n+1    (6)
If, instead of fixing the probability as above, we weight it by the packet's distance data, then the arrival rate of packets that retain earlier data, which previously diminished sharply at every router, is greatly improved.
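A direct way to see the effect of equations (5) and (6) is to iterate them numerically. The sketch below is ours, not the authors' code; it treats X as the fraction of packets carrying each distance value and applies the weighted probability of equation (4) at every hop.

```python
def f(i):
    return 1.0 / ((i + 1) * 2)                      # equation (4)

def next_hop(X):
    """X[i]: fraction of packets whose stored distance is i."""
    Y = [0.0] * (len(X) + 1)
    Y[0] = sum(X[i] * f(i) for i in range(len(X)))  # equation (5): re-marked
    for i in range(1, len(Y)):                      # equation (6): mark survives,
        Y[i] = X[i - 1] * (1.0 - f(i - 1))          # distance is incremented
    return Y

X = [1.0]                                           # all packets marked at the first router
for hop in range(1, 7):
    X = next_hop(X)
    print(hop, [round(100 * x, 4) for x in X])
```

With $f_0 = 0.5$ the first hop reproduces the 50/50 split of Fig. 5, while for larger distances the weighted $f_i$ lets old marks survive far more often than the fixed p = 0.5 does.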
6 Performance

A mechanism to weight the router's probability p by the packet's distance data was proposed above. To confirm whether equation (4) is optimal, the function applied to probability p is generalized to the following form with two coefficients a and b.

$f_i = \frac{1}{ai + b}$  (i = distance data, a ≥ 0, b > 0)    (7)
First, we examine how the packet arrival rate changes as the number of hops grows. Fig. 5 shows the packet arrival rate by hop and distance data when the probability is fixed at p = 0.5 at each router.

hops | d=0 | d=1 | d=2  | d=3  | d=4   | d=5    | d=6
  1  | 50  | 50  |      |      |       |        |
  2  | 50  | 25  | 25   |      |       |        |
  3  | 50  | 25  | 12.5 | 12.5 |       |        |
  4  | 50  | 25  | 12.5 | 6.25 | 6.25  |        |
  5  | 50  | 25  | 12.5 | 6.25 | 3.125 | 3.125  |
  6  | 50  | 25  | 12.5 | 6.25 | 3.125 | 1.5625 | 1.5625

Fig. 5. Packet arrival rate by hop and distance data (p fixed) (unit: %)
As shown in Fig. 5, as the hop count increases, the arrival rate of a packet reaching a remote router is halved at each step. For a given distance data value, the packet arrival rate $R_i$ follows the equation below.
$R_i = \frac{100}{2^{\,i+1}}$  (i = distance data)    (8)
To sum up, the more the hops increase and the larger the distance data becomes, the more dramatically the packet arrival rate shrinks. When the packet arrival rate by hop and distance data is obtained by applying equation (4), the function proposed for probability p, the arrival rate is improved by flexibly weighting p by distance data rather than fixing the probability p.
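For the fixed-probability case, the decay of equation (8) can be checked in one line; note that the last entry of each row in Fig. 5 is the boundary case where the initial mark survives every remaining hop, so it repeats the preceding value. This check is our own illustration:

```python
# R_i = 100 / 2^(i+1), equation (8), for p fixed at 0.5
print([100 / 2 ** (i + 1) for i in range(7)])
# -> [50.0, 25.0, 12.5, 6.25, 3.125, 1.5625, 0.78125]
```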
7 Conclusion

There have been significant problems in tracing back attack paths with the marking algorithm, one of the IP traceback mechanisms stressed as countermeasures against DoS attacks, which have caused great damage to Internet service companies. By proposing a mechanism that finds the MAC address of the attack source and improves the packet arrival rate by weighting the router's sampling probability, the problems of detecting the attack source and of requiring a great number of packets to find the attack path are effectively mitigated. While the attack source could not be found in the current marking algorithm, in the proposed mechanism the MAC address of the attack source can be found by marking the MAC address at an intermediate router and tracing back the attack path at the victim system; moreover, the real packet arrival rate is improved and its deviation greatly reduced by changing the router's sampling probability from a fixed value to a flexible value weighted by the packet's distance data.
Acknowledgements. This work was supported by an INHA UNIVERSITY Research Grant.
Comparative Analysis of IPv6 VPN Transition in NEMO Environments* Hyung-Jin Lim, Dong-Young Lee, and Tai-Myoung Chung Internet Management Technology Laboratory and Cemi: Center for Emergency Medical Informatics, School of Information and Communication Engineering, Sungkyunkwan University, Chunchun-dong 300, Jangan-gu, Suwon, Kyunggi-do, Republic of Korea {hjlim, dylee, tmchung}@rtlab.skku.ac.kr
Abstract. IPv6 deployments use different NGtrans mechanisms depending on the network situation, and a mobile network should be able to initiate secure connectivity with its corresponding home network. The correlation between the transition mechanisms used in the existing network and the transition mechanisms supported by the mobile network is an important factor in secure connectivity with the corresponding home network. In this paper, VPN scenarios applicable to mobile networks during the transition to IPv6 are analyzed, and the performance costs of establishing connectivity according to the VPN models in each transition phase are evaluated. In addition, this paper analyzes the factors affecting performance, through numeric analysis of the NEMO VPN models under an IPv6 transition environment. As shown in our study, a NEMO VPN creating hierarchical or sequential tunnels should be proposed only after careful evaluation of not only security vulnerabilities but also performance requirements.
1 Introduction
The present Internet deployment consists of an IPv4-based network. However, this network environment cannot cope with the demands for various services and a growing user base resulting from network technology development and the rapid spread of the Internet. To meet these demands, the IETF has proposed the IPv6 protocol, which includes several strong features such as improved routing efficiency, mobility support, QoS, and auto-configuration. Additionally, the IPv6 protocol solves the exhaustion of IPv4 addresses. In IPv4, IPSec is implemented as an additional module to process the AH and ESP headers; in IPv6, IPSec is an integrated component, as regular use is expected. A considerably long period of time will elapse before IPv6 networks are fully deployed, and during this time IPv6 will coexist with the present IPv4 protocol. *
This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment).
For a natural transition to a full IPv6 network, the IETF NGtrans working group has developed a variety of transition mechanisms [1]. In contrast to situations with pure IPv4 or IPv6 networks, the availability of IPSec for VPN composition during the IPv6 deployment process is an important factor in choosing a transition mechanism. The IETF NEMO working group has been researching mobility based on a mobile router, as a representative IPv6 application, mainly focusing on improvements of the basic protocol, route optimization (RO), and multi-homing [2-4]. Since NEMO supports basic mobility at the network level, an extended application of existing VPN models may also be required. This means that analysis of such VPN models in the transition phase to IPv6 is required, even in the NEMO research domain. Therefore, in this paper, VPN scenarios applicable under mobile network environments are evaluated. In Section 2, the IPv6 transition phases and scenarios for establishing a VPN under the NEMO environment are analyzed. Sections 3 and 4 cover the evaluation and analysis of the costs expected in each scenario.
Fig. 1. Mobile network topology and connectivity
2 IPv6 VPN Transition Scenario in NEMO Environment

The IETF NEMO working group proposes an IPv6 extension protocol supporting mobility for the MR. Fig. 1 presents the basic NEMO topology. Under the IPv6 transition environment, the IP versions of the visited network and the Internet determine the MR requirements for establishing connectivity. That is, a total of 27 combinations are possible from the IP versions of the MR, the visited network, and the Internet. It is assumed that, when the visited network supports only IPv4, the MR is given a CoA from AR_f and uses a mechanism for setting up an IPv6 CoA [5]. It is also assumed that such a visited network can support NEMO irrespective of its IP version, and that the MR is able to recognize the IP version of the visited network.

2.1 Connectivity

Fig. 1 demonstrates the basic NEMO VPN model, in which reverse tunneling connects the MR with the HA. An IPSec tunnel exists within the reverse tunnel; therefore, the packets contain an external IP encapsulation header and an IPSec extension header. In this study, the main transition phases of the five NEMO cases are considered in the
IPv6 transition environment. The case where no IPv6 transition mechanism is required at the MR, i.e., IPv6 communication is natively possible, is excluded. The Internet and the visited network (AR_f) may each support various IP versions: only-IPv6, only-IPv4, or dual-stack. The transition mechanism required for the MR and AR to set up connectivity according to their network status is presented in Fig. 1. Connectivity of the Internet refers to a transition mechanism by which a host in a foreign network can form connectivity with an arbitrary host within a network connected to the Internet. If the visited network (AR_f) supports dual stack, its inner network devices or end-hosts are able to use IPv6 or IPv4. However, if the visited network (AR_f) is an only-IPv4 node, a transition mechanism such as Bump In the API (BIA), Bump In the Stack (BIS), or Network Address Translation-Protocol Translation (NAT-PT) is required for its inner IPv6 devices to communicate with other IPv6 devices. It is assumed that AR_h is a network device supporting IPv6, as is the case with the MRs. Different IP versions of the Internet require different transition mechanisms for the MR or the visited network to create connectivity. Under this mobile environment, when the mobile network visits a foreign network, it may need to establish a different form of connectivity, in accordance with the transition mechanism the foreign network offers.
Fig. 2. Configuration of VPN in Fig. 1_table-①

Fig. 3. Configuration of VPN in Fig. 1_table-② & ③
2.2 NEMO VPN in IPv6 Transition

An additional IPSec tunnel should be created to guarantee secure connectivity, irrespective of whether a transition mechanism exists or is used in such an environment. From the perspective of a VPN, a remote-access VPN model can be applied to a mobile environment; however, a gateway-to-gateway VPN model (i.e., intranet VPN) is more suitable for an MR, since the MR serves a network. Also, we consider only VPN connectivity based on network-based, rather than host-based, translation mechanisms, since IPv6-based MRs include a method to set up connectivity with their IPv6-based home network. The home network should establish connectivity with its own IPv6-based MR. The case where an IPv6 host behind an MR sets up connectivity with an IPv4 end-host within another network can also be considered; in this case, the MR or the IPv6 host may require a translation mechanism.
Fig. 2 demonstrates that when an IPSec tunnel for an end-to-end secure channel is formed, the MR sets up connectivity with its own home network via an IPv4 carrier, through a tunneling mechanism supporting 6-to-6 over 4 connectivity. In Fig. 3, IPv6 is supported in the visited network. From the MR's perspective, IPv6-based NEMO is possible without any transition mechanism. However, a 6-to-6 over 4 tunneling mechanism is still required because the Internet consists of an IPv4 carrier. In contrast to the case in Fig. 2, since the visited network already uses such a tunneling mechanism, the tunneling mechanism supported by its AR_f can be reused; this implies that the MR does not need to provide a separate mechanism.
Fig. 4. The case of Fig. 1_table 1-④

Fig. 5. The case of Fig. 1_table 1-⑤
Fig. 4 demonstrates the case where two IP versions coexist on the ISP carrier, on whose contact boundary a dual-stack network device or a network-based translator can be located. For the connectivity required from a NEMO perspective, the MR requires the 6-to-6 over 4 tunneling mechanism over the IPv4 carrier up to the dual-stack network device. Fig. 5 refers to the case where the Internet infrastructure offers IPv6, that is, the Internet is dedicated to IPv6 or supports dual-stack network devices. In that case, since only the visited network uses the IPv4 version, connectivity with the MR's home network is possible if a tunneling mechanism such as ISATAP is used.
3 Evaluation of NEMO VPN with NGtrans Mechanism

3.1 Consideration and NEMO VPN Model

In this section, an end-to-end VPN model and a transition mechanism are analyzed in a NEMO environment from the viewpoint of processing cost. This section also concerns the costs and influences triggered by creating a VPN in the five models of the preceding section, plus the IPv6-based NEMO VPN without a transition mechanism. When VPN tunneling is established, the cost of processing according to the IPSec mode and service is included. Likewise, when the NEMO VPN is placed in an IPv6 transition environment, the data-transmission performance includes the packet translation cost between different IP versions, the cost of IP tunneling, and the cost of VPN tunneling related to the cryptographic processing. The packet translation cost between different IP versions refers to the processing cost of translation mechanisms, and the cost of IP tunneling refers to the processing cost of tunneling mechanisms, composed of the various connectivity and tunnel types (i.e., 6-to-6 over 4 or 4-to-4 over 6).
In this paper, the performance parameters considering the transmission efficiency of the VPN are extracted from previous experimental results [6-11], and the cost effects of each VPN model are evaluated in NGtrans environments. The cost estimation of the VPN constitution is conducted per packet for the cryptographic processing. In order to evaluate the processing overhead of applying a VPN in a transition environment, the parameters in Table 1 were considered.

Table 1. VPN performance parameters in the NGtrans environment

Parameter | Description | Value
C_Sh    | Processing cost for the security header           | 1.5~7.5
C_IP    | Processing cost for the IP encapsulation header   | 1.5~7.5
v_con   | Weight value: increase ratio of the IPv4 transfer packet amount resulting from TCP payload reduction | 1.6
v_4     | Weight value: packet-amount increase ratio, IPv4  | 1
v_6     | Weight value: packet-amount increase ratio, IPv6  | 1.0137
v_4sec  | Weight value with the IPv4 security header        | 1.015~1.0164
v_6sec  | Weight value with the IPv6 security header        | 1.0287~1.0301
v_4IP   | Weight value with IPv4 encapsulation              | 1.0137
v_6IP   | Weight value with IPv6 encapsulation              | 1.041
bw_a    | Bandwidth of the access network in the local area network | 100 Mbps
bw_i    | Bandwidth of the backbone network                 | E1 (1.544 Mbps)
fg      | Weight value for fragmentation on a network device | 0.1 (default)
P_kt    | Transfer packet amount per host                   | 64 KB~1 MB
l_p     | Latency including processing of the network and data link layers | 500 μsec
N       | Hop count in the public network                   | 20~30 hops
n       | Hop count within the local network                | 1~3 hops
N_n     | Number of end-hosts with VPN flows                | 1~100 flows
For the VPN transmission mode, only the transport mode is evaluated; evaluating the tunnel mode would additionally require considering the expansion of transmission packets from the external header, fragmentation, and the extra cryptographic cost. In addition, only connectivity using basic transition mechanism operation, in conjunction with the NEMO VPN models, is considered when estimating the cost. It is assumed that the configurations for security association and IP tunneling are already established before communication is initiated.

3.2 Evaluation of NEMO VPN Model

Fig. 1 presents a basic model of NEMO under a pure IPv6 environment, where all network devices and end-hosts support only IPv6. It is assumed that a gateway-to-gateway VPN model is applied between the MR and its home network, under the assumption that the network behind the MR is a secure section. Additionally, although the IPSec transport mode from the MR's end-host could be recommended, the
transport mode is considered only in the MR, in order to analyze the relative VPN performance influence in each model. Both transport mode processing in the end-host and tunnel mode processing in the MR are used. Therefore, the VPN processing costs incurred for transmission packets, from the end-host behind the MR to the end-host located in the MR's home network, are evaluated as follows:

$V_{6basic} = \bigl(2P_{rate}(C_{sh}(v_{6sec}-1) + C_{ip}v_{6sec}(v_{6ip}-1)(1+fg)) + v_{6ip}v_{6sec}(t_i + 2t_1)\bigr)N_n$

Reverse tunnel and IPSec tunnel processing is performed on the MR and the HA. The processing costs required for tunneling are calculated in accordance with $P_{rate}(C_{sh}(v_{6sec}-1) + C_{ip}v_{6sec}(v_{6ip}-1)(1+fg))$. Packets transmitted from an end-host first have the IPSec header processed in the MR and then establish the reverse tunnel, introducing additional processing costs of $P_{rate}C_{sh}(v_{6sec}-1)$ and $P_{rate}C_{ip}v_{6sec}(v_{6ip}-1)$. Since the MR and HA perform the functions of a router, the fragmentation constant $(1+fg)$ is additionally considered along with the established reverse tunneling. Here $P_{rate}$ refers to the number of cryptographic operations for the amount of transmission packets, on the basis of an Ethernet payload of 1500 bytes, which means $P_{rate} = P_{kt}/MTU$ [12]. In IPSec, the packet processing frequency may vary depending on the algorithm (e.g., DES, SHA1, MD5) applied to AH and ESP; however, it is assumed that the number per MTU is identical. In particular, additional headers are created by $v_{6sec}$ relative to the Ethernet MTU basis, as compared with IPv4; therefore, local hosts must perform additional processing of $P_{rate}C_{sh}(v_{6sec}-1)$. Encrypted packets travel through $n$ local network hops and $N$ Internet carrier hops, and finally to the VPN gateway; each hop requires transmission processing costs of $t_i \,(= (l_p + P_{kt}/bw_i)N)$ and $t_1 \,(= l_p + P_{kt}/bw_i)$, respectively. At this point, weight is placed on the increase in packet amount, in accordance with the version of connectivity and the type of tunneling (i.e., VPN tunnel or IP-in-IP tunnel). This means that the local network of the remote host, the Internet carrier, and the home network VPN gateway incur transmission costs of $v_{6ip}v_{6sec}$. Finally, the number of VPN end-hosts, $N_n$, is taken into account. It is assumed that each end-host has only a single flow with the same packet increase rate; therefore, an increase in the number of end-hosts does not influence the processing costs of individual hosts. In Fig. 2, aside from the 6-to-6 over 4 tunneling between MR and AR_h to set up connectivity, a reverse tunnel including an end-to-end VPN tunnel from MR to HA is established, in which case the costs incurred are as follows:
$V_{m1} = (P_{rate}C_{sh}(v_{6sec}-1)(2+fg) + 2P_{rate}C_{ip}v_{6sec}(v_{6ip}-1) + 2P_{rate}C_{ip}v_{6sec}v_{6ip}(v_{4ip}-1)(1+fg) + v_{6sec}v_{6ip}v_{4ip}(t_i + t_1) + v_{6sec}v_{6ip}t_1)N_n$
Here the MR requires a cost of $P_{rate}C_{sh}(v_{6sec}-1) + P_{rate}C_{ip}v_{6sec}(v_{6ip}-1)$ to process the reverse tunnel and the VPN tunnel. Next, the 6-to-6 over 4 tunnel incurs $P_{rate}C_{ip}v_{6sec}v_{6ip}(v_{4ip}-1)(1+fg)$ for the increased packet overhead. These packets cross the Internet to reach the access router AR_h, where each hop incurs costs of $v_{6sec}v_{6ip}v_{4ip}(t_i + t_1)$. AR_h also incurs a processing cost of $P_{rate}C_{ip}v_{6sec}v_{6ip}(v_{4ip}-1)(1+fg)$ to decapsulate the 6-to-6 over 4 tunnel; likewise, since AR_h is a router, the fragmentation constant $(1+fg)$ is considered. Each hop then incurs processing costs of $v_{6sec}v_{6ip}t_1$ during transmission to the HA. In Fig. 3, there are four tunnel processing places on the path between MR and HA, which implies that all the end points of each tunnel (i.e., MR, AR_f, AR_h, HA) should reflect their fragmentation constant.
$V_{m2} = (2P_{rate}(C_{sh}(v_{6sec}-1) + C_{ip}v_{6sec}(v_{6ip}-1)(1+fg) + C_{ip}v_{6sec}v_{6ip}(v_{4ip}-1)(1+fg)) + v_{6sec}v_{6ip}v_{4ip}t_i + 2v_{6sec}v_{6ip}t_1)N_n$

Fig. 4 shows that the Internet is composed of two carriers with two different IP versions; beyond the boundary, IPv6 connectivity can be maintained. However, reverse tunneling, including the end-to-end VPN, is established on the MR-to-HA section.
$V_{m3} = (P_{rate}C_{sh}(v_{6sec}-1)(2+fg) + 2P_{rate}C_{ip}v_{6sec}(v_{6ip}-1) + 2P_{rate}C_{ip}v_{6sec}v_{6ip}(v_{4ip}-1)(1+fg) + v_{6sec}v_{6ip}v_{4ip}(t_1 + t_i/2) + v_{6sec}v_{6ip}(t_i/2 + t_1))N_n$

Fig. 5 refers to a model that can be established in the early stages of the IPv6 transition, in which case the MR can form connectivity through 6-to-6 over 4 tunnels up to the boundary facing the Internet.
$V_{m4} = (P_{rate}(C_{sh}+C_{ip})(v_{6ip}v_{6sec}-1)(2+fg) + 2P_{rate}C_{ip}v_{6sec}v_{6ip}(v_{4ip}-1)(1+fg) + v_{6sec}v_{6ip}v_{4ip}t_1 + v_{6sec}v_{6ip}t_i + v_{6sec}v_{6ip}t_1)N_n$

The models $V_{6gtog}$ and $V_{4gtog}$ are taken as reference points to compare the processing-performance influence of the NEMO VPN models; that is, the costs incurred in a gateway-to-gateway VPN created directly between MR and HA serve as the baseline. In the case of $V_{4gtog}$, even though it is not defined in the NEMO standard, the IPv4 VPN is included for comparison in the cost analysis of the IPv6-based NEMO VPN in a transition environment.
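To make the structure of these formulas concrete, the following Python sketch evaluates $V_{6basic}$ with representative values taken from Table 1 (MTU = 1500 bytes; the specific parameter choices within the listed ranges are our own assumptions, not values fixed by the paper):

```python
# Rough illustration of the V6basic cost formula; symbol names follow the paper.
MTU   = 1500.0                 # bytes, Ethernet payload basis
l_p   = 500e-6                 # latency per hop: 500 microseconds (Table 1)
bw_i  = 1.544e6 / 8            # backbone E1 link, in bytes per second
C_sh  = C_ip = 7.5             # per-packet header processing cost (upper bound)
v6sec, v6ip = 1.0301, 1.041    # packet-growth weights (Table 1)
fg, N, Nn = 0.1, 25, 1         # fragmentation weight, public hops, VPN flows

def V6basic(P_kt):
    P_rate = P_kt / MTU                      # crypto operations per transfer
    t_i = (l_p + P_kt / bw_i) * N            # Internet transmission term
    t_1 = l_p + P_kt / bw_i                  # single-hop transmission term
    return (2 * P_rate * (C_sh * (v6sec - 1)
            + C_ip * v6sec * (v6ip - 1) * (1 + fg))
            + v6ip * v6sec * (t_i + 2 * t_1)) * Nn

print(V6basic(64 * 1024))      # cost for a 64 KB transfer
```

The other models ($V_{m1}$ to $V_{m4}$) differ only in which encapsulation terms and tunnel end points contribute, so they can be coded in the same way.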
4 Numerical Results

In the previous section, we evaluated the performance costs of setting up connectivity in conjunction with the VPN models in each transition phase. In this section, the factors affecting performance are analyzed through numeric analysis of the NEMO VPN models under an IPv6 transition environment. The parameter values described in Table 1, obtained from previous work, are applied.
Although this may raise a fairness issue, because those parameters were not all measured in the same environment, we use the parameter values in order to analyze the relative performance overhead in the evaluation results. That is, we analyze the relative processing-performance overhead of each NEMO VPN under a transition environment, in contrast with $V_{6gtog}$ and $V_{4gtog}$ as baselines. In addition, it is assumed that every router or host has the same capability and resources for processing transmission packets. The performance of VPN connectivity is analyzed while increasing the amount of transmitted packets from 64 Kbytes to several megabytes. Fig. 6 presents the differences in processing costs of the VPN models as the transmission packets from a single host of each model increase. In the case of $V_{m2}$, processing costs are highest compared with that of $V_{6basic}$. $V_{m1}$, $V_{m3}$, and $V_{m4}$ present differences not only in the costs of the reverse tunnels, including the IPSec tunnels composed between MR and HA, but also in the costs according to the length of such tunneling sections. A hierarchical tunnel means a tunnel within a tunnel, while a sequential tunnel means concatenated tunnels. Compared with the cases of $V_{4gtog}$ and $V_{6gtog}$, this suggests that a NEMO VPN under transition creates three levels of hierarchical tunnels, which is the reason costs increase with an increase in transmission packets and in hosts joining the VPN, as shown in Fig. 7.
Fig. 6. VPN cost with an increase in packets

Fig. 7. VPN cost with an increase in end-hosts
In the NEMO VPN models, there is no difference in processing costs, under an increase in packets, attributable to the length of the tunnel transmission or the additional packet frequency; this implies that it is the hierarchical tunnel processing that causes the transmission delay with heavy overhead. Fig. 8 demonstrates how the processing costs incurred under each model are influenced as the fragmentation constant increases. As mentioned earlier, it is assumed that these fragmentation constants represent the processing capability of a network device processing its packets. Therefore, given the results shown in Figs. 6 and 7, the influence of such processing capability of the network device can be analyzed.
Fig. 8. VPN cost with an increase in fragmentation constant

Fig. 9. VPN cost with an increase in fragmentation constant and hosts
Fig. 8 shows that, as the fragmentation constant increases, models such as $V_{4gtog}$ and $V_{6gtog}$ incur a lower increase in processing costs than the NEMO VPNs created with three levels of hierarchical tunnels; that is, the NEMO VPN models present a greater increase due to such hierarchical tunnels. In particular, with regard to the location of tunnel processing, $V_{6basic}$ must cover 2 tunnel end points; $V_{m1}$, $V_{m3}$, and $V_{m4}$, 3 tunnel end points; and $V_{m2}$, 4 tunnel end points. Accordingly, the influence of the processing capabilities of network devices on processing costs can be seen. Like Fig. 7, Fig. 9 demonstrates how processing costs are influenced when the frequency of processed packets increases with the number of hosts joining the VPN. In $V_{m2}$, with the largest number of processing locations, processing costs increase the most relative to existing levels. In Fig. 8, $V_{6basic}$ incurs more cost than the other NEMO VPN models when fg increases. These results demonstrate that, because $v_{6ip}$ increases at a higher rate than $v_{4ip}$, it has more influence on the costs of fragmentation according to the processing capability of the network device.
5 Conclusion

The current network mobility base specification requires that all signaling messages between the MR and the HA be authenticated by IPsec. The use of IPsec to protect Mobile IPv6 signaling messages is described in detail in the IETF HA-MN IPsec specification. In order to protect against a wide range of attacks, using AH and/or ESP between MR and HA is of paramount importance. The purpose of this study is to analyze the relative influence of the VPN tunnel and the IP tunnel on processing capability, based on cryptographic processing in the NEMO topology. In particular, we analyzed VPN scenarios applicable to mobile network environments during the IPv6 transition phases. Depending on the network environment, various NGtrans mechanisms are used for communication with IPv4 nodes.
The correlation between the transition mechanisms used in the existing network and those supported by a mobile network can become an important decision factor in establishing secure connectivity with the home network. As discussed previously, there are restrictions on applying a VPN in conjunction with transition mechanisms, because transition mechanisms and the IPSec VPN are fundamentally in conflict: the former support connectivity between different IP versions by modifying the address and format of IP packets, while the latter provides security services by prohibiting the modification of transit packets. Alternatively, hierarchical tunnel-based transitioning solutions may be used when security is a requisite. However, encapsulating IPv6 packets within IPSec payloads, and then within IPv4 packets, results in significant overhead and therefore degraded performance. In conclusion, as presented in the results, a NEMO VPN composed of hierarchical or sequential tunnels should be proposed only after careful evaluation of security vulnerabilities and performance requirements. Most vendors have been developing hardware-based IPv6 packet forwarding technologies, in line with the direction of our study. Likewise, for the NEMO community, if the configuration of both an IP tunnel and a VPN tunnel on mobile routers as well as hosts is required under an IPv6 transition environment, the relative influence on processing capability should be sufficiently considered. It is also recommendable that the future NEMO specification fully reflect how it can be formed from a VPN performance perspective. Finally, in relation to multiple tunnels, SA establishment between tunnel end points, reliable tunnel configuration, and the IPSec processing burden represent important problems to overcome for successful wide deployment of mobile networks.
References
1. R. Gilligan and E. Nordmark, "Transition Mechanisms for IPv6 Hosts and Routers", RFC 2893, August 2000.
2. Thierry Ernst, "Network Mobility Support Requirements", work in progress, available as draft-ietf-nemo-requirements.txt, February 2003.
3. Thierry Ernst, "Network mobility in IPv6", PhD Thesis, University Joseph Fourier, Grenoble, France, available at http://www.inria.fr/rrrt/tu-0714.html, October 2001.
4. Thierry Ernst, "Network Mobility Support Terminology", work in progress, available as draft-ernst-nemo-terminology.txt, November 2002.
5. P. Thubert, M. Molteni, P. Wetterwald, "IPv4 traversal for MIPv6 based Mobile Routers", work in progress, available as draft-thubert-nemo-ipv4-traversal-01.txt, May 2003.
6. S. Zeadally, R. Wasseem, I. Raicu, "Comparison of End-System IPv6 Protocol Stacks", IEE Proceedings Communications, Special issue on Internet Protocols, Technology and Applications (VoIP), Vol. 151, No. 3, June 2004.
7. I. Raicu and S. Zeadally, "Impact of IPv6 on End-user Applications", IEE/IEEE International Conference on Telecommunications (ICT 2003), Tahiti, Papeete, French Polynesia, February 2003.
8. E. Kandasamy, G. Kurup, T. Yamazaki, "Application Performance Analysis in Transition Mechanism from IPv4 to IPv6", APAN Conference, Tsukuba, February 2000.
9. I. Raicu and S. Zeadally, "Evaluating IPv4 to IPv6 Transition Mechanisms", IEE/IEEE International Conference on Telecommunications (ICT 2003), Tahiti, Papeete, French Polynesia, February 2003.
10. Seiji Ariga, Kengo Nagahashi, Masaki Minami, Hiroshi Esaki, Jun Murai, "Performance Evaluation of Data Transmission using IPSec over IPv6 Networks", INET 2000, Yokohama, July 2000.
11. Hyung-Jin Lim, Yun-Ju Kwon, Tai-Myoung Chung, "Secure VPN Performance in IP Layers", The Journal of The Korean Institute of Communication Sciences, Vol. 26, No. 11, 2001.
12. J. Mogul and S. Deering, "Path MTU Discovery", RFC 1191, November 1990.
A Short-Lived Key Selection Approach to Authenticate Data Origin of Multimedia Stream

Namhi Kang(1) and Younghan Kim(2)

(1) Ubiquitous Network Research Center in Dasan Networks Inc., Yatap-Dong, Bundang-Ku, 463-070 Seongnam, South Korea [email protected]
(2) Soongsil University, School of Electronic Engineering, Sangdo 5-Dong 1-1, Dongjak-Ku, 156-743 Seoul, South Korea [email protected]
Abstract. This paper presents an efficient approach to providing a data origin authentication service for a multimedia stream application. The proposed approach is intended to achieve not only fast signing/verifying transactions but also low transmission overhead. In particular, performance is a key issue for the increasingly widespread use of wireless communication, which suffers many limitations including scarce network resources, low computing power, and limited node energy. To meet such requirements, we take advantage of short-lived keys, which allow an authentication system to overcome the performance degradation caused by applying highly expensive cryptographic primitives such as digital signatures. The major concern of this paper, therefore, is to derive an appropriate length of key and hash value without compromising the security of the authentication system.
1 Introduction
Multimedia streaming technology is already prevalent throughout the Internet. Multicast is regarded as a cost-efficient delivery mechanism for streaming multimedia from a source to a group of receivers. However, in contrast to regular unicast communication, multicast is more difficult and has several inherent obstacles that remain to be solved [1]. This paper is intended to find a way to support data origin authentication for multimedia multicast applications. We focus on the performance considerations that are necessary for both wireline and wireless communication; in particular, computational and transmission overheads are of much concern in lossy and unreliable wireless communication. Data origin authentication (DOA) is commonly achieved by a MAC (Message Authentication Code) in unicast [2]. However, it is difficult to apply a MAC directly to multimedia streaming over a multicast channel owing to a couple of challenges, such as the difficulty of key sharing and the possibility of insider
This work was supported by the Korea Research Foundation Grant (KRF-2004-005-D00147).
coalition to cheat other members. Applying digital signatures instead of a MAC can solve these shortcomings, since only the source is able to bind its identity to the signature. The trade-off, nevertheless, is that critical performance problems arise when an asymmetric cryptographic primitive is applied to a multimedia stream. Such problems of most regular digital signature schemes, such as RSA and DSA, are caused by using a large key size to protect data in the context of long-term security. It is too ambiguous, however, to specify how much time is enough for long-term security: the validity term obviously differs depending on the application, the policy, or the underlying environment. Multimedia streaming applications have different requirements and properties from the viewpoint of security services. If an application is required to support authentication that resists attacks only for a short period, such as a week or a day, then using a key that is strong enough for one or two decades is a waste of resources. The best example is an entertainment content streaming service (e.g., movies or music), where the recipient discards a received packet as soon as it is consumed; this is a widely used scenario in multimedia streaming applications. On the basis of this notion, we propose an efficient approach to support DOA for multimedia streaming using a short-lived but secure key. The proposed approach is not a new authentication scheme in itself, but a framework to employ existing schemes based on digital signatures (see Section 2). The proposed approach is able to offer equivalent security with smaller key sizes. This paper is organized as follows. We discuss stream authentication solutions proposed so far in the literature in Section 2. In Section 3, the lower bound of Lenstra and Verheul's suggestion [3], which is the basic theory of this paper, is shortly described. In Section 4, we propose a way to select a short-lived key in a secure fashion. In Section 5, we report some comparison results, and we conclude the paper in Section 6.
2 Multicast Stream Authentication Solutions
The existing streaming authentication solutions considered in this paper are divided into two categories: MAC-based approaches [4, 5] and digital-signature-based approaches [4, 6, 7, 8, 9, 10]. In this paper, TESLA [4], the WL Tree and Star schemes [7], EMSS [4], and SAIDA [9] are discussed and compared in detail. Multicast data origin authentication is intended to let all receivers ensure that the received data comes from the exact source rather than from any member of the group; the latter is called group authentication in multicast [5]. Therefore, the guarantee of the asymmetric property, which means only the signer (the source) can generate authentication information while others (receivers, or verifiers) are only able to verify such information, is required. To achieve this asymmetric property, TESLA is very efficient, since it requires only three MAC computations per packet, resulting also in low transmission overhead. However, TESLA requires time synchronization between the source and the receiver at
the initial bootstrapping time. On the other hand, the limitation of the multiple-MAC scheme is that attacks caused by an insider coalition still exist. In digital-signature-based schemes, asymmetry is a natural property, since an asymmetric key pair is used: a private key for the signer and a public key for the verifier. Most solutions based on a digital signature employ a signature amortized over several packets [4, 7, 8, 9, 10]. In these schemes, the time-expensive signing and verifying operations are performed once per block, where a block is a set of packets. There are two conspicuous differences between them: one is the method of setting a group of packets for amortizing, and the other is the way the loss-tolerance property is achieved. With a system applying an amortizing signature, there is a trade-off between computational cost and verification delay. As the block size increases, where the block size is the number of packets gathered to form a block for amortizing, the computational time decreases, whereas the verification delay increases owing to the waiting time, both for generating a block at the sender and for the arrival of a signature at the receiver side. In addition, packet loss is a critical issue when an authentication system applies an amortizing signature over a stream. Wong and Lam proposed the star and tree techniques (referred to as the WL scheme in this paper) [7], based on Merkle's hash tree technique [11]. The basic idea of the WL scheme is to sign a block hash, the top-level hash value of the tree representing all packets' hashes in a block, so that one signature (called the block signature) amortizes all packets in the block. Hash chaining was introduced in [6], where only the first packet, which includes the hash of the next packet, is digitally signed. Thereafter, the source continuously sends each packet containing the hash value of its next packet. This scheme is efficient, but it is not loss tolerant, and the source has to know the entire stream in advance. In order to overcome these shortcomings, several solutions (e.g., [4] and [8]) have been proposed recently. An erasure coding function can be used to achieve space efficiency in streaming authentication, under the condition that a predefined number of packets must arrive at the receiver [9, 10]. The drawback of those schemes is the more expensive computational cost, compared to other schemes, caused by applying the erasure code.
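As a concrete illustration of the hash-chaining idea of [6] (the sketch and its names are ours; the signature function is left abstract):

```python
import hashlib

def chain_sign(packets, sign):
    """packets: list of byte strings; sign: any digital signature function.
    Each packet is extended with the hash of its successor (plus that
    successor's embedded hash), and only the head is signed."""
    h = [b""] * len(packets)
    for i in range(len(packets) - 2, -1, -1):
        h[i] = hashlib.sha1(packets[i + 1] + h[i + 1]).digest()
    signature = sign(packets[0] + h[0])        # one signature for the stream
    return signature, [p + x for p, x in zip(packets, h)]
```

A receiver verifies the signature on the first packet and then authenticates each subsequent packet with a single hash computation, which is also why a lost packet breaks the chain.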
3 Overview of Lenstra and Verheul's Suggestion
In the proposed approach, the source is intended to support DOA without high performance degradation of a multimedia application. To do so, the source generates a signature using a short key whose length is secure only for a predefined short duration. The challenge is to derive an appropriate key length without compromising the security of the authentication system. Our approach is based on the lower bound of Lenstra and Verheul's suggestion [3]. In [3], Lenstra and Verheul recommended appropriate key sizes for both symmetric and asymmetric cryptosystems, based on a set of explicitly formulated parameter settings combined with historical data points of attacks on
cryptosystems. In particular, the computational effort involved in existing successful attacks, namely the number of MIPS (Million Instructions Per Second) years required for a successful attack, is the major concern. They defined IMY(y) as the number of MIPS-years considered to be infeasible until year y, used to derive the key size necessary to offer strength computationally equivalent to that offered by DES in 1982; a computational effort of $5 \times 10^5$ MIPS-years was considered a secure cost to resist software attacks on commercial DES in 1982. IMY(y) is formulated as

$IMY(y) = 5 \cdot 10^5 \cdot 2^{12(y-s)/m} \cdot 2^{(y-s)/b}$ MIPS-years,    (1)

where s, m, and b are variables capturing the environmental factors that affect security in the selection of key length (see [3] for details). On the basis of equation (1), the lower bound of the key size for a symmetric key system (and also for a hash function) is derived as the smallest integer that is at least

$56 + \log_2\!\left(\frac{IMY(y)}{5 \cdot 10^5}\right)$.    (2)

On the other hand, to select the key length for a conventional asymmetric key system, they applied the asymptotic run time L[n] of the NFS (Number Field Sieve), combined with the historical fact that a 512-bit modulus was broken in 1999 at a cost of around $10^4$ MIPS-years, such that

$\frac{L[2^k]}{IMY(y) \cdot 2^{12(y-1999)/18}} \ge \frac{L[2^{512}]}{10^4}$,    (3)

where the factor $2^{12(y-1999)/18}$ indicates that cryptanalysis is expected to become twice as effective every 18 months from year 1999 to year y.
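Equations (1) and (2) are easy to evaluate. The sketch below assumes the default parameter settings of [3] (s = 1982, m = 18, b = 10), which reproduce the IMY(2004) value quoted in Section 4.1:

```python
from math import log2

def IMY(y, s=1982, m=18, b=10):
    # equation (1), in MIPS-years
    return 5e5 * 2 ** (12 * (y - s) / m) * 2 ** ((y - s) / b)

def sym_key_bits(y):
    return 56 + log2(IMY(y) / 5e5)           # equation (2)

print(IMY(2004))            # ~5.98e10 MIPS-years
print(sym_key_bits(2004))   # ~72.9 -> 73 bits (cf. the 1-year TCR column of Table 1)
```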
4 Our Approach
In [12], we proposed an approach to support a secure routing methodology for mobile nodes, in which a secure length of key pair for a digital signature was derived; however, no consideration of multimedia streaming authentication or of cryptographic hash functions appeared there. In this section, we briefly review the method presented in [12] and extend it to finding a secure length for the hash function.

4.1 Secure Key Length Selection for Digital Signature
IAO(x, y) was defined in [12] as an 'Infeasible Amount of Operations' within a certain period x of the year y. IAO(x, y) is derived from the IMY(y) formulated in equation (1); IAO(1 year, this year) is therefore derived from IMY(this year). Here, we emphasize that the term 'MIPS' is used as a measuring tool for the relative comparison of computational workload. As in [3], it is also supposed that a single operation requires a single clock cycle; that is, a 2.4 GHz Pentium 4 computer (which is used for our experiments) is regarded as a 2400 MIPS machine. A short but secure key length is calculated based on the following intuitive corollary.
Corollary 1. Let $S = \{x_1, \cdots, x_j : x_1 + \cdots + x_j = X\}$ be a set of sub-periods of X, and let k be the length of a short-lived key. If k satisfies equation (4), then the length k is secure enough within $x_i$ of the year y, for all $x_i \in S$.

$L[2^k] \ge 55.6 \cdot 2^{12(y-1999)/18} \cdot IAO(x_i, y)$.    (4)

Proof. We derived equation (4) by applying IAO(X, y) to equation (3), where we approximated $L[2^{512}]/10^4$ as $1.75 \cdot 10^{15}$. The proof appeared in our previous work (see [12] for details).
Example: Consider 2004 as the target year (the y of IMY(y)) to protect a system. It is expected that IMY(2004) (= $5.98 \cdot 10^{10}$ MIPS-years) is secure until the end of year 2004 according to equation (1). This is equivalent to $1.88 \cdot 10^{24}$ operations, so such a number of operations is infeasible until the end of 2004; namely, IAO(1 year, 2004) is $1.88 \cdot 10^{24}$. Thereby we can calculate a proper key length for any sub-period, as described in Table 1. If a day is chosen as the short period, the source can use an 880-bit key to sign, because IAO(1 day, 2004) is $5.16 \cdot 10^{21}$, which is equivalent to $1.64 \cdot 10^{8}$ MIPS-years; 880 bits can then be derived using equation (4). A signature computed with an 880-bit key offers the same security level for a day as a 1108-bit key does for this year and a 2054-bit key does for the next two decades. Fig. 1 shows that the cost of a general digital signature scheme depends on the key length, whereas our approach depends on the lifetime of the derived short-lived key.
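The example's numbers can be reproduced by solving equation (4) numerically. In the sketch below, L[n] is the standard heuristic NFS run time with constant 1.923; this is our assumption (the paper instead calibrates L through the 512-bit datapoint), but the two agree closely here. One MIPS-year is taken as $10^6 \cdot 3.15 \cdot 10^7 \approx 3.15 \cdot 10^{13}$ operations.

```python
from math import exp, log

def L(k):
    # heuristic NFS cost for a k-bit modulus: exp(1.923 (ln n)^{1/3} (ln ln n)^{2/3})
    ln_n = k * log(2)
    return exp(1.923 * ln_n ** (1 / 3) * log(ln_n) ** (2 / 3))

def min_key_bits(IAO_ops, y):
    bound = 55.6 * 2 ** (12 * (y - 1999) / 18) * IAO_ops   # equation (4)
    k = 512
    while L(k) < bound:
        k += 1
    return k

print(min_key_bits(5.16e21, 2004))   # about 880 bits for IAO(1 day, 2004)
```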
Fig. 1. Computational cost gain: (a) signing cost gain; (b) verifying cost gain (experiment conditions are the same as in Table 1)
In DSA, it is known that the time to generate a signature is lower than that of signature verification, but the signing cost is similar to the verifying cost in our experiment (see Table 1). Unlike DSA, signature verification is much
faster than signature generation in an RSA setting. It might be claimed that it is advantageous for signing to be the faster operation. However, it may well be more advantageous to have faster verification in multicast, especially in one-to-many applications, because in the usual scenario a content distributor has much higher capabilities, such as a fast CPU and large RAM, than the receivers. For this reason, we consider RSA as the underlying signature algorithm in this paper.

4.2 Hash Length Selection
Like [10], we also apply a TCR (Target Collision Resistant) hash function in the authentication scheme instead of an ACR (Any Collision Resistant) one, under the assumption that UOWHFs (Universal One-Way Hash Functions) exist. The TCR hash function allows an authentication scheme using an amortizing signature to gain much more space efficiency. The major advantage of TCR compared with ACR is that the birthday attack, the best attack on ACR hash function families, does not directly apply to TCR hash functions, since the message is specified before the hash function is given to the attacker. Unlike TCR, with an ACR hash function an attacker can select, in advance, two different messages M and M' that map to the same hash, h(M) = h(M'). As a result, a TCR hash function of size L/2 can offer the same security level as an ACR hash function of size L. For example, if an authentication system has a target period of less than 18.75 hours as the $x_i$ of IAO($x_i$, y), then only an 8-byte hash value is sufficient for TCR. In particular, the transmission overhead shortcoming of the WL Tree scheme, where $\log_2 BS$ hashes per packet are required for authentication, can be reduced by a factor of more than 2. Furthermore, from the viewpoint of computational cost, the source can use a 128-bit hash function (e.g., MD5) for all periods of IAO($x_i$, y) when applying a UOWHF, while the source must use a 160-bit or longer hash function (e.g., SHA1, RIPEMD160, or SHA256) for a period of more than a day in order to protect against the birthday attack in an ACR hash function setting. A secure hash length for the predefined lifetime is calculated based on the following Corollary 2. If an authentication scheme does not hold the TCR property, the system must replace the TCR with an ACR hash function [10]. With a system applying a TCR hash function, for example, there is a security hole in that the source could sign a malicious packet that is the counterpart of a collision pair; this is possible because the source knows in advance the TCR hash function to be used. However, this is not a matter of DOA but one relating to NRO, and we recall that NRO is not addressed in our scenarios, as discussed above.

Corollary 2. Let $S = \{x_1, \cdots, x_j : x_1 + \cdots + x_j = X\}$ be a set of sub-periods of X, and let $l_{TCR}$ be the hash length of a TCR hash function (generally $l_{TCR} = l_{ACR} \cdot 1/2$). If $l_{TCR}$ satisfies equation (5), then the length $l_{TCR}$ is secure enough within $x_i$ of the year y, for all $x_i \in S$.

$l_{TCR} \ge 56 + \log_2\!\left(\frac{IAO(x_i, y)}{15.75 \cdot 10^{18}}\right)$.    (5)
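Equation (5) is likewise immediate to compute; the constant $15.75 \cdot 10^{18}$ is $5 \cdot 10^5$ MIPS-years expressed in operations. This check against Table 1 is our own illustration:

```python
from math import ceil, log2

def tcr_bits(IAO_ops):
    return ceil(56 + log2(IAO_ops / 15.75e18))   # equation (5)

print(tcr_bits(5.15e21))   # 65 bits, matching the 1-day row of Table 1
```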
Table 1. Short-lived key and hash length selection, including computational cost

Period x_i (y = 2004) | IAO(x_i, y)   | Key length (bits) | RSA sign/verify (msec) | DSA sign/verify (msec) | Hash ACR/TCR (bits)
2 decades   | 7.78 · 10^28 | 2113 | 51.55 / 1.13 | 12.96 / 13.55 | 178 / 89
1 decade    | 3.81 · 10^26 | 1562 | 29.31 / 0.65 | 6.13 / 9.31   | 162 / 81
1 year      | 1.88 · 10^24 | 1108 | 7.92 / 0.31  | 3.64 / 4.15   | 146 / 73
1 month     | 1.57 · 10^23 | 1008 | 5.92 / 0.29  | 2.87 / 3.14   | 140 / 70
1 week      | 3.67 · 10^22 | 952  | 5.37 / 0.27  | 2.71 / 2.82   | 136 / 68
1 day       | 5.15 · 10^21 | 880  | 4.55 / 0.23  | 2.34 / 2.71   | 130 / 65
1 hour      | 2.15 · 10^20 | 772  | 3.79 / 0.22  | 2.18 / 2.23   | 122 / 61
30 minutes  | 1.07 · 10^20 | 748  | 3.27 / 0.20  | 1.93 / 2.0    | 116 / 58
Proof. Corollary 2 follows the proof of Corollary 1: equation (5) is derived by applying IAO($x_i$, y) to equation (2), in a way similar to the derivation of equation (4) in Corollary 1. In Table 1, we tabulate the signature key and hash lengths for each selected period (lifetime), as well as the computational cost when the selected key is applied in an authentication solution. To measure the computation time, we used the Crypto++ library [14] on a Linux-based 2.4 GHz Pentium 4 computer, with an input size of 1024 bytes.

4.3 Remark on Short-Key Management
The source may have to generate several ephemeral key pairs during a session, according to the validity term (the lifetime) of the selected short key pairs. The lifetime available for secure use is proportional to the strength of the key being used, such as the length of the RSA modulus. In some scenarios, multiple key pairs for a single session may lead to public key distribution and certification issues, since the source should re-key and re-certify each newly distributed key; this difficulty is even more challenging in a large group. However, it is not an issue for the proposed approach if the key is selected appropriately for the properties of the application. We note that the time required for most multimedia content distribution is not long (e.g., 2 hours for a movie, a quarter or half a day for an on-line presentation). In such a scenario, a key pair for a day, namely IAO(1 day, y), can be a good selection without the burden of securely redistributing a new public key. In this paper, we do not address the issue of key management in detail; instead, we assume that every recipient has ascertained in advance the correctness of the public key corresponding to the private (secret) key used for packet signing, by use of a key management protocol and public key certification mechanisms such as
PKI (Public Key Infrastructure) or GDOI (Group Domain Of Interpretation), the latter of which can be used especially when no PKI is available [13].
5 Performance Evaluation
We evaluate performance efficiency by comparing our approach with four other schemes (WL's Tree, EMSS, SAIDA, and TESLA, briefly described in Section 2) in terms of computational cost as well as transmission overhead. We used a 1024-byte input to obtain the computational time of each primitive, and experiments were performed under the same conditions as those of Table 1 (Crypto++ library on a 2.4 GHz machine). For the evaluation, a decade and a day were set as the security level (the lifetime of a key) for the existing schemes and our approach, respectively. In the case of SAIDA, we assumed up to 50% losses per block. In addition, we set 6 edges for the EMSS scheme (i.e., 6 hashes per packet). Fig. 2 shows that the proposed approach is very efficient with respect to computational cost at the source side (see (a) in Fig. 2) as well as the receiver side (see (b) in Fig. 2), resulting from the use of a short-lived key. From the viewpoint of computational cost, SAIDA is more expensive than the other schemes using an amortizing signature, owing to the additional cost of applying an erasure coding function. EMSS and WL's Tree require similar computational cost. In particular, (a) of Fig. 3 shows that our approach applied to the EMSS and WL Tree schemes requires lower computational cost than even TESLA, which uses a fast MAC as the underlying primitive, in the case where group members are widely dispersed throughout the Internet so that each packet experiences a different delay in
Fig. 2. Computational cost comparison: (a) signing cost; (b) verification cost
Fig. 3. Computing and space overhead comparison: (a) IAO(1 day, y) vs. TESLA; (b) space overhead
transit. To compensate for the different arrival times in a heterogeneous multicast network, TESLA uses multiple key chains; the source calculates several MACs for a packet with different keys, and each key is sent with a different delay lag. The number of instances (the x-axis of (a) in Fig. 3) indicates the number of different delay groups. In Fig. 3, (b) illustrates the space overhead of each solution versus the number of packets per block, with TESLA (indicated by a dotted line with star points) as the exception: each point on the TESLA line shows the effect of the number of instances (2, 4, 6, 8, and 10, respectively) on the space overhead. From the aspect of transmission overhead, unlike the comparison of computational cost, SAIDA is the best. The Tree scheme is likely to be of limited use in practice due to its high overhead; however, it allows the receiver to verify a packet immediately and individually, regardless of the loss pattern. In the case of applying IAO to the Tree scheme, the space overhead is reduced by a factor of at least 2. As shown in the figure, EMSS and SAIDA with IAO(1 day, y) become more space efficient than TESLA in the scenario where there are more than four delay groups in the multicast and the source amortizes a signature over more than 40 packets.
6 Conclusion
In this paper, we first discussed a number of existing solutions for supporting data origin authentication of multicast streams. We then proposed an efficient approach using a short-lived key. To this end, we also addressed how to calculate appropriate key and hash lengths for a predefined short lifetime. The results show that our approach is a very efficient way to enhance the performance of stream authentication solutions.
References
1. C. Diot, B. Levine, B. Lyles, H. Kassem and D. Balensiefen, "Deployment Issues for the IP Multicast Service and Architecture," IEEE Network Magazine, January/February 2000.
2. S. Kent and R. Atkinson. Security Architecture for the Internet Protocol. IETF RFC 2401, Nov. 1998.
3. A. K. Lenstra and E. R. Verheul. Selecting cryptographic key sizes. Journal of Cryptology, 14(4):255-293, 2001.
4. A. Perrig, R. Canetti, J. D. Tygar and D. Song. Efficient Authentication and Signing of Multicast Streams over Lossy Channels. IEEE Security and Privacy Symposium, 2000.
5. R. Canetti, J. Garay, G. Itkis, D. Micciancio, M. Naor and B. Pinkas. Multicast Security: A Taxonomy and Some Efficient Constructions. Infocom'99, 1999.
6. R. Gennaro and P. Rohatgi. How to Sign Digital Streams. Lecture Notes in Computer Science, vol. 1294, pages 180-197, 1997.
7. C. K. Wong and S. S. Lam. Digital Signatures for Flows and Multicasts. In Proc. IEEE ICNP '98, 1998.
8. P. Golle and N. Modadugu. Authenticating streamed data in the presence of random packet loss. NDSS'01, pages 13-22, Feb. 2001.
9. J. M. Park and E. K. P. Chong. Efficient multicast stream authentication using erasure codes. ACM Trans. Inf. Syst. Secur., 6(2):258-285, 2003.
10. C. Karlof, N. Sastry, Y. Li, A. Perrig, and J. D. Tygar. Distillation Codes and Applications to DoS Resistant Multicast Authentication. NDSS 2004.
11. R. Merkle. Protocols for public key cryptosystems. In Proceedings of the IEEE Symposium on Research in Security and Privacy, pages 122-134, Apr. 1980.
12. N. Kang, I. Park, and Y. Kim. Secure and Scalable Routing Protocol for Mobile Ad-hoc Networks. Lecture Notes in Computer Science, vol. 3744, Oct. 2005.
13. M. Baugher, B. Weis, T. Hardjono, and H. Harney. The Group Domain of Interpretation. RFC 3547, July 2003.
14. Crypto++ class library, http://www.eskimo.com/~weidai/cryptlib.html
Weakest Link Attack on Single Sign-On and Its Case in SAML V2.0 Web SSO
Yuen-Yan Chan
Department of Information Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
[email protected]
Abstract. In many of the single sign-on (SSO) specifications that support multitiered authentication, it is not mandatory to include the authentication context in a signed response. Adversaries can exploit this to launch a new kind of attack specific to SSO systems. In this paper, we propose the Weakest Link Attack, a kind of parallel session attack feasible in the above settings. Our attack enables adversaries to succeed at all levels of authentication associated with the victim user by breaking only the weakest one. We present a detailed case study of our attack on web SSO as specified in the Security Assertion Markup Language (SAML) V2.0, an OASIS standard released in March 2005. We also suggest the corresponding repair at the end of the paper.¹
1 Introduction
Authentication is almost unavoidable in performing business processes over the Internet. Very often, users need to sign on to multiple systems within one transaction. What makes it worse is that different credentials and authentication methods are often involved. System administrators also face the challenge of managing an increasing number of user accounts and passwords. One solution to this issue is Single Sign-On (SSO), a technique that enables a user to authenticate once and gain access to the resources of multiple systems. In order to facilitate SSO, enterprises unite to form circles of trust, or federations. They do so by establishing business agreements, cryptographic trust, and user identities across security and policy domains. Nowadays, federation and SSO have become the dominant movement in identity management for electronic business. To this end, standards and specifications are published by renowned organizations to provide standardized mechanisms and formats for the communication of identity information within and across federations. The most influential one is the Security Assertion Markup Language (SAML), established by the Organization for the Advancement of Structured Information Standards (OASIS), which is an XML-based framework for communicating user authentication and attribute
¹ This piece of work is financially supported by Hong Kong Earmarked Grants 4232-03E and 4328-02E of the Hong Kong Research Grants Council.
information. Its first version, SAML V1.0 [2], was published in November 2002. Since then, OASIS has released subsequent versions of SAML; the latest is SAML V2.0 [3], released in March 2005. Today, SAML has been broadly implemented in major enterprise web server and application server products. Examples include BEA WebLogic Server™ 9.0, Sun Java System Application Server Platform™ Edition 8.1, and IBM WebSphere Application Server™ 6.0, as well as commercial products for SSO and identity federation solutions such as Computer Associates eTrust Single Sign-On™ and IBM Tivoli Federated Identity Manager™.
1.1 Related Work
Related work on the security analysis of SSO includes the following. In the OASIS Standard [6], a number of threats to the SAML V2.0 web SSO specification are documented, for which countermeasures that heavily depend on SSL/TLS as well as system-level mechanisms are highlighted. These threats include message eavesdropping, replay and modification, man-in-the-middle attacks, and denial-of-service attacks. As of the completion of this paper, no other attacks have been reported on this latest SAML version. Groß analyzed the security of the SAML V1.1 SSO Browser/Artifact Profile [1] and revealed three attacks, namely the connection hijacking attack, the man-in-the-middle attack, and the HTTP referrer attack. This caused OASIS SSTC² to respond recently with a document [7] restating the security scope of SAML. A few works take a different approach; for example, Hansen et al. validated SAML V1.1 SSO security using static analysis and reported several flaws in some SSO instantiations [9]. Some work has also been done on the security analysis of Liberty SSO. For example, Pfitzmann et al. presented a man-in-the-middle attack on Liberty ID-FF V1.1 SSO with an enabled client [8]. Liberty corrected the flaw in ID-FF V1.1 with the countermeasure proposed by the authors.
1.2 Our Contribution
In this paper, we present the weakest link attack, a parallel session attack specific to SSO systems. We start with an attack hypothesized in a generic SSO scenario where multitiered authentication is in place. Then we examine the attack hypothesis by exploiting it on SAML V2.0 web SSO, and we succeed. We further generalize our attack and propose a repair for its elimination. Since OASIS SSTC positions SAML as a common XML framework, and emphasizes that SAML protocols and implementations are designed to operate in a broader context in conjunction with other protocols and mechanisms [6], we do not claim to have broken SSO in SAML and its related specifications. Nevertheless, our paper achieves significant attack results and contributes valuable information and considerations to future implementations of SSO in general.
² Security Services Technical Committee.
1.3 Paper Organization
After an overview of SSO and SAML in Section 2, the rest of the paper is arranged in a four-step style:
1. Attack Hypothesis. We provide the definition and the hypothesis of the attack in Section 3.
2. Attack Exploitation. We exploit the hypothetical attack on a default instantiation of SAML V2.0 web SSO in Section 4.
3. Attack Generalization. Based on the results obtained, we generalize the attack over generic SSO scenarios in Section 5.
4. Attack Elimination. Lastly, a repair to the flaw in the SSO specification is provided in Section 6, and the paper is concluded in Section 7.
2 An Overview on Single Sign-On and SAML
2.1 Single Sign-On
Single Sign-On is a mechanism that enables a user to authenticate virtually once and gain access to the restricted resources of multiple systems. Typically, SSO involves three principals:
– User U. Accesses the network and makes use of the system resources for any purpose. In practice, the user is often represented by a user agent.
– Service provider SP. Offers restricted services that are only available to authenticated principals.
– Identity provider IdP. Creates, maintains, and manages identity information for principals and provides principal authentication to other service providers.
We depict a generic SSO scenario in Fig. 1.
2.2 Security Assertion Markup Language (SAML)
The SAML Standard consists of a set of XML schemas and specifications, which together define how to construct, interchange, interpret, and extend security assertions for a variety of purposes. The major ones include web Single Sign-On (web SSO), identity federation, and attribute-based authorization. The Standard defines a set of assertions, protocols, bindings, and profiles [3]. An assertion is a piece of information that provides one or more statements made by a SAML authority, and SAML protocols specify how to generate messages and exchange them between requesters and responders. SAML protocol bindings [4] are mappings from SAML message exchanges onto standard communication protocols such as SOAP and HTTP. The SAML profiles [5] define sets of rules for using SAML assertions and protocols to accomplish specific purposes; in particular, the Web Browser SSO Profile defines how web SSO is supported.
Fig. 1. Single Sign-On
2.3 SAML V2.0 Web SSO
A default instantiation³ of web SSO according to the specification in the SAML V2.0 Web Browser SSO Profile [5] is illustrated in Fig. 2.
Fig. 2. SAML web SSO
2.4 Basic Security Measures of SAML V2.0 Web SSO
Security measures in the SAML Web Browser SSO Profile include [6] the use of timestamps (IssueInstant) to indicate message freshness, and the requirement of the IdP's digital signature on the <Response> message.
³ This refers to an instantiation enforcing mandatory (i.e., [Required]) elements and attributes only.
3 Attack Hypothesis
Definition 1 (Weakest Link Attack). A weakest link attack is a parallel session attack in single sign-on settings where an adversary launches concurrent service requests at two service providers that require different authentication levels, only one of which the adversary can pass. The adversary uses the response corresponding to the successful authentication issued by the identity provider to replace that of the failing one, so that both requests are accepted.
Consider an adversary Adv who has broken the level-L authentication context of a user U. Under SSL/TLS, the power of Adv is confined to:
1. Authenticate as U at level L.
2. Initiate an arbitrary number of concurrent conversations with any other legitimate principals.
3. Obtain messages of which U is the legitimate receiver.
4. Send messages of which U is the legitimate sender.
5. Obtain and redirect messages of which U is the legitimate redirector.
6. Alter plaintext messages of which U is the legitimate receiver or redirector.
Hypothesis 1 (Weakest Link Attack). Adversary Adv can succeed in the weakest link attack provided all of the following conditions are satisfied:
C1. There exist two or more service providers relying on a single identity provider for user authentication.
C2. Multitiered authentication is in place.
C3. Either one or both of the following conditions are true:
a. The level of authentication and/or the designated service provider is/are not indicated in the response.
b. There is no integrity protection for the response message.
C4. User redirection for request and response messages is required.
Remark 1. The first condition is common in SSO practice, especially when the service providers offer complementary services, e.g., an airline company and a car rental company sharing a common travelers' club as their identity provider. The second condition is reasonable, as requirements on authentication contexts diverge among service providers.
3.1 Weakest Link Attack in a Hypothetical SSO
We describe how the hypothesized attack (Hypothesis 1) takes place, using the SSO depicted in Fig. 1 as a hypothetical scenario. We assume the following settings. Consider Adv to be an adversary as described above. There are two service providers, SP1 and SP2, who both rely on the same identity provider IdP for authentication and assertion services. Suppose SP1 requires an authentication context at level H and SP2 requires one at level L, where H > L (for example, H indicates X.509 PKI-based authentication and L indicates general username-password authentication). The hypothetical attack is shown in Fig. 3.
Fig. 3. Weakest Link Attack on Single Sign-On
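To make the sequence of events concrete, the following toy simulation replays the scenario of Fig. 3. It is a minimal sketch under our own assumptions, not an implementation of any SSO standard: all identifiers, the message format, and the keyed hash standing in for the IdP's signature are hypothetical. The essential point it demonstrates is condition C3a: the response binds only the user, not the authentication level or the designated SP, so a response earned at level L is accepted by the level-H provider.

```python
# Toy model of the weakest link attack; every identifier here is hypothetical.
import hashlib

def tag(key: str, body: str) -> str:
    # Keyed hash standing in for the IdP's "signature" on the response.
    return hashlib.sha256((key + "|" + body).encode()).hexdigest()

IDP_KEY = "idp-secret"

def idp_authenticate(user: str, level: str) -> tuple:
    # Flaw modeled here (condition C3a): the signed body names only the
    # user; the authentication level and the designated SP are omitted.
    body = f"authn-ok:{user}"
    return body, tag(IDP_KEY, body)

def sp_accepts(response: tuple) -> bool:
    body, t = response
    return t == tag(IDP_KEY, body)   # checks origin/integrity only

# Adv has broken only U's level-L credential, so it can obtain a genuine
# response for the password login intended for SP2 ...
resp = idp_authenticate("U", level="L")

# ... and replays the same response in the parallel session with SP1,
# which demands level H (e.g., X.509). Both providers accept.
assert sp_accepts(resp)   # SP2 (level L): legitimate acceptance
assert sp_accepts(resp)   # SP1 (level H): weakest link attack succeeds
```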
3.2 Attack Damage
As seen from the above illustration, the attack results in the acceptance of a service request made by Adv, who originally fails to satisfy the authentication context requirement of SP1. If we consider Adv to be an adversary who breaks only the victim user's password, Adv in this case is as powerful as having forged an X.509 digital signature! In other words, the weakest link attack enables an adversary to succeed in an authentication at a higher level when she has only succeeded in breaking one at a lower level.
4 Weakest Link Attack on SAML V2.0 Web SSO
We now apply our attack to a web SSO instantiated according to the specification in SAML V2.0 [3]. The instantiation implements only the mandatory procedures and elements specified in the Standard, except that the SPs additionally specify the required authentication levels in AuthnContextClassRef.
Remark 2. AuthnContextClassRef is contained in the <RequestedAuthnContext> element of the authentication request.
4.1 Attack on SAML V2.0 Web SSO
An attack on an instantiation of SAML V2.0 web SSO is depicted in Fig. 4. The attack settings including the adversary power are the same as those described
Fig. 4. Weakest Link Attack on SAML V2.0 web SSO
in Section 3. As per the authentication context requirements, SP1 specifies "X509" in AuthnContextClassRef while SP2 specifies "Password", indicating a requirement of X.509 challenge-response authentication and of username-password authentication, respectively. As seen from Fig. 4, Adv succeeds in both service requests, at SP1 and at SP2, in the end. SP1 accepts the replayed response that the IdP issued for the lower-level authentication at SP2.
5 Results and Attack Generalization
The weakest link attack is successfully exploited on a default instantiation of SAML V2.0 web SSO. We further generalize our attack into the theorem below.
Theorem 1 (Generalization of Weakest Link Attack). In a generic SSO system where underlying SSL/TLS is in place, the weakest link attack enables
an adversary to succeed at all authentication levels associated with a victim user by breaking only the weakest authentication context. The attack is attainable if the following conditions are met in the corresponding SSO protocol:
1. User redirection for request and response messages is required.
2. At least one of the following conditions on the response message is true:
(a) The level of authentication is not indicated or implied.
(b) The subject being authenticated is not indicated or implied.
(c) There is no integrity protection for the message segment that binds the subject and the authentication level.
Proof. (A sketch.) Adv runs parallel authentication requests at two SPs and obtains the response issued for the authentication at the weaker level; under the conditions above, nothing in the response binds it to that level or to its intended SP, so Adv can replay it in the session with the SP requiring the stronger level.
6 Repair
We propose the repair to the generalized weakest link attack below.
Corollary 1 (Repair to Weakest Link Attack). The weakest link attack can be prevented when all of the following are mandatory:
1. Include the indicator of the required authentication level in the response.
2. Include the indicator of the authentication subject in the response.
3. Sign the response message.
Repair for SAML V2.0 web SSO. We suggest the following modifications: make the AuthnContextClassRef attribute, as well as its container <AuthnContext> element, mandatory in the assertion, and make the IdP's signature over the <Response> message mandatory. A sketch of the corresponding SP-side checks follows.
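As a companion to Corollary 1, here is a minimal sketch of the checks a relying SP would perform before accepting a response. It is not SAML processing code: the dictionary layout, the HMAC standing in for the IdP's XML signature, and all field names are our own illustrative assumptions. The point is that the integrity check must cover exactly the fields that the other two checks read.

```python
import hmac, hashlib

def response_is_acceptable(resp: dict, idp_key: bytes,
                           my_sp_id: str, required_level: str,
                           expected_user: str) -> bool:
    # Repair item 3: the response must be signed, and the signature must
    # cover the subject, the authentication level, and the designated SP.
    signed_body = f"{resp['subject']}|{resp['level']}|{resp['audience']}"
    expected_sig = hmac.new(idp_key, signed_body.encode(),
                            hashlib.sha256).digest()
    if not hmac.compare_digest(resp["sig"], expected_sig):
        return False
    # Repair item 1: the indicated level must meet this SP's requirement.
    if resp["level"] != required_level:
        return False
    # Repair item 2: the subject and the designated SP must match.
    return resp["subject"] == expected_user and resp["audience"] == my_sp_id
```

Under these checks, the replay of Section 4 fails: the replayed response carries the level "Password" and the audience SP2, so SP1 rejects it on both counts.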
7 Conclusion
In the above sections, we have presented a new parallel session attack specific to SSO. We started with an attack hypothesized in a generic SSO scenario. We then evaluated the attack hypothesis by exploiting it on SAML V2.0 web SSO, and
we succeeded. We generalized the attack into a theorem and proposed the corresponding repair. Due to the scope of SAML, we do not claim to have broken SSO in SAML and its related specifications. Nevertheless, we achieved significant attack results and contributed valuable information and considerations for SSO practitioners.
References
1. T. Groß. Security analysis of the SAML Single Sign-On Browser/Artifact Profile. In Proceedings of the 19th Annual Computer Security Applications Conference, December 2003.
2. OASIS SSTC. Assertions and Protocols for the OASIS Security Assertion Markup Language (SAML), November 2002.
3. OASIS SSTC. Assertions and Protocols for the OASIS Security Assertion Markup Language (SAML) V2.0, March 2005.
4. OASIS SSTC. Bindings for the OASIS Security Assertion Markup Language (SAML) V2.0, March 2005.
5. OASIS SSTC. Profiles for the OASIS Security Assertion Markup Language (SAML) V2.0, March 2005.
6. OASIS SSTC. Security and Privacy Considerations for the OASIS Security Assertion Markup Language (SAML) V2.0, March 2005.
7. OASIS SSTC. SSTC Response to "Security Analysis of the SAML Single Sign-on Browser/Artifact Profile", July 2005.
8. B. Pfitzmann and M. Waidner. Analysis of Liberty single-sign-on with enabled clients. IEEE Internet Computing, 7(6):38–44, November 2003.
9. S. M. Hansen, J. Skriver, and H. Riis Nielson. Using static analysis to validate the SAML single sign-on protocol. In Proceedings of the 2005 Workshop on Issues in the Theory of Security, pages 27–40, January 2005.
A Example of the Suggested <Response>
Fig. 5. Example of the Suggested <Response>
An Inter-domain Key Agreement Protocol Using Weak Passwords
Youngsook Lee, Junghyun Nam, and Dongho Won
Information Security Group, Sungkyunkwan University, Korea
{yslee, jhnam, dhwon}@security.re.kr
Abstract. There have been many protocols proposed over the years for password authenticated key exchange in the three-party scenario, in which two clients attempt to establish a secret key interacting with one and the same authentication server. However, little has been done for password authenticated key exchange in the more general and realistic four-party setting, where the two clients trying to establish a secret key are registered with different authentication servers. In fact, the recent protocol by Yeh and Sun seems to be the only password authenticated key exchange protocol in the four-party setting. However, the Yeh-Sun protocol adopts the so-called "hybrid model", in which each client needs not only to remember a password shared with the server but also to store and manage the server's public key. In some sense, this hybrid approach obviates the reason for considering password authenticated protocols in the first place: it is difficult for humans to securely manage long cryptographic keys. In this paper, we propose a new protocol designed carefully for four-party password authenticated key exchange that requires each client only to remember a password shared with its authentication server. To the best of our knowledge, our new protocol is the first password-only authenticated key exchange protocol in the four-party setting.
Keywords: Key exchange, authentication, password, public key.
1 Introduction
Key exchange protocols are cryptographic protocols enabling two or more parties communicating over a public network to establish a common secret key that should be known only to those parties at the end of a protocol execution. This secret key, commonly called a session key, is then typically used to build a confidential or integrity-protected communication channel between the parties. This means that key exchange should be linked to authentication, so that each party has assurance that a session key is in fact shared with the intended party and not with an impostor. Hence, authenticated key exchange is of fundamental importance to anyone interested in communicating securely over a public network; even if
This work was supported by the Korean Ministry of Information and Communication under the Information Technology Research Center (ITRC) support program supervised by the Institute of Information Technology Assessment (IITA).
Corresponding author.
it is computationally infeasible to break the cryptographic algorithm used, the whole system becomes vulnerable to all manner of attacks if the keys are not securely established.
Achieving any form of authentication in key exchange protocols inevitably requires some secret information to be established between the communicating parties in advance of the authentication stage. Cryptographic keys, either secret keys for symmetric cryptography or private/public keys for asymmetric cryptography, may be one form of the underlying secret information pre-established between the parties. However, these high-entropy cryptographic keys are random in appearance and thus difficult for humans to remember, entailing a significant amount of administrative work and cost. It is this drawback that led password-based authentication to become widely used in practice. Passwords are drawn from a relatively small space, like a dictionary, and are easier for humans to remember than high-entropy cryptographic keys.
Bellovin and Merritt [5] were the first to consider how two parties who share only a weak, low-entropy password, and who are communicating over a public network, can authenticate each other and agree on a high-entropy cryptographic key to be used for protecting their subsequent communication. Their protocol, known as encrypted key exchange, or EKE, was a great success in showing how one can exchange password-authenticated information while protecting poorly chosen passwords from the notorious off-line dictionary attacks. Due in large part to the practical significance of password-based authentication, this initial work has been followed by a number of two-party protocols (e.g., [6, 3, 14, 29, 13]) offering various levels of security and complexity.
While two-party protocols for password authenticated key exchange (PAKE) are well suited to client-server architectures, they are inconvenient and costly for use in large-scale peer-to-peer systems. Since two-party PAKE protocols require each potential pair of communicating users to share a password, a large number of users results in an even larger number of potential passwords to be shared. It is due to this problem that three-party models have often been considered in designing PAKE protocols [12, 24, 17, 20, 1]. In a typical three-party setting, users (called clients) do not need to remember and manage multiple passwords, one for each communicating party; rather, each client shares a single password with a trusted server, who then assists the two clients in establishing a session key by providing authentication services to them. However, this convenience comes at the price of the users' complete trust in the server. Therefore, whilst the three-party model will not replace the two-party model, it offers easier alternative solutions to the problem of password authenticated key exchange in peer-to-peer network environments.
One option for designing three-party PAKE protocols is to use a hybrid model in which each client stores at least one public key of its authentication server in addition to sharing a password with the server [22, 20, 28, 25, 7, 26]. It is probably fair to say that the hybrid model is mostly used because it provides an easy way to prevent both off-line password guessing attacks and undetectable on-line password guessing attacks [9]. However, the hybrid setting suffers from the disadvantage
that clients must store a public key for each server to whom they wish to authenticate. If a client needs to deal with multiple servers, the client must store multiple public keys. In some sense, this obviates the reason for considering password authenticated protocols in the first place: human users cannot remember, and find it difficult to securely manage, long cryptographic keys. This drawback has motivated research on password-only protocols [18, 19, 21] in which clients need to remember and manage only a short password shared with the server.
Up to now, most of the literature discussing the problem of password authenticated key exchange has focused on the two-party model or the three-party model. While the three-party model seems to provide a more realistic approach in practice than the two-party model, in which clients are expected to store multiple passwords, it is still restrictive in the sense that it assumes both clients are registered with, and authenticated by, the same server. In reality, the authentication server of one client may differ from that of another client. Indeed, it is a typical environmental assumption [16] that a client registers with the server in its own realm and trusts only its own server, not the servers in other realms. In this case, how to efficiently authenticate a client who is registered with a server in another realm becomes an important issue.
This paper considers password authenticated key exchange in the inter-domain distributed computing environment just described. We propose a new inter-domain PAKE protocol which involves four participants: two clients and two authentication servers, one for each client. To the best of our knowledge, the recent protocol by Yeh and Sun [27] is the only PAKE protocol in the four-party model. However, the Yeh-Sun protocol employs the hybrid model, and so each client in the protocol needs to manage the server's public key in addition to remembering a password shared with the server. In fact, our construction seems to be the first four-party PAKE protocol that requires each client only to share a password with its authentication server.
2 Protocol Preliminaries
We begin with the requisite definitions. There are four entities involved in the protocol: two clients A and B, and two authentication servers SA and SB, of A and B respectively. We denote by ID_A, ID_B, ID_SA, and ID_SB the identities of A, B, SA, and SB, respectively.
Computational Diffie-Hellman (CDH) Assumption. Let g be a fixed generator of the finite cyclic group Z_p^*. Informally, the CDH problem is to compute g^{ab} given g^a and g^b, where a and b are drawn at random from {1, ..., |Z_p^*|}. Roughly stated, Z_p^* is said to satisfy the CDH assumption if solving the CDH problem in Z_p^* is computationally infeasible for all probabilistic polynomial-time algorithms.
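A minimal numeric illustration of the setting behind the assumption, with toy 17-bit parameters of our own choosing (a real deployment needs a group of cryptographic size):

```python
import secrets

p, g = 65537, 3                       # toy prime and generator of Z_p^*
a = secrets.randbelow(p - 2) + 1      # drawn at random, as in the definition
b = secrets.randbelow(p - 2) + 1
ga, gb = pow(g, a, p), pow(g, b, p)   # g^a and g^b: what an eavesdropper sees
# Each endpoint can compute g^{ab}; recovering it from (g, g^a, g^b) alone
# is exactly the CDH problem.
assert pow(gb, a, p) == pow(ga, b, p)
```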
Symmetric Encryption Scheme. A symmetric encryption scheme is a triple of polynomial-time algorithms Γ = (K, E, D) such that:
– The key generation algorithm K is a randomized algorithm that returns a key k. Let Keys(Γ) be the set of all keys that have non-zero probability of being output by K.
– The encryption algorithm E takes as input a key k ∈ Keys(Γ) and a plaintext m ∈ {0,1}^*. It returns a ciphertext E_k(m) of m under the key k. This algorithm may be randomized or stateful.
– The deterministic decryption algorithm D takes as input a key k ∈ Keys(Γ) and a purported ciphertext c ∈ {0,1}^*. It returns D_k(c), which is a plaintext m ∈ {0,1}^* or a distinguished symbol ⊥. The return value ⊥ indicates that the given ciphertext c is invalid for the key k.
We say, informally, that a symmetric encryption scheme Γ is secure if it ensures confidentiality of messages under chosen-ciphertext attack (CCA) and guarantees integrity of ciphertexts [23]. As shown in [2, 15], this combination of security properties implies indistinguishability under CCA which, in turn, is equivalent to non-malleability [10] under CCA.
Signature Scheme. A digital signature scheme is a triple of algorithms Σ = (G, S, V) such that:
– The probabilistic key generation algorithm G, on input a security parameter 1^ℓ, outputs a pair of matching public and private keys (PK, SK).
– The signing algorithm S is a probabilistic polynomial-time algorithm that, given as input a message m and a key pair (PK, SK), outputs a signature σ of m.
– The verification algorithm V is a polynomial-time algorithm that, on input (m, σ, PK), outputs 1 if σ is a valid signature of the message m with respect to PK, and 0 otherwise.
We say that a signature scheme Σ is secure if the probability of succeeding with an existential forgery under adaptive chosen message attack [11] is negligible for all probabilistic polynomial-time attackers. (One concrete way to instantiate Γ and Σ is sketched below.)
Initialization. During some initialization phase, the two servers SA and SB agree on the following public parameters: a large prime p, a generator g of Z_p^* satisfying the CDH assumption, a one-way hash function H, a secure symmetric encryption scheme Γ = (K, E, D), and a secure signature scheme Σ = (G, S, V). In addition, a public/private key pair is generated for each server by running the key generation algorithm G(1^ℓ). We denote by (PK_X, SK_X) the public/private keys of server X for X ∈ {SA, SB}. As part of the initialization, client A (resp. B) registers with server SA (resp. SB) by choosing PW_A (resp. PW_B) and sending it to the server via a secure channel.
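For concreteness, one possible instantiation of Γ and Σ with off-the-shelf primitives is sketched below, using the third-party Python package `cryptography`. This is our choice of primitives, not the paper's: Fernet gives CCA confidentiality plus ciphertext integrity, matching the requirement on Γ, and Ed25519 gives existentially unforgeable signatures for Σ.

```python
from cryptography.fernet import Fernet, InvalidToken
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Γ = (K, E, D): authenticated symmetric encryption.
k = Fernet.generate_key()                 # K: key generation
c = Fernet(k).encrypt(b"plaintext m")     # E_k(m)
try:
    m = Fernet(k).decrypt(c)              # D_k(c)
except InvalidToken:
    m = None                              # the distinguished symbol ⊥

# Σ = (G, S, V): digital signatures.
sk = Ed25519PrivateKey.generate()         # G
pk = sk.public_key()
sigma = sk.sign(b"message m")             # S
try:
    pk.verify(sigma, b"message m")        # V: raises if σ is invalid
    valid = 1
except InvalidSignature:
    valid = 0
```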
3 A Four-Party Password Authenticated Key Agreement Protocol
In this section we present a new four-party password authenticated key agreement protocol in which two clients wishing to agree on a session key do not need
[Figure: message flow of the protocol among A, SA, SB, and B, showing the password-encrypted values c_SA = E_{PW_A}(r_SA) and c_SB = E_{PW_B}(r_SB), the client authenticators u_A = H(I|k_A) and u_B = H(I|k_B), the signed inter-server exchange of r_A and r_B with σ_SA = S_{SK_SA}(I|r_A) and σ_SB = S_{SK_SB}(I|r_B), the server authenticators u_SA = H(I|k_A|r_A|r_B) and u_SB = H(I|k_B|r_B|r_A), and the derivation of K = g^{ab} and sk = H(K|I), where I = ID_A|ID_B|ID_SA|ID_SB; the individual steps are detailed below.]
Fig. 1. Four-party password authenticated key agreement protocol
to store any public key of their authentication server, but only need to share a short, easy-to-remember password with the server. In describing the protocol, we will omit 'mod p' from expressions for notational simplicity. The proposed protocol is outlined in Fig. 1, and a more detailed description follows (an executable toy walk-through of the full run is sketched after step 7):
1. The two clients A and B first need to inform each other of their respective authentication servers. To this end, A sends to B the message ID_A, ID_SA, ID_B, and B sends to A the message ID_B, ID_SB, ID_A.
2. The clients request the assistance of their respective servers in establishing a session key between them. Client A (resp. B) does this by sending the message ID_A, ID_B, ID_SB (resp. ID_B, ID_A, ID_SA) to the server SA (resp. SB).
3. Server SA chooses a random number s ∈_R Z_p^*, computes r_SA = g^s and c_SA = E_{PW_A}(r_SA), and sends the ciphertext c_SA to A. Similarly, server SB chooses a random number t ∈_R Z_p^*, computes r_SB = g^t and c_SB = E_{PW_B}(r_SB), and sends the ciphertext c_SB to B.
4. After receiving c_SA, client A recovers r_SA by decrypting c_SA, i.e., r_SA = D_{PW_A}(c_SA), and chooses a random number a ∈_R Z_p^*. Given r_SA and a, client A computes the one-time key k_A to be shared with the server SA as k_A = g^{as} = (r_SA)^a. Additionally, A computes r_A = g^a and u_A = H(ID_A|ID_B|ID_SA|ID_SB|k_A). Then A sends the message r_A, u_A to the server SA. Meanwhile, client B, having received c_SB, computes r_SB = D_{PW_B}(c_SB) and chooses a random number b ∈_R Z_p^*. From r_SB and b, B computes the one-time key k_B to be shared with SB as k_B = g^{bt} = (r_SB)^b. B also computes r_B = g^b and u_B = H(ID_A|ID_B|ID_SA|ID_SB|k_B) and then sends r_B, u_B to SB.
5. Upon receiving r_A, u_A, server SA first computes the one-time key k_A = g^{as} shared with A and then verifies that u_A from A equals the hash value H(ID_A|ID_B|ID_SA|ID_SB|k_A). If the verification fails, SA stops executing the protocol; otherwise, SA believes that client A is genuine. SA then sends the message ID_A, ID_B, ID_SA, ID_SB, r_A, σ_SA to the server SB, where σ_SA is the signature on ID_A|ID_B|ID_SA|ID_SB|r_A generated by the signing algorithm S using the private key SK_SA, namely σ_SA = S_{SK_SA}(ID_A|ID_B|ID_SA|ID_SB|r_A). Similarly, upon receiving r_B, u_B, server SB computes the one-time key k_B = g^{bt} shared with B and verifies that u_B from B equals H(ID_A|ID_B|ID_SA|ID_SB|k_B). If the verification fails, SB aborts the protocol; otherwise, SB believes client B to be authentic. Then SB sends the message ID_A, ID_B, ID_SA, ID_SB, r_B, σ_SB to SA, where σ_SB is the signature on ID_A|ID_B|ID_SA|ID_SB|r_B generated by S using the private key SK_SB, namely σ_SB = S_{SK_SB}(ID_A|ID_B|ID_SA|ID_SB|r_B).
6. After receiving ID_A, ID_B, ID_SA, ID_SB, r_B, σ_SB, SA first verifies the signature σ_SB using the public key PK_SB. SA halts immediately if the verification fails. Otherwise, SA computes u_SA = H(ID_A|ID_B|ID_SA|ID_SB|k_A|r_A|r_B) and sends r_B, u_SA to client A. The server SB, upon receiving ID_A, ID_B, ID_SA, ID_SB, r_A, σ_SA, verifies the signature σ_SA and, if it is correct, computes u_SB = H(ID_A|ID_B|ID_SA|ID_SB|k_B|r_B|r_A) and sends r_A, u_SB to B.
7. Client A checks whether u_SA from SA equals H(ID_A|ID_B|ID_SA|ID_SB|k_A|r_A|r_B). If this is untrue, A aborts the protocol. Otherwise, A computes the common secret value K as K = g^{ab} = (r_B)^a. Similarly, client B verifies that u_SB from SB equals H(ID_A|ID_B|ID_SA|ID_SB|k_B|r_B|r_A), and if the verification succeeds, computes the common secret value K as K = g^{ab} = (r_A)^b. Finally, the clients compute their session key sk as sk = H(K|ID_A|ID_B|ID_SA|ID_SB).
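The following self-contained sketch runs the seven steps end to end with toy stand-ins, so the checks and key derivations can be traced. It is illustrative only: the group is 17 bits, E is a SHA-256-keystream XOR (which is not the CCA-secure, integrity-protected Γ the protocol requires), and the servers' signatures σ_SA, σ_SB are modeled by an HMAC under a key the two servers share. All variable names mirror the description above.

```python
import hashlib, hmac, secrets

p, g = 65537, 3                                   # toy parameters only

def H(*parts) -> bytes:                           # hash of "|"-joined parts
    return hashlib.sha256(b"|".join(str(x).encode() for x in parts)).digest()

def E(pw: str, r: int) -> bytes:                  # stand-in for E_PW(r)
    ks = hashlib.sha256(pw.encode()).digest()[:4]
    return bytes(x ^ y for x, y in zip(r.to_bytes(4, "big"), ks))

def D(pw: str, c: bytes) -> int:                  # stand-in for D_PW(c)
    ks = hashlib.sha256(pw.encode()).digest()[:4]
    return int.from_bytes(bytes(x ^ y for x, y in zip(c, ks)), "big")

def sig(key: bytes, *parts) -> bytes:             # HMAC modeling S/V
    return hmac.new(key, b"|".join(str(x).encode() for x in parts),
                    hashlib.sha256).digest()

I = "A|B|SA|SB"                                   # ID_A|ID_B|ID_SA|ID_SB
PW_A, PW_B = "pwA", "pwB"                         # registered passwords
K_SRV = b"inter-server-key"                       # stands in for (PK, SK)

# Step 3: servers send password-encrypted ephemeral values.
s = secrets.randbelow(p - 2) + 1
t = secrets.randbelow(p - 2) + 1
c_SA, c_SB = E(PW_A, pow(g, s, p)), E(PW_B, pow(g, t, p))

# Step 4: clients recover r_SA, r_SB and build their authenticators.
a = secrets.randbelow(p - 2) + 1
b = secrets.randbelow(p - 2) + 1
k_A = pow(D(PW_A, c_SA), a, p)                    # k_A = g^{as}
k_B = pow(D(PW_B, c_SB), b, p)                    # k_B = g^{bt}
r_A, r_B = pow(g, a, p), pow(g, b, p)
u_A, u_B = H(I, k_A), H(I, k_B)

# Step 5: servers recompute the one-time keys and authenticate the clients.
assert u_A == H(I, pow(r_A, s, p))                # SA checks A
assert u_B == H(I, pow(r_B, t, p))                # SB checks B
sigma_SA, sigma_SB = sig(K_SRV, I, r_A), sig(K_SRV, I, r_B)

# Step 6: each server verifies the peer's signed value, then vouches for it.
assert hmac.compare_digest(sigma_SB, sig(K_SRV, I, r_B))   # SA verifies SB
assert hmac.compare_digest(sigma_SA, sig(K_SRV, I, r_A))   # SB verifies SA
u_SA = H(I, pow(r_A, s, p), r_A, r_B)             # computed by SA
u_SB = H(I, pow(r_B, t, p), r_B, r_A)             # computed by SB

# Step 7: clients check the servers and derive the common session key.
assert u_SA == H(I, k_A, r_A, r_B)                # A checks SA
assert u_SB == H(I, k_B, r_B, r_A)                # B checks SB
sk_A = H(pow(r_B, a, p), I)                       # sk = H(K|I), K = g^{ab}
sk_B = H(pow(r_A, b, p), I)
assert sk_A == sk_B
print("shared session key:", sk_A.hex()[:16], "...")
```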
4 Security Analysis
In this preliminary version of the paper, we only provide a heuristic security analysis of the proposed protocol, considering a variety of attacks and security properties; a rigorous proof of security in a formal communication model will be given in the full version of this paper.
Off-Line Password Guessing Attack. In this attack, an attacker may try to guess a password and then check the correctness of the guessed password off-line. If the guess fails, the attacker tries again with another password, until he finds the proper one. In the proposed protocol, the only information related to passwords is c_SA = E_{PW_A}(g^s) and c_SB = E_{PW_B}(g^t); but because s and t are chosen at random, these values do not help the attacker to directly verify the correctness of guessed passwords. Thus, off-line password guessing attacks would be unsuccessful against the proposed protocol.
Undetectable On-Line Password Guessing Attack. At the highest level of security threat to password authenticated key exchange protocols are undetectable on-line password guessing attacks [9], where an attacker tries to check the correctness of a guessed password in an on-line transaction with the server, i.e., in a fake execution of the protocol; if his guess fails, he starts a new transaction with the server using another guessed password. Indeed, the possibility of an undetectable on-line password guessing attack in the three-or-more-party setting represents a qualitative difference from the two-party setting, where such an attack is not a concern. However, this attack is meaningful only when the server is unable to distinguish an honest request from a malicious one, since a failed guess should not be detected and logged by the server.
In our protocol, the server is the first to issue a challenge, and the client is the first to reply with an answer to a challenge. It is mainly due to this ordering that the protocol is secure against undetectable on-line password guessing attacks. Suppose that an attacker A′, posing as A, decrypts c_SA by guessing
a password, computes r′_A, k′_A, and u′_A = H(ID_A|ID_B|ID_SA|ID_SB|k′_A) by choosing his own random a′, and sends the fake message r′_A, u′_A to server SA. Then the server SA, upon receiving r′_A and u′_A from A′, can easily detect a failed guess, since the protocol specification mandates that SA check the correctness of u′_A. Note that the attacker cannot send a correct u′_A without knowing the correct k′_A, which in turn can only be computed if the guessed password is correct. Hence, the proposed protocol resists undetectable on-line password guessing attacks.
Insider Attack. One of the main differences between the two-party setting and the three-or-more-party setting is the existence of insider attacks, i.e., attacks by malicious insiders. Namely, insider attacks are a concern specific to the three-or-more-party setting and do not need to be considered in the two-party setting. Suppose one client, say A, tries to learn the password of the other client B during execution of the protocol. Of course, A should not be able to gain this ability through a protocol run, and this is still an important security concern to be addressed. As mentioned earlier in this section, the only information related to B's password is c_SB = E_{PW_B}(r_SB), where r_SB is computed as r_SB = g^t and t is a random value chosen by the server SB. The actual value r_SB itself is never included in any message sent in the protocol, and t is secret information known only to SB. This means that the malicious insider A has no advantage over outside attackers in learning B's password. In other words, A's privileged information (PW_A, r_SA, and k_A) gives A no help in learning B's password. Therefore, our protocol is also secure against insider attacks.
Implicit Key Authentication. The fundamental security goal for a key exchange protocol is implicit key authentication. Loosely stated, a key exchange protocol achieves implicit key authentication if each party trying to establish a session key is assured that no party aside from the intended parties can learn any information about the session key. Here, we restrict our attention to passive attackers; active attackers will be considered in the full version of this paper.
Given r_A = g^a and r_B = g^b, the secret value K = g^{ab} cannot be computed, since no polynomial-time algorithm has been found that solves the computational Diffie-Hellman problem. Thus, if the random numbers a and b are unknown, then the session key sk cannot be computed, since H is a one-way hash function. Hence, the secrecy of the session key is guaranteed under the computational Diffie-Hellman assumption in the random oracle model [4].
Perfect Forward Secrecy. Perfect forward secrecy [8] is provided if past session keys cannot be recovered upon present-time compromise of some underlying long-term information. The proposed protocol achieves perfect forward secrecy, since the long-term information of the protocol participants is used for implicit key authentication only, and not for hiding the session key. Even if an attacker obtains the clients' passwords and the servers' signing private keys, he
cannot get any of the secret values K = g^{ab} computed in previous sessions, since the values a and b chosen respectively by A and B are unknown. Because the session key in this protocol is computed as sk = H(K|ID_A|ID_B|ID_SA|ID_SB), he cannot compute it without knowing the secret value K. Hence, our protocol does indeed achieve perfect forward secrecy.
Known Key Security. Known key security is provided if compromise of some session keys does not help an attacker learn any other session keys or impersonate a party in some later session. In our protocol, the session keys generated in different sessions are independent, since the ephemeral secret exponents a and b are chosen independently at random from session to session. Thus, the proposed protocol still achieves its goal in the face of an attacker who has learned some other session keys.
References
1. M. Abdalla, P.-A. Fouque, and D. Pointcheval, Password-based authenticated key exchange in the three-party setting, PKC 2005, LNCS 3386, pp. 65–84, 2005.
2. M. Bellare and C. Namprempre, Authenticated encryption: relations among notions and analysis of the generic composition paradigm, Asiacrypt 2000, LNCS 1976, pp. 531–545, 2000.
3. M. Bellare, D. Pointcheval, and P. Rogaway, Authenticated key exchange secure against dictionary attacks, Eurocrypt 2000, LNCS 1807, pp. 139–155, 2000.
4. M. Bellare and P. Rogaway, Random oracles are practical: a paradigm for designing efficient protocols, Proc. ACM CCS 1993, pp. 62–73, 1993.
5. S. M. Bellovin and M. Merritt, Encrypted key exchange: password-based protocols secure against dictionary attacks, Proc. IEEE Symp. Research in Security and Privacy, pp. 72–84, 1992.
6. V. Boyko, P. MacKenzie, and S. Patel, Provably secure password-authenticated key exchange using Diffie-Hellman, Eurocrypt 2000, LNCS 1807, pp. 156–171, 2000.
7. H.-Y. Chien and J.-K. Jan, A hybrid authentication protocol for large mobile network, The Journal of Systems and Software, vol. 67, no. 2, pp. 123–130, 2003.
8. W. Diffie, P. van Oorschot, and M. Wiener, Authentication and authenticated key exchanges, Designs, Codes, and Cryptography, vol. 2, no. 2, pp. 107–125, 1992.
9. Y. Ding and P. Horster, Undetectable on-line password guessing attacks, ACM SIGOPS Operating Systems Review, vol. 29, no. 4, pp. 77–86, 1995.
10. D. Dolev, C. Dwork, and M. Naor, Nonmalleable cryptography, SIAM Journal on Computing, vol. 30, no. 2, pp. 391–437, 2000.
11. S. Goldwasser, S. Micali, and R. Rivest, A digital signature scheme secure against adaptive chosen-message attacks, SIAM Journal on Computing, vol. 17, no. 2, pp. 281–308, 1988.
12. L. Gong, M.-L. Lomas, R.-M. Needham, and J.-H. Saltzer, Protecting poorly chosen secrets from guessing attacks, IEEE Journal on Selected Areas in Communications, vol. 11, no. 5, pp. 648–656, 1993.
13. S. Jiang and G. Gong, Password based key exchange with mutual authentication, SAC 2004, LNCS 3357, pp. 267–279, 2005.
14. J. Katz, R. Ostrovsky, and M. Yung, Efficient password-authenticated key exchange using human-memorable passwords, Eurocrypt 2001, LNCS 2045, pp. 475–494, 2001.
15. J. Katz and M. Yung, Unforgeable encryption and adaptively secure modes of operation, FSE 2000, LNCS 1978, pp. 284–299, 2000.
16. J.-T. Kohl and B.-C. Neuman, The Kerberos Network Authentication Service, Version 5 Revision 5, Project Athena, Massachusetts Institute of Technology, 1992.
17. T. Kwon, M. Kang, S. Jung, and J. Song, An improvement of the password-based authentication protocol (K1P) on security against replay attacks, IEICE Transactions on Communications, vol. E82-B, no. 7, pp. 991–997, 1999.
18. T.-F. Lee, T. Hwang, and C.-L. Lin, Enhanced three-party encrypted key exchange without server public keys, Computers & Security, vol. 23, no. 7, pp. 571–577, 2004.
19. S.-W. Lee, H.-S. Kim, and K.-Y. Yoo, Efficient verifier-based key agreement protocol for three parties without server's public key, Applied Mathematics and Computation, vol. 167, no. 2, pp. 996–1003, 2005.
20. C.-L. Lin, H.-M. Sun, and T. Hwang, Three-party encrypted key exchange: attacks and a solution, ACM SIGOPS Operating Systems Review, vol. 34, no. 4, pp. 12–20, 2000.
21. C.-L. Lin, H.-M. Sun, M. Steiner, and T. Hwang, Three-party encrypted key exchange without server public-keys, IEEE Communications Letters, vol. 5, no. 12, pp. 497–499, 2001.
22. T.-M.-A. Lomas, L. Gong, J.-H. Saltzer, and R.-M. Needham, Reducing risks from poorly chosen keys, ACM SIGOPS Operating Systems Review, vol. 23, no. 5, pp. 14–18, 1989.
23. P. Rogaway, M. Bellare, J. Black, and T. Krovetz, OCB: a block-cipher mode of operation for efficient authenticated encryption, Proc. ACM CCS 2001, pp. 196–205, 2001.
24. M. Steiner, G. Tsudik, and M. Waidner, Refinement and extension of encrypted key exchange, ACM SIGOPS Operating Systems Review, vol. 29, no. 3, pp. 22–30, 1995.
25. H.-M. Sun, B.-C. Chen, and T. Hwang, Secure key agreement protocols for three-party against guessing attacks, The Journal of Systems and Software, vol. 75, no. 1–2, pp. 63–68, 2005.
26. H.-T. Yeh and H.-M. Sun, Password-based user authentication and key distribution protocols for client-server applications, The Journal of Systems and Software, vol. 72, no. 1, pp. 97–103, 2004.
27. H.-T. Yeh and H.-M. Sun, Password authenticated key exchange protocols among diverse network domains, Computers and Electrical Engineering, vol. 31, no. 3, pp. 175–189, 2005.
28. H.-T. Yeh, H.-M. Sun, and T. Hwang, Efficient three-party authentication and key agreement protocols resistant to password guessing attacks, The Journal of Information Science and Engineering, vol. 19, pp. 1059–1070, 2003.
29. M. Zhang, New approaches to password authenticated key exchange based on RSA, Asiacrypt 2004, LNCS 3329, pp. 230–244, 2004.
A Practical Solution for Distribution Rights Protection in Multicast Environments
Josep Pegueroles, Marcel Fernández, Francisco Rico-Novella, and Miguel Soriano
Telematics Engineering Department, Technical University of Catalonia, Jordi Girona 1-3, Campus Nord C3, 08034 Barcelona, Spain
{josep, marcelf, telfrn, soriano}@entel.upc.edu
Abstract. One of the main problems that remains to be solved in pay-per-view Internet services is copyright protection. As in almost every scenario, any copyright protection scheme has to deal with two main aspects: protecting the true content authors from those who may dishonestly claim ownership of intellectual property rights, and preventing piracy by detecting the authorized (but dishonest) users responsible for illegal redistribution of copies. The former aspect can be solved with watermarking techniques, while for the latter, fingerprinting mechanisms are the most appropriate ones. In Internet services such as Web-TV or near video-on-demand, where multicast is used, watermarking can be directly applied. On the other hand, multicast fingerprinting has seldom been studied, because delivering differently marked content to different receivers seems to contradict multicast basics. In this paper we present a solution to prevent unauthorized redistribution of content in multicast scenarios. The system is based on a trusted soft-engine embedded in the receiver and co-managed by the content distributor. The trusted soft-engine is responsible for the client-side multicast key management functions. It will only allow the decryption and display of the actual data if it has previously inserted a fingerprinting mark carrying the identity of the decoder. Upon finding a pirate copy of any multicast-delivered content, this mark can be used to unambiguously reveal the identity of the receiver that decoded the content from which the pirate copies were made.
1 Introduction
The use of multicast is an important handicap when adding fingerprinting mechanisms to commercial content delivery over the network. In multicast communications, all the receivers get the same stream of bits. If any one of the end users illegally redistributes the content, there is no way to distinguish the pirate copy delivered by the malicious user from the original ones held by the honest users. Thus, unauthorized redistributors would be masked by the rest of the multicast group. In this sense, copyright protection for multicast arises as a challenging problem in commercial network applications. Not many solutions addressing this problem have been presented previously. Moreover, all of them present significant disadvantages: incompatibility with multicast encryption schemes, or reliance on the existence of trusted and programmable
intermediate network nodes. These disadvantages make the current proposals infeasible for efficient implementation in real scenarios. In this paper we present a practical, ready-to-implement solution to the multicast fingerprinting problem. Our scheme is based on what we call the "trusted soft-engine", which is either a software or hardware tamper-proof entity embedded in the receiver at the client side. We emphasize that although this engine runs on the client side, some of its actions are actually managed by the server (distributor). In other words, the proposed solution is not only a client-side fingerprinting device and, of course, its server-controlled part is assumed to be unbreachable. This engine has the following particularity with respect to the actions it performs: some actions require the management of the content distributor alone; other kinds of actions require the management of the end user alone; finally, some actions require the activity of a trusted third party. The trusted soft-engine is responsible, on the one hand, for receiving the stream of bits and decrypting it using the secure multicast group key. At the same time, it embeds a unique mark into the actual data that is passed to the media player; a sketch of this data path follows.
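The following schematic sketches the soft-engine's data path as just described: packets are decrypted under the current group key, and a receiver-unique mark is embedded before anything reaches the player. It is a structural illustration under our own assumptions; the toy keystream cipher and the metadata-style "mark" are placeholders for the real multicast decryption and media watermarking.

```python
import hashlib
from typing import Iterable, Iterator, Tuple

class TrustedSoftEngine:
    """Client-side engine; key updates are driven by the distributor."""

    def __init__(self, receiver_id: str, sek: bytes):
        self.receiver_id = receiver_id
        self.sek = sek                        # current group session key

    def rekey(self, new_sek: bytes) -> None:  # distributor-managed action
        self.sek = new_sek

    def _decrypt(self, pkt: bytes) -> bytes:
        # Toy keystream XOR; assumes packets of at most 32 bytes.
        ks = hashlib.sha256(self.sek).digest()
        return bytes(x ^ y for x, y in zip(pkt, ks))

    def _mark(self, seq: int, frame: bytes) -> Tuple[bytes, str]:
        # Receiver-unique fingerprint; a real engine would watermark the
        # media samples rather than attach metadata.
        m = hashlib.sha256(f"{self.receiver_id}:{seq}".encode()).hexdigest()[:8]
        return frame, m

    def frames_for_player(self, stream: Iterable[bytes]) -> Iterator[Tuple[bytes, str]]:
        for seq, pkt in enumerate(stream):
            yield self._mark(seq, self._decrypt(pkt))  # mark, then display
```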
2 Required Background
2.1 Fingerprinting Schemes
Multicast networks allow the delivery of "digital goods" that are not even contained in a physical support. Once the buyer has received the content, he is free to store it on the physical support of his choice. Of course, he is also free to make as many (inexpensive) copies of the content as he wishes. This ease of copying and storing constitutes a severe threat to authorship and distribution rights. Many experts share the opinion that e-commerce will not be a total reality until this threat to intellectual property rights can be efficiently counteracted. Among the mechanisms used to protect intellectual property we observe two groups: copy prevention mechanisms and copy detection mechanisms. The failure of copy prevention schemes, such as the DVD copy prevention system, has shifted many research efforts towards the search for feasible copy detection mechanisms. These efforts have brought the development of new proposals of copy detection systems for many different types of digital content. A large number of these new proposals can be grouped under the name of watermarking. The watermarking technique consists of embedding a set of marks in each copy of the digital content. Since the embedded set of marks is the same for all copies, watermarking schemes ensure intellectual property protection but fail to protect distribution rights. If protection of distribution rights is also required, then the concept of watermarking needs to be extended further. The functionality of watermarking can be extended by allowing the embedded set of marks to be different in each copy of the object. This original idea was introduced by [17] and is called fingerprinting [2,11,17] because of its analogy to human fingerprints. The copies of a digital object obtained under a fingerprinting scheme are, of course, all different. Having distinct copies of the same object univocally identifies the buyer of a copy and therefore allows tracing illegal plain redistribution.
Unfortunately, the seemingly advantageous extension from watermarking to fingerprinting introduces weaknesses into the resulting scheme. These weaknesses come in the form of a collusion attack. In a collusion attack [3,4], a group of dishonest users get together and, by comparing their copies, are able to detect the symbols where their copies differ. By changing some of these symbols they create a new pirate copy that tries to hide their identities. Consequently, an adequate fingerprinting scheme must allow identification of dishonest buyers who took part in a collusion. Detection of such dishonest buyers can be achieved by using sets of marks with traceability properties, such as the ones presented in [3,4,9,19].
2.2 Multicast Security
Aside from protecting the copyright, probably the first security requirement in multicast communications is protecting the data from unauthorized eavesdroppers [7]. This is easily achieved by means of encryption. The addition of encryption algorithms to multicast communications has several restrictions: it must not add significant delay to data transmission, or excessive packet loss due to time-constraint violations will occur; moreover, all the delivered data shall be encrypted using a session key shared by all the multicast group members [18]. The shared key provides group secrecy and source authentication. This key must be updated every time membership in the group changes; if so, Forward and Backward Secrecy (FS and BS) are provided. FS means that the session key gives no meaningful information about future session keys, that is to say, no leaving member can obtain information about future group communication. BS means that a session key provides no meaningful information about past session keys, i.e., no joining member can obtain information about past group communication [12]. The above-mentioned constraints force us to encrypt the multimedia content on-line (just before sending it) and to use different keys for protecting the same file from eavesdroppers, even during the same session. In order to avoid extra delay in the process, symmetric encryption is generally used. The most successful proposals for multicast key encryption and key management schemes are presented in the next sections; the sketch below illustrates the rekey-on-membership-change discipline behind FS and BS.
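A minimal sketch of that discipline, in our own toy model: the key server replaces the session key on every join and leave, so a member only ever holds the keys of the epochs it belonged to. Efficiently distributing each new key is the hard part, and is exactly what the key-tree schemes of the next section address.

```python
import secrets

class GroupKeyServer:
    def __init__(self):
        self.members = set()
        self.epochs = []                 # history of session keys (SEKs)

    def _rekey(self):
        self.epochs.append(secrets.token_bytes(16))   # fresh session key

    def join(self, member: str):
        self.members.add(member)
        self._rekey()                    # BS: newcomer cannot read past epochs

    def leave(self, member: str):
        self.members.discard(member)
        self._rekey()                    # FS: leaver cannot read future epochs

ks = GroupKeyServer()
ks.join("M1"); ks.join("M2")             # creates epochs[0] and epochs[1]
ks.leave("M1")                           # epochs[2] onward is unknown to M1
```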
3 Previous Work
3.1 State of the Art in Fingerprinting Techniques for Multicast
Two distinct approaches have been presented in the literature to achieve multicast fingerprinting:
• A priori generation of different copies of the same content with different watermarks, followed by appropriate selection of packets during distribution. This yields unique identifiers at the receiver end.
• Delivery of a single copy of unwatermarked content, while middle entities in the distribution process are responsible for introducing the appropriate watermarks.
Next, we briefly overview the different proposals of these two approaches. In [10], for a given multicast video, the sender applies two different watermark functions to generate two different watermarked frames for every frame in the stream. Each end user has a uniquely identifying random key. The length of this key is the number of video frames in the stream. This identifying key is also known to the distributor. Each watermarked frame in the stream is encrypted with a different key. The random bit stream determines whether a member is given one key or the other for decryption. If there is only one leaking member, the distributor can identify this dishonest redistributor, because he can read the watermarks to recover the identifying key and because he knows the bit streams of all members. Another approach to the problem of copyright protection in multicast is distributed watermarking (watercasting) [5]. For a multicast routing tree with maximum depth d, the distributor generates n ≥ d different copies of each packet. Each group of n alternate packets (each with the same content but different marks) is called a transmission group. On receiving a transmission group, a router forwards all but one of those packets to each downstream interface. Each last-hop router in the distribution tree receives n − d_r packets from each transmission group, where d_r is the depth of the route to this router. Exactly one of these packets is forwarded onto the subnet with receivers. This provides a stream for each receiver with a unique sequence of watermarked packets (a toy sketch of this per-hop selection is given after the drawbacks discussion below). In [15], the authors introduce a hierarchy of intermediaries into the network. Each intermediary has a unique ID, which is used to define the path from the source to the intermediary. The path ID is embedded into the content to identify the path it has travelled. Each intermediary embeds its portion of the path ID into the content before forwarding the content through the network. A watermark embedded by the distributor identifies the domain of a receiver. The last hop allows the intermediaries to mark the content uniquely for any child receivers. Multiple watermarks can be embedded using modified versions of existing algorithms. In [8], hierarchical tagging was presented, which allows a content producer to insert a different watermark for each of his distributors. Similarly, each distributor can insert a watermark for several sub-distributors. This process can continue until the end users receive tagged content identifying all the distributors and the producer. The content is hidden using cryptographic techniques. Each end user receives the same data, performs some processing, and retrieves only the data prepared for him.
3.2 Drawbacks of Current Proposals
All the above methods construct differently fingerprinted content for each end user by either pre-processing the content, trusting the intermediate nodes of the network to perform some actions, or both. The first important drawback of these proposals is that they are not scalable: different copies of the data stream need to be watermarked and encrypted a priori. Moreover, they are not fully compatible with state-of-the-art multicast encryption schemes. In many cases the pre-processing required of the distributor or the client creates a weakness, either in terms of computational cost or of security.
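To make the per-hop packet selection of watercasting [5] concrete, and to show where the n-fold transmission cost criticized above comes from, here is a toy sketch under our own simplifying assumptions: routing is modeled as a seeded random drop at each of the receiver's d_r hops, and "marks" are just integers.

```python
import random

def marks_seen_by_receiver(n_copies: int, depth: int, groups: int,
                           path_seed: int) -> list:
    """Mark sequence one receiver observes behind `depth` routers."""
    rng = random.Random(path_seed)       # seed stands in for the actual path
    seen = []
    for _ in range(groups):
        copies = list(range(n_copies))   # n differently marked packets
        for _ in range(depth):           # each router drops one copy
            copies.pop(rng.randrange(len(copies)))
        # The last-hop router forwards exactly one of the n - d_r survivors.
        seen.append(copies[rng.randrange(len(copies))])
    return seen

# Receivers on different paths collect distinct mark sequences, which is
# what later links a redistributed copy back to a receiver.
print(marks_seen_by_receiver(8, 3, 10, path_seed=1))
print(marks_seen_by_receiver(8, 3, 10, path_seed=2))
```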
Another drawback of relying on intermediate network nodes is that the distributor needs information about the tree topology if an illegal copy has to be
traced. In open networks such as the Internet, this is usually difficult to know a priori. Moreover, network providers may not be willing to add security functionality to intermediate routers.

3.3 State of the Art in Multicast Encryption Techniques

With respect to the multicast encryption field, several works prevent new group members or leaving members from accessing data sent before they joined or after they leave. The simplest way a Key Server can deliver a new session key to the members is through a secret unicast connection with each of the remaining members of the group [13]. This solution presents the worst behaviour with respect to efficiency parameters: all figures depend on the number of members in the multicast group. In order to reduce the number of rekeying messages, Logical Key Tree schemes were presented in [1,6,13,16]. Key tree schemes use two types of encryption keys: Session Encryption Keys (SEK) and Key Encryption Keys (KEK). SEKs are used to cipher the actual data that multicast groups exchange, for example video streams in multicast near-on-demand video sessions. KEKs are used to cipher the keying material that members need in order to get the SEK. Normally, KEKs are structured in logical binary trees. All users know the root of the key tree, and the leaf nodes are users' individual keys. Consider, for instance, the tree in Figure 1. Nodes will be referred to as (level number, position at level), so we will refer to the root node as (1,1); the sons of the root node will be (2,1) and (2,2), and so on. The key in node (X,Y) will be noted K(X,Y). Consider a group of 8 users. The tree has 15 nodes, and each node corresponds to a KEK. Group members are located at leaf nodes. Keys in the leaves are known only by single users. K(1,1) is known by all members in the group. The rest of the keys are revealed only to the users that are descendants of the node. For example, K(3,1) is known only by the users in leaves (4,1) and (4,2), and K(2,2) is revealed only to the users in leaves (4,5) to (4,8).

3.4 Logical Key Hierarchy (LKH)

The simplest logical key tree management scheme is LKH [14]. It works as follows. Consider the multicast group in Figure 1 with 8 members (M1, …, M8) and a Key Server (KS). Each member must store a subset of the controller's keys. This subset of KEKs will allow the member to get the new SEK when it changes. A generic member Mj stores the subset of keys on the path from the leaf where he is located to the root. In our example, member M1, in node (4,1), will store K(4,1), K(3,1), K(2,1) and K(1,1). When a member (M8) leaves the group, the Key Server must update every KEK on the path from the leaf where the leaving member was located to the root. See Figure 2, in which new keys are denoted with quotes. The KS has to reveal the updated keys to the corresponding users. He uses the existing key hierarchy, along with reliable multicast, to efficiently distribute them as follows. He sends one message containing the set of updated keys to the member in node (4,7). This operation can be done via a unicast channel using M7's individual key. After that, the key server constructs and sends a multicast message containing the updated keys (K'(2,2), K'(1,1)) ciphered with K(3,3), so only the members in nodes (4,5) and (4,6) can decipher it. Finally, the KS also constructs and
sends a multicast message containing the new root key (K'(1,1)) ciphered with K(2,1), so the members in nodes (4,1) to (4,4) can decipher it. At this point, the remaining members of the multicast group know the subset of keys from their leaves to the root. Every member knows the root key, so it can be used to cipher a multicast message containing the new session key (SEK'). Following the example, it is easy to see how LKH can update keys more efficiently than using only unicast connections with every end-user.
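For illustration only, the following Python sketch reproduces the (level, position) node indexing and the leave-rekeying pattern of the example above; the helper names are ours, not part of the LKH specification, and a real implementation would of course transport actual key material.

```python
# A minimal sketch of LKH leave-rekeying over the (level, position) tree
# of Figure 1; function names are illustrative only.

def parent(node):
    level, pos = node
    return (level - 1, (pos + 1) // 2)

def sibling(node):
    level, pos = node
    return (level, pos + 1 if pos % 2 == 1 else pos - 1)

def update_path(leaf, root_level=1):
    """KEK nodes that must be regenerated when the member at `leaf` leaves."""
    path, node = [], leaf
    while node[0] > root_level:
        node = parent(node)
        path.append(node)
    return path

def rekey_messages(leaf):
    """(keys to send, key they are ciphered under) for each rekey message."""
    path, msgs = update_path(leaf), []
    carry = sibling(leaf)               # the sibling leaf's individual key
    for i, node in enumerate(path):
        msgs.append((path[i:], carry))  # keys path[i:] ciphered with K(carry)
        carry = sibling(node)
    return msgs

# M8 at (4,8) leaves: K'(3,4),K'(2,2),K'(1,1) under K(4,7); K'(2,2),K'(1,1)
# under K(3,3); K'(1,1) under K(2,1) -- the three messages of the example.
print(rekey_messages((4, 8)))
```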
Fig. 1. Logical Key Hierarchy for multicast key management
Fig. 2. Example of LKH leaving
4 Proposed Multicast Fingerprinting Scheme

Next we propose a multimedia multicast distribution scheme allowing both copyright protection and encryption. The scheme is fully compatible with existing state-of-the-art multicast encryption techniques and, contrary to the proposals discussed in the previous sections, it relies neither on pre-processing techniques nor on trusted intermediate nodes. In our proposal, all multimedia content can be encrypted on-line using symmetric-key techniques, and the keys can be updated and managed using simple logical key tree algorithms. All the functionalities for fingerprinting codeword embedding (generating different copies for different end-users) are moved to the end-user side.

4.1 Deployment and Operation

Next we discuss the operational features of the proposed scheme. The overall process consists of the following 8 steps (see also Figure 3):
1. End-user i requests some type of content from the Distributor and joins a multicast group.
2. The Distributor sends all the keys (KEKs and Ks) that the trusted soft-engine i will need to decrypt the content.¹
3. The Distributor notifies the trusted soft-engine i about the TTP that will deliver the fingerprinting codeword.
4. The recipient contacts the TTP asking for the fingerprinting codeword and notifies it of the identity of the Distributor.
5. The TTP delivers the fingerprinting mark to the trusted soft-engine i, and a proof of it (a hash of the whole mark) to the Distributor. These messages are exchanged over a secure channel.
6. At the agreed time, the distributor starts sending the encrypted content to the multicast group.
7. Upon receiving the encrypted content, the trusted soft-engine decrypts and marks it.
8. The marked content is the one that appears on the player.

¹ In case some key updating needs to be performed, the responsibility for all key management operations lies completely with the Distributor.

Fig. 3. Atomic steps in the proposed scheme

The system behaviour can be understood in more depth by following the next example. Suppose we want to transmit multimedia content through a multicast channel. As mentioned before, our system needs the interaction of three communication entities to achieve delivery plus secrecy and copyright protection of the multicasted content: the content distributor, the content recipients and a TTP. When an Internet user wants to subscribe to the near-video-on-demand service of the distributor (say, become a content recipient), he first accesses the distributor's storefront (usually a webpage offering different multimedia products) and asks for the content. As the multimedia product will be delivered using multicast, once the service is requested the content recipient needs to join a multicast group. The Distributor aims to deliver the content in a convenient manner, so that its privacy, integrity and authenticity are preserved. This goal can be accomplished by means of well-known cryptographic techniques (i.e., encrypting the contents using the AES algorithm). Obviously, the session key must be known by both the distributor and all the trusted soft-engines that are receiving the content. In this second step, the distributor gives the recipient the whole set of keys that he will need to decrypt the data. If only one key (the session key) were given, there would be no way to efficiently update this key. Whenever the session key has to be updated (every time a recipient joins or leaves the multicast group), the distributor will send the mandatory key-updating messages, which, once received by the trusted soft-engines,
will allow them to obtain the new session key. This process is usually done using the LKH algorithm; that is the reason why the LKH KEKs are also delivered to the receiver at step 2.
Apart from decrypting the content, the trusted soft-engine has the responsibility of embedding the fingerprinting codeword into the displayed stream. To protect the recipient from dishonest distributors, this codeword has to be provided by a TTP trusted by both the distributor and the recipient. If the distributor itself decided the codeword to be inserted in the stream, he could dishonestly accuse an honest recipient of piracy by simply distributing different copies with the same codeword. That is the reason why, in step 3, the distributor tells the recipient where to get the certified mark. In step 4, the recipient asks the TTP for a valid codeword that uniquely identifies him. As the TTP also has to notify information about the codeword to the distributor, the recipient also gives the TTP the identity of the distributor he is dealing with. After that, the TTP delivers the fingerprinting codeword to the i-th end-user's trusted soft-engine in order to allow it to properly mark the content. In this step, the TTP also hashes the mark and sends the hash to the distributor. This value will be used by the distributor as the end-user i node KEK. By means of that, the distributor can be sure that the bitstream can be decrypted only if the mark is correctly inserted in the soft-engine. If the mark is not inserted in the soft-engine, the user i KEK is not known by the soft-engine and, as a result, the session key cannot be recovered. All these actions have to be performed before the delivery of the multimedia content. When all these steps are finished, the distribution of the content can start. Traditionally, in near-on-demand video services, the content distribution is scheduled at predefined times and users can subscribe to any of those time slots. At this point, only the decryption and branding of the bitstream are left to be done. This is entirely the responsibility of the soft-engine, and only after this step (7) has been accomplished can the player in the end-user machine display the decrypted and marked content.

4.2 Marking Process

Now we tackle the problem of embedding the fingerprinting codeword that lies inside the trusted soft-engine into the content that the engine receives. We assume that the delivered content includes synchronizing information. This means that the player (or decoder) knows at what time it has to display each part of the content with respect to the start time. To make the description clearer, we will use an MPEG movie delivery as an example. Note that the set of positions that contains the mark is the same for every delivered movie. We show now how our scheme satisfies the above constraint in a very natural way. When the multicast-delivered MPEG sequence first enters the trusted soft-engine, a timer is started. This timer expires after a previously defined time Tg, which we call the guard time. At Tg the trusted soft-engine starts buffering the MPEG bitstream, until a marking threshold Mth is reached. The value of this threshold is determined by both the length of the fingerprinting codeword (which in turn depends on the number of end-users) and the watermarking algorithm. Note that, in general, different watermarking algorithms will require
different thresholds. Also note that, for a given watermarking algorithm and a given threshold, we will have a different buffer size depending on the received bitstream. At this point, the watermarking algorithm is applied to the buffered MPEG bitstream, and the fingerprinting mark is embedded. The resulting output bitstream uniquely identifies the trusted soft-engine (and obviously the subscribed end-user). This procedure is shown in Figure 4. Note that the same mark will appear in several parts of the content, thus allowing a given user to join the group at any time during transmission.
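The timing logic of this marking loop can be summarized with the following Python sketch; it assumes a generic watermark(chunk, codeword) routine, and the packet handling around it is our own simplification rather than the paper's implementation.

```python
# A sketch of the guard-time/buffer-and-mark cycle; Tg and Mth are the guard
# time and marking threshold defined in the text, everything else is invented.
import time

def mark_stream(packets, codeword, watermark, Tg=2.0, Mth=64 * 1024):
    start = time.monotonic()
    buf = bytearray()
    buffering = False
    for pkt in packets:
        if not buffering and time.monotonic() - start >= Tg:
            buffering = True                 # guard time expired
        if not buffering:
            yield pkt                        # pass through during the guard time
            continue
        buf.extend(pkt)
        if len(buf) >= Mth:                  # threshold reached: embed the mark
            yield watermark(bytes(buf), codeword)
            buf.clear()
            buffering = False                # a new guard time starts, so the
            start = time.monotonic()         # same mark recurs along the stream
    if buf:
        yield bytes(buf)                     # flush any unmarked tail
```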
Fig. 4. Guard times and buffering times in the marking algorithm
5 Conclusions

Multicast fingerprinting schemes have seldom been studied, due to the difficulty of generating different marked copies in a multicast delivery scenario. Moreover, when considered together with encryption, state-of-the-art multicast fingerprinting techniques are difficult to implement. In this paper we have presented a practical, ready-to-implement solution to the multicast fingerprinting problem. It is based on the existence of what we have called the "trusted soft-engine", a hardware/software device embedded on the end-user side and co-managed by the distributor, the content recipient and a trusted third party. This engine has the responsibilities of handling key updating, obtaining and decrypting the content using the session key, and embedding into the content the mark that uniquely identifies the soft-engine. The presented scheme is fully compatible with almost all existing key management algorithms for multicast and accepts several fingerprinting codes, depending on the required security level.
Acknowledgements

This work has been partially supported by the Spanish Research Council under the SECONNET project (CICYT - TSI2005-07293-C02-01).
References

1. D. Balenson, D. McGrew, A. Sherman. Key Management for Large Dynamic Groups: One-Way Function Trees and Amortized Initialization. IETF Internet Draft, irtf-smug-groupkeymgmt-oft-00.txt, work in progress, Aug 2000.
2. A. Barg, G. R. Blakley, G. Kabatiansky. Digital fingerprinting codes: Problem statements, constructions, identification of traitors. IEEE Trans. Inform. Theory, 49, 4, 852-865, 2003.
3. D. Boneh, J. Shaw. Collusion-Secure Fingerprinting for Digital Data. Advances in Cryptology - Crypto '95, LNCS 963, 452-465, 1995.
4. D. Boneh, J. Shaw. Collusion-Secure Fingerprinting for Digital Data. IEEE Trans. Inform. Theory, 44, 5, 1897-1905, 1998.
5. I. Brown, C. Perkins, J. Crowcroft. Watercasting: Distributed Watermarking of Multicast Media. First International Workshop on Networked Group Communication (NGC '99), Pisa, November 17-20, 1999.
6. R. Canetti, T. Malkin, K. Nissim. Efficient Communication-Storage Tradeoffs for Multicast Encryption. Eurocrypt '99, pp. 456-470, 1999.
7. R. Canetti, B. Pinkas. A taxonomy of multicast security issues. IETF Internet Draft, draft-irtf-smug-taxonomy-01.txt, work in progress, April 1999.
8. G. Caronni, C. Schuba. Enabling Hierarchical and Bulk-Distribution for Watermarked Content. The 17th Annual Computer Security Applications Conference, New Orleans, LA, December 10-14, 2001.
9. Chor et al. Tracing Traitors. IEEE Transactions on Information Theory, Vol. 46, No. 3, pp. 893-910, 2000.
10. H. Chu, L. Qiao, K. Nahrstedt. A Secure Multicast Protocol with Copyright Protection. Proceedings of IS&T/SPIE Symposium on Electronic Imaging: Science and Technology, January 25-27, 1999.
11. M. Fernandez, M. Soriano. Soft-Decision Tracing in Fingerprinted Multimedia Contents. IEEE Multimedia, vol. 11, no. 2, 2004.
12. T. Hardjono, B. Cain, N. Doraswamy. A Framework for Group Key Management for Multicast Security. IETF Internet Draft, draft-ietf-ipsec-gkmframework-03.txt, work in progress, August 2000.
13. H. Harney, E. Harder. Logical Key Hierarchy Protocol (LKH). IETF Internet Draft, harney-sparta-lkhp-sec-00.txt, work in progress, Mar 99.
14. H. Harney, C. Muckenhirn, T. Rivers. Group Key Management Protocol (GKMP) Architecture. SPARTA, Inc., IETF RFC 2094, July 1997.
15. P. Judge, M. Ammar. WHIM: Watermarking Multicast Video with a Hierarchy of Intermediaries. The 10th International Workshop on Network and Operating System Support for Digital Audio and Video, Chapel Hill, NC, June 26-28, 2000.
16. J. Pegueroles, W. Bin, M. Soriano, F. Rico-Novella. Group Rekeying Algorithm using Pseudo-Random Functions and Modular Reduction. Grid and Cooperative Computing (GCC), Springer-Verlag Lecture Notes in Computer Science, vol. 3032, pp. 875-882, 2004.
17. N. Wagner. Fingerprinting. Proceedings of the 1983 IEEE Symposium on Security and Privacy, pp. 18-22, April 1983.
18. D. Wallner, E. Harder, R. Agee. Key management for multicast: Issues and architectures. IETF RFC 2627, June 1999.
19. Yoshioka et al. Systematic Treatment of Collusion Secure Codes: Security Definitions and their Relations. LNCS 2851, pp. 408-421.
Audit-Based Access Control in Nomadic Wireless Environments

Francesco Palmieri and Ugo Fiore

Federico II University, Centro Servizi Didattico Scientifico, Via Cinthia 45, 80126 Napoli, Italy
{fpalmieri, ufiore}@unina.it
Abstract. Wireless networks have been rapidly growing in popularity, both in the consumer and commercial arenas, but their increasing pervasiveness and widespread coverage raise serious security concerns. Client devices can potentially migrate, usually passing through very light access control policies, between numerous diverse wireless environments, bringing with them software vulnerabilities and possibly malicious code. To cope with this new security threat we propose a new active third-party authentication, authorization and audit/examination strategy in which, once a device enters an environment, it is subjected to security analysis by the infrastructure; if it is found to be dangerously insecure, it is immediately taken off the network and denied further access until its vulnerabilities have been fixed. Encouraging results have been achieved with a proof-of-concept model based on current technology and standard open-source networking tools.
Keywords: Mobile networking, security audit, access control.
1 Introduction

The world is going mobile. Today, most consumers take for granted the ability to communicate with friends, family and office anywhere, anytime, at a reasonable cost, and the widespread adoption of low-cost wireless technologies such as WiFi and Bluetooth makes nomadic networking more and more common. But as millions of mobile users migrate between home, office, hotel and coffee shop, moving from one wireless access point to the next, they take with them not only their computers, but also electronic hitchhikers they may have picked up in the local shopping mall, and unpatched or misconfigured applications. Continual migration from one access point to another with these vulnerabilities threatens the integrity of the other environments, as well as that of other peers within the environments. As corrupted machines move from network to network, they are able to quickly spread offending code to network resources and users. The traditional paradigm, based on the separation between a secure area inside and a hostile environment outside, can no longer be effective, because authenticated users can bring uncontrolled machines into the network. A user may unwittingly bring in active threats such as viruses, Trojan horses, denial-of-service daemons, or even create a hole for a human intruder; alternatively, they may bring in passive threats such as vulnerable packages or poorly
configured software. In particular, resourceful worms could exploit nomadic trends to attack and quickly spread in dense urban centers or large campuses, without resorting to the Internet. We believe that this is a fundamental security threat that must be addressed. To contain, or at least mitigate, the impact and spread of these attack vectors, wireless access control environments must be able to examine and evaluate clients on their first access to the network for potential threats or vulnerabilities. Accordingly, we propose a new third-party authentication, authorization and audit/examination paradigm in which, once a device enters an environment, it is subjected to active analysis by the infrastructure; if its security level is found to be inadequate, it is immediately taken off the network and denied any further access until its problems and vulnerabilities have been fixed. The essence of the proposed architecture is to allow or deny systems access to the network based on their perceived threat, i.e., before they can become infected or attack others. Hence, to evaluate their health status, an active vulnerability assessment is carried out, followed by continuous passive monitoring for further suspect activities such as port scans, broadcasts or other hostile actions. Our approach seems promising for several reasons. Firstly, it does not rely only on global network traffic monitoring. Secondly, it does not require specific agents to be installed on the mobile hosts, which would imply an undesired strong administrative control over the networked systems, so it is very suitable for the untrusted mobile networking environment. Thirdly, the whole security arrangement is easily applicable at any network location and does not depend on specific software packages. It would also be effective against unknown viruses, worms or generic malware. Finally, this type of infrastructure would strongly encourage active and timely patching of vulnerable and exploited systems, increasing overall network security. It would benefit users by protecting their systems, as well as keeping them up to date, and benefit local providers by protecting their infrastructure and reducing theft of service. Deployment would also protect the Internet as a whole by slowing the spread of worms and viruses, and by dramatically reducing the available population of denial-of-service agents. The remainder of this paper is organized as follows. Section 2 reviews the related scientific literature. Section 3 briefly sketches the ideas behind the proposed architecture. The detailed components of the overall system and their interaction are described in Section 4. Section 5 presents a proof-of-concept implementation realized with current technology. Finally, Section 6 is dedicated to conclusions and planning for future research.
2 Related Work

IEEE standard 802.11i [1] is designed to provide enhanced security in the Medium Access Control (MAC) layer for 802.11 networks. While the standard provides good coverage of confidentiality, integrity and mutual authentication when CCMP is used, active research is devoted to identifying potential weaknesses and suggesting security improvements. Many researchers have concentrated on discovering weaknesses in association and authentication, and in particular in the 4-Way Handshake procedure (see, for example, [2]). Much effort has also been spent on DoS attacks. Management frames are unprotected, so even an adversary with moderate equipment can easily forge these frames to launch a DoS attack. Ding et al. [3] proposed the use of a
Central Manager (CM) to dynamically manage Access Points and their clients. Another approach is to respond to deauthentication and disassociation frames by triggering a new 4-Way Handshake [4], while Ge and Sampalli [5] suggest that the management frames should be authenticated. He and Mitchell [6] described some attacks and proposed corresponding defenses. They also compiled a survey of the relevant literature. All these works are targeted at hardening the protocols and denying network access to unauthorized devices, while there appears to be much less work focused on guaranteeing network protection by proactively identifying potential threats even if they come from legitimate clients. Our scheme addresses the problem of authenticated users bringing vulnerable or infected machines into a secure network.
3 The Basic Architecture

Information security officers and systems and network administrators want to control which devices are allowed onto their networks. In academic environments this is even more important, because university networks are usually well connected to the Internet and are therefore popular targets for hackers. In addition, many systems in these environments are owned by people nearly completely unaware of such security chores as keeping their operating systems or anti-virus definitions up to date. The fundamental concept of the proposed active access control strategy is that once a client mobile device enters a new environment it has to be subjected to analysis by the infrastructure, to examine it for potential vulnerabilities and contaminants and to evaluate its security degree, in order to decide whether network access can be granted without compromising the target environment. This enhanced security facility can be implemented as an additional access control phase immediately following the traditional 802.11i authentication and authorization paradigm. Here the involved wireless access point, after performing successful 802.1x authentication of the supplicant against the RADIUS or DIAMETER server, and once the mobile node receives full network access and is reachable through an IP address, requests its complete security analysis from a new network entity that we will call the Auditor. To ensure proper scalability of the model in huge and complex network environments, there may be many Auditors associated with the different access points, each one (possibly more than one) dedicated to the coverage of a specific area. This also provides fault tolerance and implicit load distribution where necessary. There is a large set of possible types of analyses that could be performed by the Auditor, including external network scans and probes of offered services, virus scans, or behavior monitoring. For instance, a simple type of examination might determine the versions of the installed OS and software and the appropriate security patches, verifying the existence of open vulnerabilities where applicable. During the different examination phases, the Auditor computes a complex score representing the resulting client insecurity degree, built as the sum of the individual scores assigned to the different analysis results. The task of specifying the partial scores to be assigned to the different categories is left to the network administrator, because environments and security requirements may vary widely. Once the security analysis terminates, the above degree is compared with an acceptance threshold configured at the Auditor level and, if the computed insecurity score does not fall below the threshold, the Auditor returns a reject response to the access point, which immediately denies any further network access to the examined device. Otherwise, the mobile client does not lose its network access, but
the examination does not stop for the involved node; it passes from the active to the passive state. This means that, using standard intrusion detection techniques, the Auditor, on behalf of the local infrastructure, continuously examines network traffic to determine whether any entity is trying to launch a port scan or any other form of attack, or to take over other machines. In that case the Auditor quickly notifies the access point of the event and, consequently, the node is immediately taken off the network. The whole process is synthesized in the status diagram below.
Fig. 1. The audit-based access control status diagram
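As a toy illustration of the accept/reject logic just described (the partial scores and finding categories below are invented examples, not taken from the paper), the decision could be sketched as:

```python
# Invented example policy: partial insecurity scores per finding, summed and
# compared with the administrator-configured acceptance threshold.
PENALTIES = {
    "unpatched_os_release": 30,
    "deprecated_service": 20,
    "suspicious_open_port": 25,
    "known_vulnerability": 50,
}

def insecurity_degree(findings):
    return sum(PENALTIES.get(f, 0) for f in findings)

def audit_decision(findings, threshold=60):
    """Reject (deny access) when the insecurity degree reaches the threshold."""
    if insecurity_degree(findings) >= threshold:
        return "reject"                  # AP denies further network access
    return "accept-and-monitor"          # node passes to passive monitoring

print(audit_decision(["deprecated_service", "suspicious_open_port"]))   # 45
print(audit_decision(["known_vulnerability", "unpatched_os_release"]))  # 80
```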
The proposed architecture can scale quite easily. It can be adapted to a large network by simply replicating its components. All that is needed is to properly set up relationships between each Auditor and its controlled access points. The resulting system is flexible, customizable, extensible, and can easily integrate other common off-the-shelf tools. In addition, it provides automated support for the execution of complex tasks that require the results obtained from several different tools.
4 The Operating Scenario

Available technologies implement wireless security and authentication mechanisms such as 802.1X and 802.11i to prevent unauthorized users from gaining undue access to a protected network. Security is typically provided via access control. Unfortunately, this arrangement falls short as a tool to avoid the inside spread of hostile agents, since in the modern nomadic networking environment authentication and authorization are no longer sufficient and need to be complemented with a strong security assessment aimed at clearly highlighting any potential menace. Therefore our proposed security strategy operates according to the scenario described below. Four entities are involved, called the Supplicant (the wireless station), the Authenticator (the Access Point), the Authentication Server, and the Auditor. We assume, as in IEEE 802.11i, that the AP and the Auditor also have a trustworthy UDP channel between them that can be used to exchange information and action-triggering messages, ensuring authenticity, integrity and non-repudiation.
Fig. 2. Operating scenario (message flow among Supplicant, Authenticator, AS and Auditor: 802.1X EAP Request/Response, EAP authentication protocol exchange, Accept / EAP Success / key material, 802.1X EAP Success, Activate security assessment, Force-Disconnect)
After successful authentication, when a wireless device is granted access to the network, it usually issues a DHCP request for an address. The local DHCP server then hands an IP address to the wireless device through the access point, which has been modified to detect the DHCP response message (message type DHCPACK) and thus learn of the layer-3 availability of a new device, together with its IP address. Now the access point has to ask the Auditor for the needed security analysis.

4.1 Message Layout

Basically, the communication between the Auditor and the Access Point is twofold: the Access Point signals the Auditor that a new Client has associated and that the assessment should begin, and the Auditor may request that the Access Point disconnect the Client in case the computed score exceeds the admissibility threshold. So only two messages are necessary, the Activation message and the Force-Disconnect message, whose layout is shown below.

Field                  Octet Number
Protocol Version       1
Packet Type            2
Packet Body Length     3-4
Identifier             5-8
Client MAC Address     9-14
Client IP Address      15-18

Fig. 3. Message layout
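For concreteness, a possible encoding of this 18-octet layout with Python's struct module is sketched below; it assumes network byte order, takes the Identifier as the four octets 5-8 shown in the table, and the function names are ours rather than part of the protocol.

```python
# A sketch of packing/parsing the Figure 3 message (total 18 octets; the
# Packet Body Length counts the 14 body octets after the length field).
import socket
import struct

FMT = "!BBHI6s4s"                        # version, type, length, id, MAC, IP
ACTIVATE, FORCE_DISCONNECT = 0x00, 0x01

def pack_message(ptype, identifier, mac, ip, version=1):
    # body = Identifier (4) + Client MAC (6) + Client IP (4) = 14 octets
    return struct.pack(FMT, version, ptype, 14, identifier,
                       mac, socket.inet_aton(ip))

def unpack_message(data):
    version, ptype, _, ident, mac, ip = struct.unpack(FMT, data)
    return version, ptype, ident, mac.hex(":"), socket.inet_ntoa(ip)

msg = pack_message(ACTIVATE, 7, bytes.fromhex("001122334455"), "192.168.1.10")
print(len(msg), unpack_message(msg))     # 18 octets, fields round-tripped
```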
Port UDP/7171 is used on both the Auditor and the Access Point to exchange messages with the other party. We have also preferred to leave room for further extensions. For instance, message authentication may be called upon, as there is potential for DoS attacks if such messages can be forged.
− Protocol Version. This field is one octet in length, taken to represent an unsigned binary number. Its value identifies the version of the protocol supported by the message sender.
− Packet Type. This field is one octet in length, taken to represent an unsigned binary number. Its value determines the type of packet being transmitted. The following types are defined:
  o Activate. A value of 0000 0000 indicates that the message carries an Activate packet.
  o Force-Disconnect. A value of 0000 0001 indicates that the message is a Force-Disconnect packet.
− Packet Body Length. This field is two octets in length, taken to represent an unsigned binary number. Its value defines the length in octets of the packet body following this field; a value of 0 indicates that there is no packet body.
− Identifier. This field is four octets in length and allows matching of responses with requests.
− Client MAC Address. This field is six octets in length and specifies the MAC address of the Client.
− Client IP Address. This field is four octets in length and indicates the IP address of the Client.
No acknowledgement messages have been provided for, as the communication between the Access Point and the Auditor is asynchronous and neither actor needs information about the completion of a task by the other. Synchronous communication would induce unacceptable latency.

4.2 The Security Analysis Process

The security analysis process should take advantage of a reliable and possibly complete knowledge base that contains semantically rich information provided by network mapping and configuration assessment facilities. Accordingly, client profiling will be accomplished through the use of specialized network analysis tools that can examine open ports and available services in a fairly non-intrusive manner. This analysis can detect the presence of services that are unnecessary or undesirable in the given environment, or identify explicit anomalies and system vulnerabilities. For example, if a port scan determined that a normally unused port, e.g., TCP port 1337, was open on a scanned host, a penalty would be added to the security score, indicating that the machine had potentially been exploited and may represent a security problem. Our philosophy is to use only open-source applications, in order to avoid licensing costs and legalities, and to perform each part of the overall service discovery/analysis task with the tool best suited for the job. This works best if a tool performs one specific task instead of implementing many different functionalities as a monolithic application. A tool that is able to perform many tasks should at least have parameters that can limit its operation to exactly what is needed.
Examples of such tools are Nessus [7] and Nmap [8], both falling under the GPL (General Public License) and configurable, by fine-tuning the proper options, to suit the specific needs of each analysis task. The advantage of using such existing tools is that less work is required to implement the client examination procedures. The first phase of the examination, the information gathering phase, comprises the determination of the characteristics of the target mobile node, such as the OS type and "listening" applications, e.g. WWW servers, FTP services, etc. This is ordinarily achieved by applying the following techniques:
− OS detection: A common OS detection technique is "IP stack fingerprinting", the determination of the remote OS type by comparison of variations in OS IP stack implementation behavior. Ambiguities in the RFC definitions of core Internet protocols, coupled with the complexity involved in implementing a functional IP stack, enable multiple OS types (and often revisions between OS releases) to be identified remotely by generating specifically constructed packets that invoke differentiable but repeatable behavior between OS types, e.g. to distinguish between Sun Solaris, Microsoft Windows NT and the most common open-source Unix flavors such as Linux and *BSD. By determining the presence of older and possibly unpatched OS releases, the Auditor can increase the overall insecurity score according to the locally configured policy. For example, if network administrators know that all of the legitimate clients are running Windows boxes, they will likely raise the score of an OS fingerprint indicating Linux, or the reverse. In those cases when OS fingerprinting is able to determine the operating system version, besides the type, higher scores can be assigned to older versions. Multiple tools, such as nmap, queso and p0f, have been used together, for the sake of result reliability, to implement OS fingerprinting in our architecture.
− Service detection: This technique, a.k.a. "port scanning", detects the availability of a TCP, UDP, or RPC service, e.g. HTTP, DNS, NIS, etc. Listening ports often imply associated services, e.g. a listening port 80/tcp often implies an active web server. This can reveal the presence of services that are deprecated from the security point of view or, at worst, of ports associated with possible worm infections. Here, for each dangerous service or suspicious port detected, the insecurity score is increased according to a configurable policy. Furthermore, the pattern of listening ports discovered using service detection techniques may also indicate a specific OS type; this method is particularly applicable to "out of the box" OS installations. Service detection is accomplished with the nmap tool, which can provide a very fast and complete assessment of the running services and, in some cases, their versions [8].
− Vulnerability detection: Based on the results of the previous probe, which attempts to determine what services are running, a new and more sophisticated scanner, with semantic knowledge about up-to-date security issues, tries to exploit known vulnerabilities of each service in an attempt to test the overall system security. From the implementation point of view, the results of the previous nmap scan can be used by a more sophisticated tool, such as Nessus [7], to conduct a more directed and thorough assessment.
Nessus offers much more advanced configuration audit functionality and has the capability of remotely scanning a client to determine running
services, their versions, and whether the client is susceptible to specific security threats. The assessment library associated with Nessus is very comprehensive, covering a large variety of architectures, operating systems and services. After the assessment, each detected problem heavily taxes the computed insecurity score, since it is typically a very dangerous symptom or an open vulnerability, whereas the presence of open ports outside an admissible range specified by the network administrator can be evaluated differently, in accordance with locally defined policies. Assessments should be logged and their results retained for a time interval configurable by the network administrator, in order to prevent a Client that continuously tries to get access from consuming inordinate amounts of resources. The optimal value of this interval is a trade-off between the quest for an accurate and up-to-date security assessment and the compelling need to save resources. The Auditor also acts as a passive unfriendly-activity detector. This is a necessary facility, since the analysis can take more time than is acceptable for mobile users, and so we decided to grant immediate access to users, without waiting for the assessment to complete. This opens a security risk, because a mobile client could immediately become active and propagate malware, but such activity would be immediately noticed by the passive detector. In fact, if at any time some hostile activity such as scanning, flooding or known attacks is discovered coming from an associated Client, the Auditor requests the AP to disconnect it. Each Auditor is assigned two IP addresses. The first is used to communicate with the connected Access Points. The other is dedicated to passive listening to signal illegitimate activity. To detect the above misbehavior, a simple but very flexible network intrusion detection system (NIDS) such as snort [9] can be used. It performs real-time traffic and protocol analysis, signature matching against a signature database (kept constantly up to date via Oinkmaster) and packet logging.
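As a rough illustration of how the active phase could drive one of these tools (the penalty table and the XML parsing below are our own, and a local nmap installation is assumed):

```python
# A sketch of an active scan step: run nmap with XML output on stdout (-oX -),
# collect open ports, and add configurable penalties. Policy values invented.
import subprocess
import xml.etree.ElementTree as ET

PORT_PENALTIES = {1337: 25, 6667: 20, 31337: 40}   # example policy table

def open_ports(ip):
    xml_out = subprocess.run(["nmap", "-oX", "-", ip], capture_output=True,
                             text=True, check=True).stdout
    root = ET.fromstring(xml_out)
    return [int(p.get("portid")) for p in root.iter("port")
            if p.find("state").get("state") == "open"]

def active_scan_score(ip):
    return sum(PORT_PENALTIES.get(port, 0) for port in open_ports(ip))
```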
5 Implementation

A simple proof-of-concept system was developed to test the effectiveness of the proposed approach, with an emphasis on the use of currently available devices and open-source components. The Authenticator has been implemented on a modified Linksys Access Point, starting from its publicly available Linux-based firmware. Here we introduced the following new functions:
− Accept, decode and verify the new protocol messages
− Detect the DHCP ACK message and send the Activation message
− Disconnect the client upon reception of a Force-Disconnect message
On the other side, the Auditor has been implemented on a simple Intel 486-based Soekris [10] appliance running OpenBSD [11], with the nmap, nessus, queso, p0f and snort applications installed. Although this implementation is only a proof of concept, our simple testbed demonstrated the correct operation of the proposed security framework.
Audit-Based Access Control in Nomadic Wireless Environments
545
6 Conclusions and Future Work

In this paper, we have addressed the problem of authenticated users bringing vulnerable or infected machines into a secure network. In a nomadic wireless environment where machines are not under the control of the network administrator, there is no guarantee that operating system patches are correctly applied and anti-malware definitions are updated. We propose an architecture based on a security assessment to detect certain characteristics of the inspected host and identify vulnerabilities. The proposed architecture can scale quite easily: it can be adapted to a large network by simply replicating its components. All that is needed is to properly set up the relationships between each Auditor and its controlled Access Points. Areas for future research and development include better user interaction. Disconnection, and its reason, should be explicitly notified to users so that they know what happened and the appropriate actions required to fix the problem. If a downright disconnection is felt to be too draconian and some restricted environment should be created instead, a tighter integration with DHCP would also be called for.
References

1. IEEE 802.11i, Medium Access Control (MAC) Security Enhancements, Amendment 6 to IEEE Standard for Information technology - Telecommunications and information exchange between systems - Local and metropolitan area networks - Specific requirements - Part 11: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications, July 2004.
2. Mishra, A., and Arbaugh, W. A., An initial security analysis of the IEEE 802.1X standard. Technical Report CS-TR-4328, UMIACS-TR-2002-10, University of Maryland, February 2002.
3. Ding, P., Holliday, J., and Celik, A., Improving the security of Wireless LANs by managing 802.1X Disassociation. In Proceedings of the IEEE Consumer Communications and Networking Conference (CCNC '04), Las Vegas, NV, January 2004.
4. Moore, T., Validating 802.11 Disassociation and Deauthentication messages. Submission to IEEE P802.11 TGi, September 2002.
5. Ge, W., Sampalli, S., A Novel Scheme for Prevention of Management Frame Attacks on Wireless LANs, March 2005, http://www.cs.dal.ca/news/def-1341.shtml
6. He, C., and Mitchell, J., Security Analysis and Improvements for IEEE 802.11i. In 11th Annual Network and Distributed System Security Symposium (NDSS '05), San Diego, February 2005.
7. Nessus Security Scanner (http://www.nessus.org/).
8. Nmap Security Scanner (http://www.insecure.org/).
9. Snort, Open Source Network Intrusion Detection System (http://www.snort.org/).
10. Soekris Engineering (http://www.soekris.com/).
11. OpenBSD (http://www.openbsd.org/).
Cost – Time Trade Off Models Application to Crashing Flow Shop Scheduling Problems

Morteza Bagherpour¹, Siamak Noori², and S. Jafar Sadjadi²

¹ Ph.D. student, Industrial Engineering Department, Iran University of Science & Technology, Narmak, Tehran, Iran
[email protected]
² Department of Industrial Engineering, Iran University of Science & Technology, Narmak, Tehran, Iran
[email protected], [email protected]
Abstract. Many traditional cost–time trade-off models are computationally expensive to use, owing to the complexity of the algorithms, especially for large-scale problems. We present a new approach that adapts linear programming to solve cost–time trade-off problems. The proposed approach uses two different models of the flow shop scheduling problem converted into a leveled project management network: the first model minimizes the makespan subject to a budget limitation, and the second minimizes total cost to determine the optimum makespan over the production planning horizon.
Keywords: Flow shop scheduling, linear programming, leveled project management network, makespan.
1 Introduction

In an m-machine flow shop problem, there are n jobs to be scheduled on m machines, where each job consists of m operations, each operation requires a different machine, and all jobs are processed in the same order on the machines. Extensive research has been conducted on variants of the regular flow shop problem under the assumption that there is an infinite buffer space between the machines. Even though such an assumption is valid for some applications, there are numerous situations in which discontinuous processing is not allowed. Such flow shops are known as no-wait. A no-wait flow shop problem occurs when the operations of a job have to be processed continuously from start to end, without interruptions either on or between machines. This means that, when necessary, the start of a job on a given machine is delayed so that the operation's completion coincides with the start of the next operation on the subsequent machine. There are several industries where the no-wait flow shop problem applies; examples include the metal, plastic, chemical and food industries. For instance, in the case of the rolling of steel, the heated metal must continuously go through a sequence of operations before it is cooled, in order to prevent defects in the composition of the material. Also, in the food processing
industry, the canning operation must immediately follow the cooking operation to ensure freshness. Additional applications can be found in advanced manufacturing environments, such as just-in-time and flexible manufacturing systems. Sequencing methods in the literature can be broadly categorized into two types of approaches, namely optimization and heuristic. Optimization approaches are guaranteed to obtain the optimum sequence, whereas heuristic approaches mostly obtain near-optimal sequences. Among the optimization approaches, the algorithm developed by Johnson (1954) [6] is the most widely cited research dealing with sequencing n jobs on two machines. Lomnicki (1965) [7] proposed a branch and bound technique to find the optimum permutation of jobs. Since the flow shop scheduling problem has been recognized to be NP-hard, the branch and bound method cannot be applied to large problems. This limitation has encouraged researchers to develop efficient heuristics. Heuristics are two-fold: constructive and improvement. In the constructive category, the methods developed by Palmer (1965) [12], Campbell et al. (1970) [2], Gupta (1971) [4], Dannenbring (1977) [3], Röck and Schmidt (1982) [13] and Nawaz et al. (1983) [9] can be listed. Mostly, these methods are developed on the basis of Johnson's algorithm. Turner and Booth (1987) [15] and Taillard (1990) [14] have verified that the method proposed by Nawaz et al. (1983) [9], namely NEH, performs best among the constructive methods tested. On the other hand, Osman and Potts (1989) [11], Widmer and Hertz (1989) [16], Ho and Chang (1990) [5], Ogbu and Smith (1990) [10], Taillard (1990) [14], Nowicki and Smutnicki (1996) [8] and Ben-Daya and Al-Fawzan (1998) have developed improvement heuristics for the problem of interest. Also, Aldowaisan and Allahverdi (2004) [1] developed new heuristics for the m-machine no-wait flow shop to minimize total completion time.
2 Flow Shop Scheduling

Flow shop scheduling problems rest on the following assumptions.

2.1 Assumptions Regarding the Jobs

• The sequencing model consists of a set of n jobs, which are simultaneously available.
• Job j has an associated processing time on machine i, which is known in advance.
• All machines are continuously available.
• Set-up time is independent of the sequence and is included in the processing time.
• Jobs are independent of each other.
• Each job, once started, continues in process until it is completed.

2.2 Assumptions Regarding the Machines
• The shop consists of m machines.
• No machine processes more than one job at a time.
• Each machine is available at all times for processing jobs without any interruption.
• Each machine in the shop operates independently.
• Machines can be kept idle.
2.3 Assumptions Regarding the Process

• The processing time of each job on each machine is deterministic and independent of the order in which the jobs are processed.
• Transportation times of a job between machines and set-up times are included in the job processing times.
3 Converting Flow Shop Scheduling into a Leveled Project Management Network

In a flow shop problem there are m machines and n jobs, and the sequence of the jobs on the machines is assumed to be known. Since we are interested in reducing the makespan by employing additional resources, methods developed for crashing a project network can be used if the flow shop can be represented as a project network. To avoid any resource overlapping, it is necessary to level the resources. We propose a new approach to reduce the makespan using additional resources; the proposed approach converts a flow shop problem into a leveled project management network.

3.1 Network Requirements

Some network elements must be defined when a flow shop problem is converted into a network, namely nodes, activities, etc., as follows:
Nodes: Each node represents the event of starting or finishing operation(s) on machines.
Activity: Activities are the operations to be processed on a specific machine, with duration equal to the processing time.
Predecessors: For any activity, the activity representing the previous operation of the same job constitutes a preceding activity. Further, the activity corresponding to the operation, on the same machine, of the job that precedes this job in the sequence also constitutes a preceding activity.
Duration time: The processing time is the duration of the activity.
Resources: Machines are the only resources.
Suppose we have a flow shop system with n jobs and m machines. Table 1 describes the specifications:

Table 1. Data of the flow shop problem

Activity   Predecessors        Duration time   Machine
1,1        –                   D11             M1
...        ...                 ...             ...
i,j        (i-1,j), (i,j-1)    Dij             Mi
...        ...                 ...             ...
m,n        (m-1,n), (m,n-1)    Dmn             Mm
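A small Python sketch of this conversion follows (0-based indices instead of the table's 1-based ones; the dictionary layout is our own choice, not part of the paper):

```python
# Build the Table 1 activity network from an m x n matrix of processing times
# d[i][j] (machine i, j-th job in the fixed sequence).
def build_network(d):
    m, n = len(d), len(d[0])
    activities = {}
    for i in range(m):
        for j in range(n):
            preds = []
            if i > 0:
                preds.append((i - 1, j))    # same job on the previous machine
            if j > 0:
                preds.append((i, j - 1))    # previous job on the same machine
            activities[(i, j)] = {"duration": d[i][j],
                                  "machine": i,
                                  "preds": preds}
    return activities

# Example: 3 machines, 5 jobs with arbitrary times
net = build_network([[3, 2, 4, 1, 5], [2, 4, 3, 2, 1], [5, 1, 2, 3, 4]])
print(net[(1, 2)])   # duration 3, machine 1, preds [(0, 2), (1, 1)]
```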
3.2 Crashing the Network

Different linear programming models, depending on the type of objective, are used to crash the network. In the next section, two types of linear programming formulation are explained, in which the selection of the operations to be crashed is optimized according to the objective.
4 The Proposed Approach

Notation:
i-j: activity from node i to node j
Dn(i-j): normal duration time of activity i-j
Df(i-j): minimum crash time of activity i-j
dij: crashed duration of activity i-j
Cij: crash cost per unit time of activity i-j
e: number of network events
ti: planned date of event i
B: pre-specified budget
H: overhead cost per unit time
K: fixed cost

4.1 Model 1: Minimizing Makespan Subject to a Budget Limitation

In model 1, the objective is to determine the operations to be crashed in order to find the minimum makespan subject to a budget limitation. Once the flow shop scheduling problem is converted into a network, it is possible to crash the network to find the minimum makespan. This problem can be formulated as follows:

Min Z = tn − t1
subject to:
tj − ti ≥ dij                          (1)
Df(i-j) ≤ dij ≤ Dn(i-j)                (2)
ΣΣ Cij (Dn(i-j) − dij) ≤ B             (3)
ti ≥ 0, dij ≥ 0
The objective of model 1 is to minimize the makespan, i.e. the project duration tn − t1. Constraint (1) states that the start time of event j must be at least the start time of event i plus the crashed duration of activity i-j. Constraint (2) gives the lower and upper bounds on the crashed duration, and constraint (3) is the budget limitation. Note that all ti and dij must be non-negative and integer; however, due to the structure of the problem, model 1 can be solved as a linear programming problem.
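Under the assumption that an off-the-shelf LP library such as PuLP is available, model 1 can be written down almost verbatim; constraint (2) is folded into the variable bounds, and the data containers and function name below are our own:

```python
# A sketch of model 1 with PuLP (assumed available): events are node ids
# (min = source, max = sink assumed), acts maps each activity to its
# (tail, head) event pair, and Dn, Df, C, B are the normal times, crash
# times, unit crash costs and budget.
import pulp

def crash_min_makespan(events, acts, Dn, Df, C, B):
    prob = pulp.LpProblem("model1", pulp.LpMinimize)
    t = {e: pulp.LpVariable(f"t{e}", lowBound=0) for e in events}
    d = {a: pulp.LpVariable(f"d{k}", lowBound=Df[a], upBound=Dn[a])  # (2)
         for k, a in enumerate(acts)}
    prob += t[max(events)] - t[min(events)]                  # makespan objective
    for a, (i, j) in acts.items():
        prob += t[j] - t[i] >= d[a]                          # constraint (1)
    prob += pulp.lpSum(C[a] * (Dn[a] - d[a]) for a in acts) <= B   # (3)
    prob.solve()
    return pulp.value(prob.objective), {a: d[a].value() for a in acts}
```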
4.2 Model 2: Minimizing Total Cost to Determine the Optimum Makespan

In model 2 we are interested in the optimum common due date when total cost is minimized. Therefore, we again convert the flow shop scheduling problem into a leveled project management network and crash operations on the critical path to attain the objective. The problem can be formulated as follows:

Min Z = H(tn − t1) + ΣΣ Cij (Dn(i-j) − dij) + K
subject to:
tj − ti ≥ dij                          (1)
Df(i-j) ≤ dij ≤ Dn(i-j)                (2)
ti ≥ 0, dij ≥ 0

All constraints are as described in model 1, and the objective function is the minimization of total cost over the production planning horizon.

4.3 Converting the Network into the Flow Shop Scheduling Problem

In the previous sections we have shown how to convert a flow shop problem into a critical path method (CPM) network. Using the CPM, we can calculate the earliest start, latest start and floats of all the activities. The earliest finish time of the project is equal to the makespan of the flow shop problem. The results obtained from the linear programming models can then be interpreted in terms of the original flow shop scheduling problem.
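For the CPM calculations mentioned here, a compact forward pass suffices; this sketch takes a dictionary in the same shape as the network built earlier (the (machine, job) indexing makes lexicographic order topological), and the toy data are invented:

```python
# Forward CPM pass: earliest start/finish per activity; the maximum earliest
# finish is the makespan of the flow shop problem.
def earliest_times(activities):
    ES, EF = {}, {}
    for a in sorted(activities):             # (machine, job) order is topological
        ES[a] = max((EF[p] for p in activities[a]["preds"]), default=0)
        EF[a] = ES[a] + activities[a]["duration"]
    return ES, EF, max(EF.values())          # last value = makespan

toy = {(0, 0): {"duration": 3, "preds": []},
       (0, 1): {"duration": 2, "preds": [(0, 0)]},
       (1, 0): {"duration": 4, "preds": [(0, 0)]},
       (1, 1): {"duration": 1, "preds": [(0, 1), (1, 0)]}}
print(earliest_times(toy)[2])                # 8
```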
5 Numerical Example

In this section, two numerical examples are given in order to show the implementation of the proposed models.

5.1 The First Numerical Example

Consider a flow shop problem with three machines and five jobs, where the sequence obtained is E-C-A-D-B [Campbell et al.]. The corresponding processing times are given in Table 2; the overhead cost is 20 per unit time and the fixed cost is 100. The critical path value is 41, and the pre-specified budget is 26. First the problem is converted into a network, whose precedence information is given in Table 3; the corresponding network is shown in Figure 1, where the white arrows represent dummy activities.

Table 2. The processing times of a flow shop with 3 machines and five jobs
Table 3. The project network data for the corresponding example
Table 4 summarizes the optimal solution under the budget limitation.

Table 4. Optimal duration of each activity under the budget limitation
Fig. 1. Project network for the given example
5.2 The Second Numerical Example

Consider the data available in the previous example. Table 5 gives the optimal duration of each activity according to the objective.

Table 5. Optimal duration of each activity
As can be observed, by implementing a linear programming model we can easily find an optimal resource allocation for a flow shop problem. It is also straightforward to apply the proposed method to relatively large-scale projects embedding numerous activities to be crashed. Although there is much research on flow shop scheduling, on project management networks, and on linear programming applications to network crashing, to the best of our knowledge there is no closely related research against which our results could be compared.
6 Conclusion and Further Research

In this paper, we have presented two applications of linear programming to flow shop scheduling problems. The proposed approach converts a flow shop problem into a leveled project management network, and the optimal makespan is obtained using linear programming models. The objective of the first model is to minimize the makespan subject to a budget limitation, and the objective of the second model is to find a common due date that provides better delivery to the customer. The proposed approach can also be used for large-scale projects embedding numerous activities to be crashed. Further research could focus on applying linear programming methods to crashing parallel flow shop scheduling, where the problem includes k flow shop problems working simultaneously in parallel.
References

[1] Aldowaisan, T., Allahverdi, A., 2004. New heuristics for m-machine no-wait flowshop to minimize total completion time. International Journal of Management Science, 32, 345-352.
[2] Campbell, H.G., Dudek, R.A., Smith, M.L., 1970. A heuristic algorithm for the n-job, m-machine sequencing problem. Management Science, 16(B), 630-637.
[3] Dannenbring, D.G., 1977. An evaluation of flow shop sequencing heuristics. Management Science, 23(11), 1174-1182.
[4] Gupta, J.N.D., 1971. A functional heuristic algorithm for the flow shop scheduling problem. Operational Research Quarterly, 22(1), 39-47.
[5] Ho, J.C., Chang, Y.L., 1990. A new heuristic for the n-job, m-machine flow shop problem. European Journal of Operational Research, 52, 194-206.
[6] Johnson, S.M., 1954. Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics Quarterly, 1(1), 61-68.
[7] Lomnicki, Z.A., 1965. A branch and bound algorithm for the exact solution of the three-machine scheduling problem. Operational Research Quarterly, 16(1), 89-100.
[8] Nowicki, E., Smutnicki, C., 1996. A fast tabu search algorithm for the permutation flow shop problem. European Journal of Operational Research, 91, 160-175.
[9] Nawaz, M., Enscore Jr., E.E., Ham, I., 1983. A heuristic algorithm for the m-machine, n-job flow shop sequencing problem. OMEGA: International Journal of Management Science, 11(1), 91-95.
[10] Ogbu, F.A., Smith, D.K., 1990. The application of the simulated annealing algorithm to the solution of the n/m/Cmax flow shop problem. Computers and Operations Research, 17, 243-253.
[11] Osman, I.H., Potts, C.N., 1989. Simulated annealing for permutation flow shop scheduling. OMEGA: International Journal of Management Science, 17, 551-557.
[12] Palmer, D.S., 1965. Sequencing jobs through a multi-stage process in the minimum total time - a quick method of obtaining a near optimum. Operational Research Quarterly, 16(1), 101-107.
[13] Röck, H., Schmidt, G., 1982. Machine aggregation heuristics in shop scheduling. Methods of Operations Research, 45, 303-314.
[14] Taillard, E., 1990. Some efficient heuristic methods for the flow shop sequencing problem. European Journal of Operational Research, 47, 65-74.
[15] Turner, S., Booth, D., 1987. Comparison of heuristics for flow shop sequencing. OMEGA: International Journal of Management Science, 15, 75-78.
[16] Widmer, M., Hertz, A., 1989. A new heuristic for the flow shop sequencing problem. European Journal of Operational Research, 41, 186-193.
The ASALB Problem with Processing Alternatives Involving Different Tasks: Definition, Formalization and Resolution*

Liliana Capacho1,2 and Rafael Pastor2

1 Universidad de Los Andes, Dpto. de I.O. - EISULA y CESIMO, Mérida, Venezuela
2 Technical University of Catalonia, IOC Research Institute, Av. Diagonal 647, Edif. ETSEIB, p.11, 08028 Barcelona, Spain
{liliana.capacho, rafael.pastor}@upc.edu

Abstract. The Alternative Subgraphs Assembly Line Balancing Problem (ASALBP) considers assembly alternatives that determine task processing times and/or precedence relations among the tasks. Capacho and Pastor [3] formalized this problem and developed a mathematical programming model (MILP) in which the assembly alternatives are determined by combining all available processing alternatives of each existing sub-assembly. In this paper an extended definition of the ASALBP is presented in which assembly sub-processes involving different tasks are also considered. Additionally, a mathematical programming model is proposed to formalize and solve the extended version of the ASALBP, which also improves the performance of the former MILP model. Some computational results are included.
1 Introduction

The classical Assembly Line Balancing Problem (ALBP) consists in assigning a set of tasks to a group of workstations in order to optimize a certain efficiency measure (e.g. the number of workstations). Each task is characterized by a processing time and a set of precedence relations, which specifies the allowed processing order. ALBP are classified into two well-known categories [1]: Simple Assembly Line Balancing Problems (SALBP) and General Assembly Line Balancing Problems (GALBP). SALBP involve very simple and restrictive problems which consider, for example, a unique serial line that processes a single model of one product. GALBP are those in which one or more assumptions of the simple case are varied. Amongst such problems the following groups are usually considered: UALBP, which involve U-shaped lines that may be used to overcome the inflexibility of serial lines; MALBP, which appear when a line produces units of different models of a unique product (see, for example, Miltenburg [8]); MOALBP, which consider several optimization objectives; and RALBP, which consider robotic lines. Other less common GALBP considered in the literature include problems that involve parallel workstations; parallel tasks; stations not equally equipped, which may imply equipment selection; two-sided or buffered lines; constrained workstation capacity; processing times that are sequence-dependent, stochastic or fuzzy; and multi-product lines (e.g. [5], [9], [12]).
* Supported by the Spanish MCyT project DPI2004-03472 and co-financed by FEDER.
Assembly line balancing problems have been widely studied (e.g. Baybars [1], Becker and Scholl [2], Scholl and Becker [10]). Although most published research work on assembly line balancing focuses on the simple case, in recent years a significant amount of research effort has been directed at the general case in order to address more realistic problems. Numerous procedures have been developed to solve assembly line balancing problems, which are usually grouped into two main categories: exact methods and approximate methods. According to Becker and Scholl [2], most exact methods are based on linear and dynamic programming and branch and bound procedures (e.g. Scholl and Klein [11]). A wide range of approximate methods have been developed to efficiently solve more realistic ALBP. Among these there are heuristics based on priority rules and enumeration procedures (e.g. Lapierre and Ruiz [7]); metaheuristics such as genetic algorithms (e.g. Kim et al. [6]), simulated annealing (e.g. Suresh and Sahu [12]) and tabu search (e.g. Pastor et al. [9]); and others that include procedures based on ant colonies, constraint logic programming, fuzzy logic, expert systems and algorithms based on network theory (e.g. Gen et al. [5]).
A common feature of ALBP is that a unique and predetermined precedence graph is used to represent all relations among the tasks required to assemble a particular product. In real-life problems, however, it is possible that some parts of a product admit several alternative assembly variants that cannot be represented in a standard precedence graph. In this regard, Capacho and Pastor [3] presented and formalized a new GALBP entitled ASALBP (Alternative Subgraphs Assembly Line Balancing Problem), which considers the possibility of processing alternatives in which the processing time and/or precedence relations of some tasks depend on the selected sub-assembly alternative. In this problem, each processing alternative consists of a particular processing order of a subset of tasks. Usually, in assembly line balancing processes an alternative for each sub-assembly is selected a priori and the line is then balanced. To solve the ASALBP efficiently, two subproblems need to be solved simultaneously: the decision problem, which selects the assembly sequence of those parts that admit alternatives, and the balancing problem, which assigns the tasks to the workstations.
The remainder of this paper is organized as follows. Section 2 introduces the Alternative Subgraphs Assembly Line Balancing Problem. Section 3 presents the extended definition of the ASALBP. Section 4 describes the model proposed to solve the ASALBP regarding the extended definition. Section 5 provides some computational results. Finally, Section 6 presents the conclusions and proposes future research work.
2 ASALBP: The Alternative Subgraphs Assembly Line Balancing Problem

As previously mentioned, the ASALBP is a new general assembly line balancing problem that considers assembly alternatives, in which an alternative has to be
selected for each part that admits assembly variants. Each processing alternative can be represented by a precedence graph, as can be seen in Figure 1, which shows two assembly alternatives for a simple example involving six tasks. In alternative 2 the processing order changes for tasks B, C, D and E. Furthermore, the processing time of task B increases by 2 time units when it is processed after task E.
[Figure 1 shows the two precedence graphs with task times: Alternative 1: A(11)-B(5)-C(4)-D(8)-E(6)-F(8); Alternative 2: A(11)-C(4)-D(8)-E(6)-B(7)-F(8).]
Fig. 1. Two assembly alternatives
Normally, when a problem with assembly alternatives has to be solved, one of the available alternatives for each sub-assembly is selected a priori and the line is then balanced. The smallest total processing time is a common criterion that system designers use to choose an assembly alternative. If these two processes – the selection process and the line balancing – are carried out independently, it cannot be guaranteed that the global problem is solved optimally. However, better solutions can be obtained if the two problems are solved simultaneously, as illustrated in Figure 1. If the two resulting balancing problems are solved optimally, minimizing the number of workstations given an upper bound on the cycle time of 15 time units, the results shown in Table 1 are obtained. Even though it has a longer total processing time (44), Alternative 2 provides the best number of workstations for the problem. If the selection process had been carried out a priori, Alternative 1 would have been selected, since its total processing time is 42, and the best solution would have been discarded.
Table 1. Results of balancing the assembly alternatives

Alt  Station I   Station II  Station III  Station IV  Total processing time  No. of stations
1    A (11)      B,C (9)     D,E (14)     F (8)       42                     4
2    A,C (15)    D,E (14)    B,F (15)     -           44                     3
The assembly alternatives of Figure 1 represent two variants of an assembly process, which determine two alternative subgraphs: S1, which consists in performing tasks B, C, D and E (in the order shown); and S2, which consists in performing tasks C, D, E and B (in that order). A diagramming scheme called S-Graph, proposed by Capacho and Pastor [3], is used here to represent all alternative subgraphs in a unique precedence graph, as shown in Figure 2. To solve the ASALBP, Capacho and Pastor [3] developed an integer linear programming mathematical model (hereafter referred to as the preliminary model, M1), which decides on both the assembly subgraphs and the line balancing. The
task-workstation assignment variables in M1 are defined for each total assembly precedence graph (i.e., global route) obtained by combining the alternative subgraphs of each available subassembly contained in the S-Graph. Therefore, each of the global routes considers the whole set of tasks required to assemble a product. Figure 3 shows two global routes (GR1 and GR2) for the example shown in Figure 1. The computational experiment carried out with M1 shows that optimal solutions can be obtained in a reasonable amount of time for small problems.
[Figure 2 merges both alternatives into a single precedence graph: task A(11) is followed by the alternative subgraphs S1 = B(5)-C(4)-D(8)-E(6) and S2 = C(4)-D(8)-E(6)-B(7), each followed by task F(8).]
Fig. 2. Precedence S-Graph
[Figure 3 shows the two global routes with the task times of Figure 1: GR1 = A-B-C-D-E-F and GR2 = A-C-D-E-B-F.]
Fig. 3. Alternative global routes
To address more general problems, the definition of the ASALBP is extended in this paper: it is possible to consider alternative assembly processes that involve different and independent sets of tasks. Additionally, a new integer linear programming mathematical model (hereafter referred to as the enhanced model, M2) has been developed to formalize and solve this extended version of the ASALBP.
3 The Extended ASALBP Definition

As mentioned above, the former ASALBP definition considers the possibility of assembly alternatives in such a way that all alternatives of a particular sub-assembly (one of which must be selected) involve the same tasks. In practice, however, industrial problems can involve alternative assembly processes that consider different and mutually exclusive sets of tasks. For instance, consider the toy-manufacturing example given in Das and Nagendra [4], in which a large number of products are assembled from molded plastic parts or from metal stampings. In this example, the tasks are performed only if the assembly process they belong to is selected. Figure 4 shows the S-Graph for an example with three assembly processes involving three different and independent sets of tasks: S1 with tasks B1 to B3; S2 with tasks C1 to C4; and S3 with tasks D1 to D3. The new ASALBP definition considers alternative assembly precedence subgraphs that can involve either the same or different sets of tasks. In these two cases, task processing times and/or precedence relations can be dependent on the selected
subgraph. As in the former definition, it is assumed that assembly alternatives do not overlap each other; thus, each alternative of each subassembly is represented by a unique and independent precedence subgraph. This new ASALBP definition enables more realistic problems to be addressed and, on the other hand, relaxes the SALBP hypothesis which states that all tasks must be processed exactly once.
[Figure 4 shows an S-Graph in which three alternative assembly processes connect tasks A and E: S1 with tasks B1-B3, S2 with tasks C1-C4, and S3 with tasks D1-D3.]
Fig. 4. S-Graph with alternative assembly processes
4 Enhanced Model for the Alternative Subgraphs Assembly Line Balancing Problem (M2)

Before presenting model M2, some aspects concerning assignment variables and precedence relations are discussed.

4.1 Modeling Assumptions

In model M1, each overall assembly alternative, referred to as a global route, was determined by a precedence graph. As a result, there were as many global assembly routes as combinations of alternative subgraphs for each available subassembly contained in the S-Graph. Consider, for example, the S-Graph in Figure 5, which involves 9 assembly tasks. In this example, 9 global routes need to be considered, as follows: GR1: A-S1-E-S4-I (i.e. A-B-C-D-E-F1-F2-F3-I), GR2: A-S1-E-S5-I (i.e. A-B-C-D-E-G1-G2-G3-I), GR3: A-S1-E-S6-I, GR4: A-S2-E-S4-I, GR5: A-S2-E-S5-I, GR6: A-S2-E-S6-I, GR7: A-S3-E-S4-I, GR8: A-S3-E-S5-I and GR9: A-S3-E-S6-I. Observe that tasks that do not admit processing alternatives are also involved in all global routes. This implies that a large number of task-workstation assignment variables have to be defined for even a small number of routes. In the proposed model, alternative subgraphs (referred to as partial routes) are considered only for those tasks that admit processing alternatives, and a unique route (known as the base route) is considered for those tasks without processing alternatives. Task-workstation
[Figure 5 shows an S-Graph with base route R0 (tasks A, E and I), partial routes R1-R3 (alternatives for tasks B, C and D between A and E) and partial routes R4-R6 (tasks F1-F3, G1-G3 and H1-H3 between E and I).]
Fig. 5. S-Graph for an example with 9 tasks
assignment variables are thus defined only for the partial routes (alternative subgraphs), thereby considerably reducing the size of the model to be solved. For the example in Figure 5, tasks A, E and I have only one possible processing alternative. Therefore, such tasks are processed according to the base route (referred to as R0). Partial routes R1, R2 and R3 represent the processing alternatives involving tasks B, C and D; R4 is the processing alternative that involves tasks F1 to F3; R5 involves G1 to G3 and R6 involves H1 to H3. For each subset of routes that are alternative to each other (e.g., R4, R5 and R6), one partial route must be selected. Observe that only 7 partial routes are involved; the difference between the number of global and partial routes is even greater because in M1 the number of global routes increases exponentially with the number of partial routes.
For global routes, the immediate predecessors of a task for each route are fixed and the precedence constraints can be easily established. However, this is not the case for partial routes. The difficulty arises because an immediate predecessor or the task itself may have alternatives and only one of these is to be selected. Therefore, all possible immediate predecessors of a task must be considered. In order to account for all possible precedence relations implied when considering partial routes and to facilitate their formalization, tasks can be divided into two categories: fixed, which are those without alternatives (i.e., those processed through the base route); and mobile, which are those that form a part of alternative routes. As can be seen in Figure 6, tasks A, F and G are fixed whereas tasks B, C, D and E are mobile because they are involved in alternative assembly routes: R1 and R2 involving tasks B and C, and R3 and R4 involving tasks D and E. A fictitious task with nil processing time, α, is used in the S-Graph to represent precedence relations that involve mobile tasks with predecessors that are also mobile but affected by different sets of alternative routes. This case is represented in Figure 6 by the mobile tasks D and E, whose predecessors C and B are also mobile tasks affected by different routes.
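Stepping back to the route-counting argument above, a tiny Python snippet (our illustration, with route names taken from the Figure 5 example) makes the M1/M2 difference concrete: global routes are the Cartesian product of each subassembly's alternatives, while M2 only enumerates the partial routes themselves.

```python
# Global routes in M1 combine one alternative per subassembly;
# partial routes in M2 are just the alternatives themselves plus R0.
from itertools import product

subassembly_alternatives = [["R1", "R2", "R3"], ["R4", "R5", "R6"]]
global_routes = list(product(*subassembly_alternatives))      # M1: 3 x 3 = 9
partial_routes = ["R0"] + sum(subassembly_alternatives, [])   # M2: 7 routes
print(len(global_routes), len(partial_routes))                # -> 9 7
```

With k subassemblies of a alternatives each, M1 grows as a^k while M2 grows only as a*k + 1.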
[Figure 6 shows the fixed tasks A, F and G on the base route R0, the alternative routes R1 = B-C and R2 = C-B between A and the fictitious task α, and the alternative routes R3 = D-E and R4 = E-D between α and F.]
Fig. 6. Fixed and mobile tasks - precedence relations
Table 2 shows the 5 basic cases of task-predecessor relations, formalized below, which are based on the example in Figure 6.

Table 2. Task-predecessor relation typology

Case  Description                                                                    i  p
1     Task i fixed and its predecessor p fixed.                                      G  F
2     Task i fixed and its predecessor p mobile.                                     F  E,D
3     Task i mobile and its predecessor p fixed.                                     B  A
4     Task i mobile and its predecessor p mobile, with i and p in the same route.    C  B
5     Task i mobile and its predecessor p mobile, with i and p in different routes.  D  C,B
4.2 Model M2

Parameters:
n: number of tasks (i = 1,…,n).
nr: number of routes (r = 0,…,nr).
nsr: number of different sets of routes (subgraphs) such that the routes within a set are alternatives to each other (q = 1,…,nsr). For instance, in the example of Figure 6 there are 2 such subsets (nsr = 2), one containing routes R1 and R2, and one containing routes R3 and R4.
m_min, m_max: lower and upper bounds on the number of stations.
R_i: set of all routes through which task i can be processed (i = 1,…,n).
ct: cycle time.
t_{ir}: duration of task i when processed through route r (i = 1,…,n; r ∈ R_i).
TR_r: set of tasks that are affected by route r.
P_{ir}: set of the possible immediate predecessors of task i, if task i is processed through route r (i = 1,…,n; r ∈ R_i).
PT_i: set of all possible immediate predecessors of task i ($PT_i = \bigcup_{r \in R_i} P_{ir}$).
E_{ir}, L_{ir}: earliest and latest station that task i can be assigned to, if task i is processed through route r (i = 1,…,n; r ∈ R_i).
SCR_q: subset q of routes that are alternative among one another (q = 1,…,nsr). For the example in Figure 6, there are two such subsets: SCR_1 involving R1 and R2, and SCR_2 involving R3 and R4.
Decision variables:
x_{ijr} ∈ {0,1}: equal to 1 if task i is assigned to workstation j and processed through route r (i = 1,…,n; ∀r ∈ R_i; ∀j ∈ [E_{ir}, L_{ir}]).
y_j ∈ {0,1}: equal to 1 if there is any task assigned to workstation j (j = m_min+1,…,m_max).
ar_r ∈ {0,1}: equal to 1 if there is any task processed through route r (r = 1,…,nr).

Mathematical model for the ASALBP-1 (minimizing the number of workstations given ct):

$\text{Minimize } Z = \sum_{j=m_{min}+1}^{m_{max}} j \cdot y_j$  (1)

subject to:

$\sum_{j=E_{i0}}^{L_{i0}} x_{ij0} = 1 \quad \forall i \mid i \in TR_0$  (2)

$\sum_{r \in R_i} \sum_{j=E_{ir}}^{L_{ir}} x_{ijr} = \sum_{r \in R_i} ar_r \quad \forall i \mid i \notin TR_0$  (3)

$\sum_{r=0}^{nr} \sum_{\forall i \mid (r \in R_i) \wedge (j \in [E_{ir},L_{ir}])} t_{ir} \cdot x_{ijr} \le ct \quad j = 1,\dots,m_{min}$  (4)

$\sum_{r=0}^{nr} \sum_{\forall i \mid (r \in R_i) \wedge (j \in [E_{ir},L_{ir}])} t_{ir} \cdot x_{ijr} \le ct \cdot y_j \quad j = m_{min}+1,\dots,m_{max}$  (5)

$\sum_{j=E_{p0}}^{L_{p0}} j \cdot x_{pj0} \le \sum_{j=E_{i0}}^{L_{i0}} j \cdot x_{ij0} \quad \forall i \in TR_0,\ \forall p \in PT_i \mid p \in TR_0$  (6)

$\sum_{s \in R_p} \sum_{j=E_{ps}}^{L_{ps}} j \cdot x_{pjs} \le \sum_{j=E_{i0}}^{L_{i0}} j \cdot x_{ij0} \quad \forall i \in TR_0,\ \forall p \in PT_i \mid p \notin TR_0$  (7)

$\sum_{j=E_{p0}}^{L_{p0}} j \cdot x_{pj0} \le \sum_{r \in R_i} \sum_{j=E_{ir}}^{L_{ir}} j \cdot x_{ijr} + m_{max} \cdot \Big(1 - \sum_{r \in R_i} ar_r\Big) \quad \forall i \notin TR_0,\ \forall p \in PT_i \mid p \in TR_0$  (8)

$\sum_{j=E_{pr}}^{L_{pr}} j \cdot x_{pjr} \le \sum_{j=E_{ir}}^{L_{ir}} j \cdot x_{ijr} \quad \forall i \notin TR_0,\ \forall r \in R_i,\ \forall p \in P_{ir} \mid [p \notin TR_0 \wedge r \in R_p]$  (9)

$\sum_{s \in R_p} \sum_{j=E_{ps}}^{L_{ps}} j \cdot x_{pjs} \le \sum_{r \in R_i} \sum_{j=E_{ir}}^{L_{ir}} j \cdot x_{ijr} + m_{max} \cdot \Big(1 - \sum_{r \in R_i} ar_r\Big) \quad \forall i \notin TR_0,\ \forall p \in PT_i \mid [p \notin TR_0 \wedge (R_i \cap R_p = \emptyset)]$  (10)

$\sum_{r \in SCR_q} ar_r = 1 \quad q = 1,\dots,nsr$  (11)

$\sum_{i \in TR_r} \sum_{j=E_{ir}}^{L_{ir}} x_{ijr} \le ar_r \cdot |TR_r| \quad r = 1,\dots,nr$  (12)
The objective function (1) minimizes the number of workstations for a given upper bound on the cycle time. The constraints are: (2) and (3), which ensure that all tasks belonging to a selected route are assigned to one and only one workstation, and otherwise tasks are not assigned; (4) and (5) ensure that the total processing time assigned to workstation j does not exceed the cycle time; (6) to (10) are the precedence constraints, which guarantee that no task is assigned to an earlier workstation than an immediate predecessor; (11) are the route uniqueness constraints that ensure that one and only one route for each subassembly is selected from among the possible routes; and (12) guarantees that tasks belonging to a particular precedence subgraph are assigned to the same route. The mathematical formulation for ASALBP-2, considering the extended definition, can be easily obtained by modifying model M2 developed to solve ASALBP-1, in which the objective function to be considered optimizes the cycle time ct instead of the number of workstations, which is a given parameter.
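To make the structure of the model concrete, the following is a minimal sketch of its core ideas for the six-task example of Figures 1 and 2, written with the PuLP modeling library in Python. It is our illustration under simplifying assumptions (no E_{ir}/L_{ir} station windows, y_j defined for every station, and constraints (3) and (12) collapsed into one assignment equation), not the authors' implementation:

```python
# Sketch of model M2 for the six-task example: base route R0 = {A, F};
# exactly one of the partial routes R1 = B-C-D-E or R2 = C-D-E-B is selected.
import pulp

ct, m_max = 15, 4                       # cycle-time bound, station upper bound
t = {("A", 0): 11, ("F", 0): 8,         # t[i, r]: time of task i on route r
     ("B", 1): 5, ("C", 1): 4, ("D", 1): 8, ("E", 1): 6,
     ("C", 2): 4, ("D", 2): 8, ("E", 2): 6, ("B", 2): 7}
J = range(1, m_max + 1)

x = pulp.LpVariable.dicts("x", [(i, r, j) for (i, r) in t for j in J], cat="Binary")
y = pulp.LpVariable.dicts("y", list(J), cat="Binary")
ar = pulp.LpVariable.dicts("ar", [1, 2], cat="Binary")

prob = pulp.LpProblem("ASALBP_sketch", pulp.LpMinimize)
prob += pulp.lpSum(j * y[j] for j in J)                      # objective, cf. (1)

for (i, r) in t:                                             # cf. (2), (3), (12)
    # base-route tasks are assigned exactly once; tasks of a partial route
    # are assigned exactly when that route is selected
    prob += pulp.lpSum(x[i, r, j] for j in J) == (1 if r == 0 else ar[r])
for j in J:                                                  # cf. (4), (5)
    prob += pulp.lpSum(t[i, r] * x[i, r, j] for (i, r) in t) <= ct * y[j]
prob += ar[1] + ar[2] == 1                                   # cf. (11)

pos = {(i, r): pulp.lpSum(j * x[i, r, j] for j in J) for (i, r) in t}
chains = [[("B", 1), ("C", 1), ("D", 1), ("E", 1)],          # precedence inside
          [("C", 2), ("D", 2), ("E", 2), ("B", 2)]]          # each partial route
for chain in chains:
    for p, i in zip(chain, chain[1:]):                       # cf. (9)
        prob += pos[p] <= pos[i]
    # A precedes the first mobile task of a selected route, cf. (8)
    prob += pos["A", 0] <= pos[chain[0]] + m_max * (1 - ar[chain[0][1]])
# F follows the last task of whichever route is selected, cf. (7)
prob += pos["E", 1] + pos["B", 2] <= pos["F", 0]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("route:", [r for r in (1, 2) if ar[r].value() == 1],
      "stations used:", sum(int(y[j].value()) for j in J))
```

Solving this sketch selects route R2 and packs the tasks into three stations, matching the result of Table 1.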
5 Computational Experiment

To solve problems involving alternative processes with different sets of tasks, model M1 was easily adapted by properly defining the global routes. To compare models M1 and M2 and evaluate their performance on industrial-sized problems, a computational experiment was carried out. The test-problem instances were designed using some of the benchmark data sets available at the webpage www.assembly-line-balancing.de for assembly line balancing research (i.e., the problems of Bowman, Mansor, Buxey, Gunther, Kilbrid, Hann, Warnecke, Tonge and Arc with 8, 11, 29, 35, 45, 53, 58, 70 and 111 tasks, respectively). The problems were adapted by incorporating a number of assembly alternatives (between 2 and 14) and using 3 or 4 different cycle-time values. When alternative assembly processes involving different tasks were considered, new sets of tasks were also added to the original problems. For example, 9 new tasks were added to Hann's problem to contemplate 4 new assembly processes
involving 2, 3, 2 and 2 tasks, respectively. A total of 82 test-problem instances were defined and solved with both models using the optimization software ILOG CPLEX 9.0 on a PC Pentium 4, CPU 2.88 GHz with 512 MB of RAM. Table 3 presents the data and results for some of these problems, including the name of the problem, the number of tasks n, the cycle time ct, and the number of global routes for model M1 and partial routes for model M2.

Table 3. Results of optimally solving ASALBP instances

Problem    n    ct    Global routes  Partial routes  Constraints M1  Constraints M2  Variables M1  Variables M2  Time M1 (s)  Time M2 (s)  % of improv.
Bowman     8    20    18             9               366             77              1744          888           0.53         0.03         94.3
Mansor     11   62    12             8               288             74              804           547           0.63         0.09         85.7
Mansor     11   62    15             9               352             78              1002          614           2.10         0.16         91.0
Buxey      29   54    12             8               861             147             3850          2581          61547        92.03        99.9
Buxey      29   54    6              6               444             134             1936          1941          18485        0.86         100
Gunther    35   41    32             11              2633            205             25806         8911          89558        14805        83.5
Gunther    40   81    60             13              4287            189             28824         6276          467          0.31         99.9
Kilbrid    45   56    12             8               1383            204             10840         7247          213          1.41         99.3
Kilbrid    45   69    24             10              2505            217             17312         7241          830          1.06         99.9
Hann       53   4676  18             9               2424            238             4780          2403          114          0.13         99.9
Hann       58   2004  24             10              3400            263             19516         8157          8356         3.48         100
Hann       62   2806  36             12              5210            280             22340         7471          19785        249          98.7
Warnecke   58   111   2              3               368             235             3186          4754          7200         638          91.1
Warnecke   58   111   4              5               648             253             6318          7888          17709        1410         92.0
Tonge      70   185   8              7               1428            342             21356         18702         259200       80122        69.1

Average percentage of improvement of M2 over M1: 93.6
Table 3 shows that model M2 outperformed model M1 in all cases. Furthermore, M2 achieved around 94% average improvement, reaching 100% in nearly half of the problems solved. On the other hand, the number of variables and constraints was significantly reduced (as intended) and the solving time was considerably smaller than with the preliminary model M1. Therefore, optimal solutions could be obtained and guaranteed in a reasonable amount of time for problem instances involving up to 70 tasks and 40 assembly alternatives (i.e., 14 partial routes). For the problems for which no optimal solution was guaranteed, models M1 and M2 were solved with the computing time restricted to 1800 seconds (a realistic time window in an industrial environment). Table 4 shows the results obtained with model M2; it presents no data for M1 because the established time limit was exceeded without any solution being provided for these problems. Observe that model M2 obtained solutions within one workstation of the benchmark optimum. Furthermore, the known optimal solution was found for Gunther's and Tonge's problems (marked in Table 4 with an asterisk).

Table 4. Problems solved within a 1800-second time window

Problem name  No. of tasks  Cycle time  Global routes  Partial routes  Workstations obtained  Workstations optimal
Gunther*      35            41          32             11              14                     14
Warnecke      58            111         2              3               15                     14
Tonge         70            320         8              7               12                     11
Tonge*        70            220         8              7               17                     17
Arc2          111           17067       4              5               10                     9
6 Conclusions and Further Research

In this paper an extended definition of the ASALBP has been presented. More realistic problems can be addressed with this new definition, since it allows alternative assembly processes to be considered that can involve either the same or different and mutually exclusive sets of tasks. Therefore, the hypothesis that states that tasks must be processed exactly once is relaxed, because some tasks are not processed if the alternative they belong to is not selected. Additionally, a mathematical programming model has been presented, which solves the ASALBP considering the extended definition. With this new model, the problems are solved in a significantly shorter time than with the preliminary model adapted to the new definition; moreover, medium-sized problems are solved in a very reasonable amount of time. For bigger problems, good feasible solutions were obtained within 1800 seconds, a realistic time window in an industrial environment. Nevertheless, since the solving time increases exponentially with the number of tasks and assembly alternatives, further research is needed in order to develop heuristics to solve the ASALBP regarding the extended definition presented and formalized in this paper.
References
1. Baybars, I.: A survey of exact algorithms for the simple assembly line balancing problem. Management Science 32, 909-932 (1986).
2. Becker, C. and Scholl, A.: A survey on problems and methods in generalized assembly line balancing. European Journal of Operational Research 168, 694-715 (2006).
3. Capacho, L. and Pastor, R.: ASALBP: the Alternative Subgraphs Assembly Line Balancing Problem. Technical Report IOC-DT-P-2005-5, UPC, Spain, Jan. (2005).
4. Das, S. and Nagendra, P.: Selection of routes in a flexible manufacturing facility. International Journal of Production Economics 48, 237-247 (1997).
5. Gen, M., Tsujimura, Y. and Li, Y.: Fuzzy assembly line balancing using genetic algorithms. Computers & Industrial Engineering 31, 631-634 (1996).
6. Kim, Y.K., Kim, Y.J. and Kim, Y.: Genetic algorithms for assembly line balancing with various objectives. Computers & Industrial Engineering 30(3), 397-409 (1996).
7. Lapierre, S.D. and Ruiz, A.B.: Balancing assembly lines: an industrial case study. Journal of the Operational Research Society 55, 559-597 (2004).
8. Miltenburg, J.: Balancing and scheduling mixed-model U-shaped production lines. International Journal of Flexible Manufacturing Systems 14, 119-151 (2002).
9. Pastor, R., Andres, C., Duran, A. and Perez, M.: Tabu search algorithms for an industrial multi-product, multi-objective assembly line balancing problem, with reduction of task dispersion. Journal of the Operational Research Society 53, 1317-1323 (2002).
10. Scholl, A. and Becker, C.: State-of-the-art exact and heuristic solution procedures for simple assembly line balancing. European Journal of Operational Research 168, 666-693 (2006).
11. Scholl, A. and Klein, R.: SALOME: A bidirectional branch and bound procedure for assembly line balancing. INFORMS Journal on Computing 9, 319-334 (1997).
12. Suresh, G. and Sahu, S.: Stochastic assembly line balancing using simulated annealing. International Journal of Production Research 32, 1801-1810 (1994).
Satisfying Constraints for Locating Export Containers in Port Container Terminals

Kap Hwan Kim and Jong-Sool Lee

Department of Industrial Engineering, Pusan National University, Changjeon-dong, Kumjeong-ku, Pusan 609-735, Korea
{kapkim, jong-sool}@pusan.ac.kr
Abstract. To allocate spaces to outbound containers, the constraint satisfaction technique was applied. Space allocation means pre-assigning spaces for arriving ships so that loading operations can be performed efficiently. The constraints, which are used to maximize the efficiency of yard trucks and transfer cranes, were collected from a real container terminal and formulated in the form of constraints. Numerical experiments were conducted to evaluate the performance of the developed algorithm.
1 Introduction

Operations in port container terminals consist of the discharging operation (during which containers are unloaded from ships), the loading operation (during which containers are loaded onto ships), the delivery operation (during which inbound containers are transferred from the marshalling yard to outside trucks), and the receiving operation (during which outbound containers are transferred from outside trucks to the marshalling yard). The discharging operation and the loading operation are together called the "ship operation." For the sake of optimal customer service, the turnaround time of container-ships must be minimized by increasing the speed of the ship operation, and the turnaround time of outside trucks must be shortened as much as possible. Figure 1 shows a container yard.
In container terminals, the loading operation for outbound containers is carefully pre-planned by load planners. For the load planning, the responsible container-ship agent usually transfers a load profile (an outline of a load plan) to the terminal operating company several days before a ship's arrival. In the load profile, each slot (cell) is assigned a container group, which is identified by type (full or empty), port of destination, and the size of the container to be stowed. Because a cell of a ship can be filled with any container within its assigned group, the handling work in the marshalling yard can be facilitated by optimally sequencing outbound containers for the loading operation. In sequencing the containers, load planners usually attempt to minimize the handling of quay cranes and the yard equipment. The output of this decision-making process is called the "load sequence list." To find an efficient load sequence, outbound containers must be laid out in the optimal location. The main focus of this paper is to suggest a method of pre-allocating storage space for arriving containers so that maximum efficiency is achieved in the loading operation.
[Figure 1 shows a container yard, with a slot, a bay, a stack, a block and a yard crane labeled.]
Fig. 1. An illustration of a container yard
In order to obtain an efficient load sequence, the following must be considered during the space planning process. Figure 2 shows a containership's cross-sectional view, which is called a ship-bay. The figure shows cells into which the containers of two groups (defined by size and port of destination) are assigned. A widely accepted principle for space planning is that yard-bays assigned to a containership should be located near the berthing position of the corresponding ship. In addition, there may be other principles of space planning that depend on the type of yard equipment. One such example is that containers of different groups should not be mixed in the same yard-bay. This principle is valid only for the indirect transfer (combined) system of yard-side equipment (yard crane or straddle carrier) and prime movers. During the loading operations of containers, containers of the same group are likely to be loaded onto cells located close together, as illustrated in Figure 2, and thus the containers are usually loaded consecutively. Therefore, for the case of the indirect transfer system, the travel distance of the yard-side equipment can be reduced by placing containers of the same group in the same yard-bay. In addition, there are many practical rules that yard planners use for allocating spaces to different groups of containers. The rules will be discussed further in the following sections.
There have been many related studies regarding container terminals. Taleb-Ibrahimi [1] analyzed the space-allocation problem with a constant or cyclic space requirement for stacking containers. Kim and Kim [2] formulated a quadratic mixed-integer-programming model for the dynamic space-allocation problem, but they did not suggest an efficient algorithm for the mathematical model. Kim and Kim [3] addressed the space allocation problem for inbound containers. Kim and Kim [4] discussed the factors that affect the efficiency of the loading operation of outbound containers. Kim et al. [5] suggested a method for determining storage locations for outbound containers so that the number of rehandles during the loading operation is minimized. Cao and Uebe [6] suggested a transportation model with a non-linear constraint for assigning available space to space requirements. However, they did not consider the dynamic aspect of container flows over the time horizon. Kozan [7] proposed a network model to describe the flow of containers in port container terminals. The model attempted to determine flows of different types of containers in a way that minimizes the total handling cost. Roll and Rosenblatt [8] suggested a grouped
storage policy that is based on a concept similar to the space-allocation problem in container terminals. They applied the group storage strategy as a storage policy for warehouses. Tsang [9] described the constraint satisfaction technique in detail. Zhang et al. [10] discussed the storage space allocation problem in the storage yards of container terminals. They decomposed the space allocation problem into two levels: the subproblem in the first level attempts to balance workloads among different yard blocks, while the second subproblem minimizes the total transportation distance for moving containers between blocks and vessel berthing locations. Kim and Park [11] proposed a multicommodity minimal cost flow problem model for the space allocation problem. A subgradient optimization technique was applied to solve the problem. All the previous studies assumed that the objective function is clearly defined, and that feasible solutions can be easily obtained. However, in container terminals, there are many complicated constraints to be satisfied, and so finding a feasible solution is itself a difficult problem. This is why the CSP technique is applied to the space allocation problem in container terminals.
[Figure 2 shows bays 05-07 of a container-ship stowage plan; each cell is assigned a container group defined by destination port (e.g., Rotterdam), weight group (heavy, medium or light) and size (20 or 40 feet).]
Fig. 2. An example of a stowage plan of a container-ship
2 Space Allocation Problem for Export Containers

It is assumed that the allocation of space is performed periodically. The length of the allocation period may be one day, 12 hours, or 6 hours, depending on the level of uncertainty and the time of the computation. Each period is called a "stage" in the decision-making process of space allocation. The level of inventory of containers that arrive at a container yard follows a similar pattern. Arriving containers are classified into container groups, each of which is a collection of containers of the same length, vessel, and destination port. It is also assumed that containers of different groups are not stacked in the same yard-bay. The space must be pre-allocated for each group of containers that will arrive during the next stage. However, if decisions for the next stage are made without considering future changes in the yard, it may be impossible to find a feasible solution for the succeeding stages. Thus, an investigation must be performed on the effects of the decisions for the next stage on those for the subsequent stages. In this study, the investigation is performed by the CSP technique.
By using the forecasted arrival of containers, space requirements are estimated for each group of containers that will arrive at the yard in the next stage and the subsequent stages. A container group that requires an allocation of space is called an SDU (Space Demand Unit). The amount of space needed by each SDU is expressed in units of one yard-bay for 20-ft containers and two yard-bays for 40-ft containers. Based on the expected arrival of containers, SDUs for the next stage and the subsequent stages, all of which represent the demand side of the space allocation, must be specified. Next, the supply side of space allocation must be considered. A container yard for outbound containers is usually divided into several blocks, each of which consists of 20 to 30 yard-bays. Each yard-bay consists of 20 to 30 stacks in a straddle carrier system and 6 to 8 stacks in a yard crane system. Space can be allocated in units of stacks or yard-bays, depending on the type of handling equipment and the space-allocation strategy used. In this study, a yard-bay is considered to be the unit of space allocation (SAU). Figure 3 shows the conceptual representation of the space allocation for this study. The space allocation assigns one or two available SAUs to an SDU. No SAU can be assigned to more than one SDU.
[Figure 3 shows the bipartite matching of SDUs 1, …, n with SAUs 1, …, m.]
Fig. 3. Matching SDUs with SAUs
One of the difficulties of the space allocation problem is that the quality of the allocation decisions can be evaluated only when the loading operation is performed. However, the efficiency of the loading operation depends on the load sequencing of the outbound containers as well as on the allocations. Because load sequencing is another complicated decision-making problem, and because there are many complicated constraints to be satisfied for the space allocation, optimization is not a practical approach for the space allocation problem.
This paper discusses how the technique for the constraint satisfaction problem (CSP) can be applied to the space allocation problem. The following explains the constraints that were collected from an actual container terminal. Most of the constraints are related to rules that have been used for a long time by yard planners for efficient loading operations in container terminals. (A sketch of how such rules can be written as predicates follows the list.)

(Constraint 1) The distance between the berth that a vessel arrives at and the location of a block where the outbound containers of the vessel are stacked must be less than a specified maximum limit. This constraint is necessary to reduce the travel cost of yard trucks between the apron and the yard.
(Constraint 2) The maximum distance between blocks where containers for one vessel are located must be less than a specified value. This constraint is necessary to reduce the travel distance of transfer cranes.
(Constraint 3) Containers for one vessel must be stacked in blocks that are located in the same row of the yard. This constraint is necessary because transfer cranes can travel more easily in the lengthwise direction of blocks than in the widthwise direction.
(Constraint 4) A block's space cannot be allocated to the receiving operation of a vessel when the loading operation of another vessel is scheduled at the same time at the same block. This constraint is necessary to prevent the congestion of transfer cranes in the same block.
(Constraint 5) The number of vessels onto which containers stacked in a block will be loaded cannot exceed a specified limit (NVmax). This constraint has the effect of simultaneously restricting both the maximum number of blocks to be allocated to a vessel and the minimum number of containers to be stacked for a vessel.
(Constraint 6) The number of blocks in which the containers to be loaded onto the same vessel are stacked cannot exceed a specified limit (NBmax). When the containers for one vessel are scattered over too many blocks, the travel distance of the transfer cranes may be excessive.
(Constraint 7) A 40-ft container requires two consecutive 20-ft yard-bays.

In addition to the above constraints, other constraints can be considered without significantly modifying the search algorithm.
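As an illustration of how such rules can enter the constraint specification layer described below, here is a sketch of constraints 5 and 7 as Boolean predicates over a partial allocation. The encoding (a dict mapping each SDU to its vessel, block and bays) and the name NV_MAX are our assumptions for illustration, not the terminal's actual data model:

```python
NV_MAX = 3  # the limit used in the paper's experiments (constraint 5)

def constraint5_vessels_per_block(allocation):
    """allocation: dict sdu_id -> (vessel, block, list_of_bay_indices).
    Constraint 5: at most NV_MAX vessels may have containers in one block."""
    vessels_in_block = {}
    for vessel, block, _ in allocation.values():
        vessels_in_block.setdefault(block, set()).add(vessel)
    return all(len(v) <= NV_MAX for v in vessels_in_block.values())

def constraint7_consecutive_bays(allocation, sizes):
    """sizes: dict sdu_id -> 20 or 40.
    Constraint 7: a 40-ft SDU needs two consecutive 20-ft yard-bays."""
    return all(sizes[sdu] == 20
               or (len(bays) == 2 and abs(bays[0] - bays[1]) == 1)
               for sdu, (_, _, bays) in allocation.items())
```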
3 Application of the CSP Techniques to the Allocation of Spaces

Figure 4 shows the structure of the program developed for the space allocation problem. The system consists of an interface layer, a constraint specification layer, and a search layer. In the interface layer, variables and their domains are specified. In the space allocation problem, variables correspond to SDUs, while the domain of an SDU corresponds to the SAUs that can be allocated to it. In the constraint specification layer, constraints, which are expressed in the form of equations, are specified. Various program modules are already provided and can be easily used simply by specifying the values of parameters. The following describes the search procedure in this study:

Step 1: Define the variables and, for each variable, its domain, which is the set of values that the variable can take.
Step 2: If no variable remains, then stop. Otherwise, select the next variable.
Step 3: Select the next value and assign it to the variable. If all the variables are assigned values, then stop. Otherwise, go to Step 4.
Step 4: Reduce the problem. In this step, values of the remaining variables that do not satisfy at least one constraint are removed from the domains of those variables. Check whether there is a variable whose domain becomes empty. If yes, then go to Step 5. If no, then go to Step 2.
Step 5: Check whether there remains any value to assign to the current variable. If yes, then go to Step 3. If no, then let the current variable be the previous variable (backtracking) and go to Step 3.
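A minimal, generic rendering of these five steps as backtracking with problem reduction (forward checking) might look as follows; the function signature and the constraint representation are assumptions for illustration, not the developed program itself:

```python
def search(variables, domains, constraints):
    """Backtracking search with forward checking.
    variables: list in the order fixed by the variable-ordering rule.
    domains:   dict variable -> list of values (value-ordering rule = list order).
    constraints: predicates over a partial-assignment dict; they must return
    False only when the partial assignment already violates the constraint."""
    def step(assignment, domains):
        if len(assignment) == len(variables):          # Step 2: nothing left
            return dict(assignment)
        var = variables[len(assignment)]               # Step 2: next variable
        for value in domains[var]:                     # Step 3: next value
            assignment[var] = value
            if all(c(assignment) for c in constraints):
                # Step 4: reduce the problem -- drop values of unassigned
                # variables that are inconsistent with the current assignment
                pruned = {v: [w for w in domains[v]
                              if all(c({**assignment, v: w}) for c in constraints)]
                          for v in variables if v not in assignment}
                if all(pruned.values()):               # no domain wiped out
                    result = step(assignment, {**domains, **pruned})
                    if result is not None:
                        return result
            del assignment[var]                        # Step 5: backtrack
        return None
    return step({}, domains)
```

The order of `variables` and of each domain list is exactly where the variable-ordering and value-ordering rules evaluated in Section 4 plug in.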
[Figure 4 shows the layered structure: the interface layer (variable and domain specification), the constraint specification layer, and the solver layer containing the search engine (constraint propagation and search techniques).]
Fig. 4. The structure of the program developed for the space allocation
4 Numerical Experiment

A numerical experiment was conducted to test the performance of the search algorithm and to find the best search strategies.

4.1 Input Data for the Numerical Experiment

The algorithm developed in this study was applied to solve a real space allocation problem of a large container terminal (PECT: Pusan Eastern Container Terminal) in Pusan. Various search strategies were tested to evaluate the speed of the search algorithm. The strategies used were variable-ordering rules, value-ordering rules, and constraint-ordering rules. A problem with two stages, 84 variables (SDUs), which approximately equals 15 (vessels) × 2 (sizes) × 3 (destination ports), and 600 values (SAUs) for each variable, which corresponds to the number of bays in the yard, was solved. The data set came from a practical case with 15 vessels, 24 blocks, and 4 berths. The problems in the experiment considered all seven constraints mentioned in Section 2. Parameters for constraints 5 and 6 were set as follows: NVmax = 3 and NBmax = 3.
4.2 Experiment to Evaluate Various Variable-Ordering Strategies

The following three criteria were used for ordering variables: (1) stage of the SDU: the SDUs of earlier stages have higher priorities than those of later stages; (2) size of the containers of the SDU: the SDUs of 40-ft containers have higher priorities than those of 20-ft containers; (3) vessel of the SDU: the SDUs of vessels arriving at the terminal are prioritized in chronological order. By combining the three criteria, four variable-ordering rules were constructed as follows:

(Rule 1) Sequence SDUs according to the stage criterion.
(Rule 2) Sequence SDUs according to the size criterion.
(Rule 3) Sequence SDUs according to the stage criterion first, followed by the vessel criterion.
(Rule 4) Sequence SDUs according to the stage criterion first, the vessel criterion second, and the size criterion third.

SDUs with the same values of the sequencing criteria are sequenced in a random order. The values in the domains are sequenced in order of increasing bay ID. The problem was solved for ten initial distributions of containers. The results in Table 1 show the CPU time needed to find a feasible solution for the ten problems. Through a statistical test, the three null hypotheses that the computational time of rule 4 is not less than that of each of the other three rules were rejected at the significance level of 1%. The results of the hypothesis test imply that using rule 4 results in a shorter computation time than the other rules. (A sketch of rule 4 as a sort key follows.)
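Rule 4 amounts to a lexicographic sort key. The following sketch (the SDU record fields are our assumptions, not the terminal's data model) shows one way to express it:

```python
from dataclasses import dataclass
import random

@dataclass
class SDU:
    stage: int         # stage in which the containers are expected to arrive
    vessel_order: int  # chronological arrival order of the corresponding vessel
    size_ft: int       # container length: 20 or 40

def rule4_order(sdus):
    sdus = sdus[:]
    random.shuffle(sdus)  # ties are broken in a random order, as in the paper
    # stage first, vessel second, size third (40-ft before 20-ft, hence -size)
    return sorted(sdus, key=lambda s: (s.stage, s.vessel_order, -s.size_ft))
```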
Table 1. The computational time for various variable-ordering rules (in seconds)

Initial distribution  Rule 1  Rule 2  Rule 3  Rule 4
1                     654     596     556     497
2                     670     602     570     521
3                     643     579     554     504
4                     665     588     586     487
5                     663     607     564     479
6                     657     577     565     518
7                     655     589     573     510
8                     647     590     546     496
9                     660     610     577     488
10                    662     588     573     505
Average               658     593     566     501
4.3 Experiment to Evaluate Two Value-Ordering Strategies

An experiment was performed on different sequences of values in the domains. Rule 3 was used as the variable-ordering rule. Two rules for value-ordering were compared with each other. The first rule is to sequence SAUs in the alphabetical order of the bay ID; this rule will be called the "bay ID rule." The second rule is to give higher priorities to SAUs that are located in the blocks nearer to the berthing location of the vessel corresponding to the SDU; this rule will be called the "closest-to-berth rule." As in the first experiment, ten problems with different initial distributions of stacked containers were solved. The results of the numerical experiment are shown in Table 2.
Table 2. The computational time for two value-ordering rules (in seconds)

Initial distribution of containers  Bay ID rule  Closest-to-berth rule
1                                   556          537
2                                   570          546
3                                   554          532
4                                   586          566
5                                   564          563
6                                   565          535
7                                   573          549
8                                   546          532
9                                   577          545
10                                  573          558
Average                             566          543
By a statistical test, it was concluded that the closest-to-berth rule outperforms the bay ID rule in computational time at the significance level of 1%.

4.4 Experiment to Evaluate Various Constraint-Ordering Strategies

There are many constraints that solutions of the space allocation problem must satisfy. The propagation sequence of constraints during the search process is expected to significantly affect the computational time, which is tested in this subsection. Constraints 3, 5, 6, and 7 were considered. It was assumed that NVmax = 3 and NBmax = 3. Rule 3 and the bay ID rule were used as the variable-ordering rule and the value-ordering rule, respectively. For each sequence of constraints, ten problems with different initial distributions of containers were solved. Table 3 shows the average computational time of the ten test problems for each sequence of constraints. The table shows that the sequence of constraints significantly affects the computational time and that constraint 5 should be propagated first during the search process.
Table 3. The computational time for different sequences of constraints

Seq.  Sequence of constraints  Search time (s)    Seq.  Sequence of constraints  Search time (s)
1     5→7→6→3                  465                13    3→6→7→5                  564
2     5→6→3→7                  472                14    3→5→6→7                  567
3     5→7→3→6                  477                15    3→7→5→6                  567
4     5→3→7→6                  497                16    3→7→6→5                  570
5     6→5→3→7                  498                17    3→5→7→6                  584
6     5→3→6→7                  501                18    3→6→5→7                  592
7     5→6→7→3                  503                19    7→5→3→6                  598
8     6→7→5→3                  509                20    7→3→6→5                  607
9     6→7→3→5                  512                21    7→3→5→6                  612
10    6→3→5→7                  513                22    7→5→6→3                  629
11    6→3→7→5                  520                23    7→6→5→3                  630
12    6→5→7→3                  532                24    7→6→3→5                  632
5 Conclusions

The constraint satisfaction problem (CSP) technique was applied to a space allocation problem for outbound containers. A program that realizes the CSP concept was developed for the space allocation, and constraints for the space allocation problem were introduced. Using real data collected from the Pusan Eastern Container Terminal, Korea, numerical experiments were conducted to evaluate the performance of the developed algorithm. Various variable-ordering rules were compared with each other in terms of their computational time. The results showed that sequencing space demand (requirement) units by the stage criterion first, the vessel criterion second, and the size criterion third results in the shortest computational time. It was also shown that the value-ordering rule significantly affects the computational time. Lastly, various sequences of constraint propagation during the search process were compared with each other; it was shown that the sequence of constraints also significantly affects the computational time.
Acknowledgments

This work was supported by the Regional Research Centers Program (Research Center for Logistics Information Technology), granted by the Korean Ministry of Education & Human Resources Development.
References
1. Taleb-Ibrahimi, M., Castilho, B., and Daganzo, C. F.: Storage Space vs Handling Work in Container Terminals. Transportation Research 27B(1) (1993) 13-32
2. Kim, K. H. and Kim, D. Y.: Group Storage Methods at Container Port Terminals. The Material Handling Engineering Division 75th Anniversary Commemorative Volume, ASME MH-Vol. 2 (1994) 15-20
3. Kim, K. H. and Kim, H. B.: Segregating Space Allocation Models for Container Inventories in Port Container Terminals. International Journal of Production Economics 59 (1999) 415-423
4. Kim, K. H. and Kim, K. Y.: An Optimal Routing Algorithm for a Transfer Crane in Port Container Terminals. Transportation Science 33(1) (1999) 17-33
5. Kim, K. H., Park, Y. M., and Ryu, K. R.: Deriving Decision Rules to Locate Export Containers in Container Yards. European Journal of Operational Research 124 (2000) 89-101
6. Cao, B. and Uebe, G.: Solving Transportation Problems with Nonlinear Side Constraints with Tabu Search. Computers and Operations Research 22(6) (1995) 593-603
7. Kozan, E.: Optimizing Container Transfers at Multimodal Terminals. Mathematical and Computer Modelling 31 (2000) 235-243
8. Roll, Y. and Rosenblatt, M. J.: Random versus Grouped Storage Policies and Their Effect on Warehouse Capacity. Material Flow 1 (1983) 199-205
9. Tsang, E.: Foundations of Constraint Satisfaction. Academic Press Limited, UK (1993)
10. Zhang, C., Liu, J., Wan, Y.-W., Murty, K. G., and Linn, R. J.: Storage Space Allocation in Container Terminals. Transportation Research 37B (2003) 883-903
11. Kim, K. H. and Park, K. T.: Dynamic Space Allocation for Temporary Storage. International Journal of Systems Science 34 (2003) 11-20
A Price Discrimination Modeling Using Geometric Programming

Seyed J. Sadjadi and M. Ziaee
Iran University of Science and Technology
[email protected]
Abstract. This paper presents a price discrimination model which determines the product's selling price and marketing expenditure in two markets. We assume production to be a function of price and marketing cost in the two states. The cost of production is also assumed to be a function of production in both markets. We propose a Multi Objective Decision Making (MODM) method called the Lexicograph Approach (LA) in order to find an efficient solution between the two objectives, one for each market. Geometric programming is used to solve the resulting model. In our GP implementation, we use a transformed dual problem in order to reduce the model to the optimization of a nonlinear concave function subject to some linear constraints, and we solve the resulting model using a simple grid line search.

Keywords: Mathematical Programming, Production and Operations Management, Economics.
1 Introduction
One of the most important issues in having a fair price discrimination strategy is choosing a right model. Many traditional discrimination models assume production as a function of price. The production function is often linear or quadratic. On the other hand, economists normally study the cost of production as a function of production with a similar linear or quadratic pattern. The production and cost functions considered in this paper have an exponential form in price, marketing expenditure and production, respectively. This type of modeling has been widely used in the literature [4, 7, 8, 10]. These works consider demand as a function of price and marketing expenditure and assume that when demand increases production will be less costly. Sadjadi et al. [10] study the effects of integrated production and marketing decisions in a profit-maximizing firm. Their model formulation determines price, marketing expenditure, demand or production volume, and lot size for a single product with stable demand when economies of scale are given. Hoon and Cerry [5] present a method for optimal inventory policies for an economic order quantity with decreasing cost functions. Lee [7] considers the same demand function to determine order quantity and selling price. In their implementation, they use a previous [8] model formulation, with an adaptation of geometric programming (GP), to determine
the global solution of the model. The primary aim of this paper is to determine price and marketing strategy in two markets. Suppose a Fortune 500 company markets its product in North America and plans to penetrate a new market with the same advertisement expenditure. We then have a new market with a new price strategy; however, owing to its large advertisement strategy, the company has already introduced its brand and does not need any additional advertisement. Therefore, we assume that demands in both markets are functions of different prices but of a single marketing cost. The objective of our modeling is to maximize the profits in both markets. However, the primary objective is to maximize the profitability of the first market, and the profit maximization of the second market is our secondary objective. Therefore, we have to maximize the profitability of two objective functions. The resulting MODM problem is solved using the Lexicograph Approach, where we maximize the profit in the first market and then optimize the second market's profit while keeping the optimal profitability of the first market as a constraint. The resulting problem in each stage is a posynomial GP problem [3]. We use the GP method to find the optimal solution of the resulting model. This paper is organized as follows. We first present the problem statement. Next, the problem statement is presented in MODM form. The GP method is used to find the optimal solution of the problem formulation. Throughout the paper, we use a numerical example in order to show the implementation of the algorithm. Finally, concluding remarks are presented at the end to summarize the contribution of the work.
2 Problem Statement
Consider a single product where demand is affected by selling price. Let $P_i$, $\alpha_i$, $M$ and $\gamma_i$ be the selling price per unit, the price elasticity of demand, the marketing expenditure per unit, and the marketing-expenditure elasticity of demand in market $i = 1, 2$, respectively. For both markets we assume

$D_i = k_i P_i^{-\alpha_i} M^{\gamma_i}, \quad i = 1, 2,$  (1)

where the productions ($D_i$, $i = 1, 2$) are defined as a function of the price per unit ($P_i$) and the marketing expenditure ($M$) with $\alpha_i > 1$, $0 < \gamma_i < 1$, $i = 1, 2$. The scaling constants $k_i$ represent other related factors, and the assumption $\alpha_i > 1$, $i = 1, 2$ implies that $D_1$ and $D_2$ increase at a diminishing rate as $P_1$ and $P_2$ decrease. This type of relationship is widely used in the literature [7, 8, 9, 10]. Besides, (1) can easily be estimated by applying linear regression to the logarithm of the function. Let $c_1$ and $c_2$ be the production cost per unit for states one and two, respectively. We assume that the unit production costs $c_1$ and $c_2$ can be discounted with $\beta_1$ and $\beta_2$, respectively. Therefore we have

$c_1 = u_1 D_1^{-\beta_1}, \qquad c_2 = u_2 D_2^{-\beta_2},$  (2)
where $D_1$ and $D_2$ are the production lot sizes (units) and $u_1$ and $u_2$ are the scaling constants for the unit production cost in states one and two, respectively. The exponents
$\beta_1$ and $\beta_2$ represent the lot-size elasticity of the unit production cost, with $0 < \beta_1, \beta_2 < 1$; they are almost the same as the price elasticity $\alpha$, and we suggest a small value, say $\beta_1, \beta_2 = 0.01$. We will also explain that the algorithm we use imposes some other limitations on all the parameters in our model.
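As a concrete illustration of (1), (2) and the profit definition used in Section 2.1 (all parameter values below are made-up assumptions, not data from the paper), one could write:

```python
import numpy as np

def demand(P, M, k, alpha, gamma):            # (1): D = k * P**-alpha * M**gamma
    return k * P ** -alpha * M ** gamma

def unit_cost(D, u, beta):                    # (2): c = u * D**-beta
    return u * D ** -beta

def profit(P, M, k, alpha, gamma, u, beta):   # pi = (P - c - M) * D (Section 2.1)
    D = demand(P, M, k, alpha, gamma)
    return (P - unit_cost(D, u, beta) - M) * D

# Estimating (1) by linear regression on logs:
#   log D = log k - alpha * log P + gamma * log M
rng = np.random.default_rng(0)
P = rng.uniform(5.0, 50.0, 200)
M = rng.uniform(1.0, 10.0, 200)
D = demand(P, M, k=1000.0, alpha=1.8, gamma=0.3) * rng.lognormal(0.0, 0.05, 200)
X = np.column_stack([np.ones_like(P), np.log(P), np.log(M)])
coef, *_ = np.linalg.lstsq(X, np.log(D), rcond=None)
print("k, alpha, gamma ~", np.exp(coef[0]), -coef[1], coef[2])
```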
2.1 The Proposed Model
In this section, we present our proposed production lot sizing and marketing model ($\pi_i$, $i = 1, 2$) based on the explained assumptions. First, we are interested in maximizing the profits $\pi_i(P_i, M)$ simultaneously in order to determine the prices and the marketing expenditure for the planning horizon, as follows:

$\max \pi_i(P_i, M) = \text{Revenue in market } i - \text{Production cost in market } i - \text{Marketing expenditure in market } i = P_i D_i - c_i D_i - M D_i, \quad i = 1, 2.$  (3)

As we can observe in (3), we are interested in the maximization of $\pi_1$ and $\pi_2$. Let $\pi_1^*$ be the optimal solution for $\pi_1$. Using an auxiliary variable $t$ we have

$\max \; P_2 D_2 - c_2 D_2 - M D_2 \quad \text{subject to} \quad \pi_1 = P_1 D_1 - c_1 D_1 - M D_1 \ge t \pi_1^*.$  (4)
In order to solve (4), we need the optimal solution π_1^*. The optimal solution for π_1 is obtained as follows,

max z = P_1 D_1 − c_1 D_1 − M D_1.   (5)
Problem (5) is a Geometric Programming problem which can easily be formulated in posynomial form. Since there are two variables and three terms associated with (5), the degree of difficulty is equal to 3 − (2 + 1) = 0 [3]. Therefore we have

max z = π_1, or min z^{−1}
subject to P_1 D_1 − c_1 D_1 − M D_1 ≥ z,   (6)

or min z^{−1}
subject to

k_1 P_1^{1−α_1} M^{γ_1} − u_1 k_1^{1−β_1} P_1^{α_1(β_1−1)} M^{γ_1(1−β_1)} − k_1 P_1^{−α_1} M^{γ_1+1} ≥ z.   (7)

Therefore we have min z^{−1} subject to

u_1 k_1^{−β_1} P_1^{α_1 β_1 − 1} M^{−β_1 γ_1} + P_1^{−1} M + k_1^{−1} P_1^{α_1 − 1} M^{−γ_1} z ≤ 1.   (8)
Problem (8) is in posynomial form and can be solved using its dual problem formulation as follows,

d(π_1) = max f(w) = (1/w_0)^{w_0} (u_1 k_1^{−β_1} λ/w_1)^{w_1} (λ/w_2)^{w_2} (k_1^{−1} λ/w_3)^{w_3}

subject to

w_0 = 1,
−w_0 + w_3 = 0,
(α_1 β_1 − 1) w_1 − w_2 + (α_1 − 1) w_3 = 0,
−β_1 γ_1 w_1 + w_2 − γ_1 w_3 = 0,
λ = w_1 + w_2 + w_3.   (9)

Thus,

w_1 = (γ_1 + 1 − α_1) / (α_1 β_1 − β_1 γ_1 − 1),
w_2 = (β_1 γ_1 − γ_1) / (α_1 β_1 − β_1 γ_1 − 1),
w_3 = 1,
λ = (α_1 β_1 − α_1) / (α_1 β_1 − β_1 γ_1 − 1).   (10)

Using w_i, i = 0, ..., 3 from (10), one can determine the optimal solution π_1^* from (9) and solve (4) as follows,

min π_1^{*−1} t^{−1} + z_2^{−1}
subject to
P_1 D_1 − c_1 D_1 − M D_1 ≥ t π_1^*,
P_2 D_2 − c_2 D_2 − M D_2 ≥ z_2,   (11)

or

min π_1^{*−1} t^{−1} + z_2^{−1}
subject to
u_1 k_1^{−β_1} P_1^{α_1 β_1 − 1} M^{−β_1 γ_1} + P_1^{−1} M + π_1^* k_1^{−1} P_1^{α_1 − 1} M^{−γ_1} t ≤ 1,
u_2 k_2^{−β_2} P_2^{α_2 β_2 − 1} M^{−β_2 γ_2} + P_2^{−1} M + k_2^{−1} P_2^{α_2 − 1} M^{−γ_2} z_2 ≤ 1.   (12)
Problem (12) is a minimization of a nonlinear posynomial objective function subject to two posynomial constraints. Since there are eight terms and five variables, the degree of difficulty is 8 − (5 + 1) = 2. Let

δ_01 = π_1^{*−1} t^{−1},    δ_02 = z_2^{−1},
δ_11 = u_1 k_1^{−β_1} P_1^{α_1 β_1 − 1} M^{−β_1 γ_1},    δ_12 = P_1^{−1} M,
δ_13 = π_1^* k_1^{−1} P_1^{α_1 − 1} M^{−γ_1} t,    δ_21 = u_2 k_2^{−β_2} P_2^{α_2 β_2 − 1} M^{−β_2 γ_2},
δ_22 = P_2^{−1} M,    δ_23 = k_2^{−1} P_2^{α_2 − 1} M^{−γ_2} z_2.   (13)
Let wij be the dual variables associated with δij for i = 0, 1, 2 and j = 1, 2, 3, respectively. Therefore we have,
d(π_2) = max f(w) = (π_1^{*−1} λ_0 / w_01)^{w_01} (λ_0 / w_02)^{w_02} (u_1 k_1^{−β_1} λ_1 / w_11)^{w_11} (λ_1 / w_12)^{w_12} (π_1^* k_1^{−1} λ_1 / w_13)^{w_13} × (u_2 k_2^{−β_2} λ_2 / w_21)^{w_21} (λ_2 / w_22)^{w_22} (k_2^{−1} λ_2 / w_23)^{w_23}

subject to

−w_01 + w_13 = 0,
−w_02 + w_23 = 0,
(α_1 β_1 − 1) w_11 − w_12 + (α_1 − 1) w_13 = 0,
(α_2 β_2 − 1) w_21 − w_22 + (α_2 − 1) w_23 = 0,
−β_1 γ_1 w_11 + w_12 − γ_1 w_13 − β_2 γ_2 w_21 + w_22 − γ_2 w_23 = 0,
λ_0 = w_01 + w_02 = 1,
λ_1 = w_11 + w_12 + w_13,
λ_2 = w_21 + w_22 + w_23.   (14)

We rewrite the linear equations of (14) in terms of two dual variables, w_01 and w_11. Therefore we have

w_13 = w_01,    w_02 = 1 − w_01,    w_23 = 1 − w_01,
w_12 = (α_1 β_1 − 1) w_11 + (α_1 − 1) w_01,
w_21 = ((α_1 β_1 − β_1 γ_1 − 1) w_11 + (α_1 − γ_1 − α_2 + γ_2) w_01 + (α_2 − γ_2 − 1)) / (β_2 γ_2 + 1 − α_2 β_2),
w_22 = (α_2 − 1)(1 − w_01) + (α_2 β_2 − 1) w_21.   (15)

As we can observe, the linear constraints in (14) can be converted into (15), where there are only two unknowns. Therefore, we may use a simple grid search to find the optimal solution. Note that, in order to have a feasible solution in (14), the following must hold:

l_1 = (α_1 − 1) / (1 − α_1 β_1),
l_2 = (α_1 − γ_1 − α_2 + γ_2) / (β_1 γ_1 + 1 − α_1 β_1),
l_3 = (α_2 − γ_2 − 1) / (β_1 γ_1 + 1 − α_1 β_1),
l_4 = ((α_1 − 1 − γ_1)(1 − α_2 β_2) + γ_2 (1 − β_2)) / ((β_1 γ_1 + 1 − α_1 β_1)(1 − α_2 β_2)),
l_5 = γ_2 (1 − β_2) / ((β_1 γ_1 + 1 − α_1 β_1)(1 − α_2 β_2)),
max{0, l_4 w_01 + l_5} < w_11 < min{l_1 w_01, l_2 w_01 + l_3}.   (16)
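To make the two-stage procedure concrete, here is a minimal Python sketch, ours rather than the authors' implementation: stage one evaluates the closed-form dual solution (10) to obtain π_1^*, and stage two scans a grid over (w_01, w_11), reconstructing the remaining dual variables through (15) and evaluating the dual objective of (14). For illustration it assumes the parameter values of the numerical example in Section 2.2; instead of hard-coding the bounds (16), it simply rejects grid points at which any dual variable becomes non-positive, which enforces the same feasibility.

```python
# Parameter values of the numerical example in Section 2.2 (assumed here)
a1, a2 = 1.5, 2.0        # alpha_i: price elasticities
b1, b2 = 0.01, 0.02      # beta_i: lot size elasticities
g1, g2 = 0.1, 0.2        # gamma_i: marketing elasticities
u1, u2 = 4.0, 5.0        # unit production cost scaling constants
k1 = k2 = 1_000_000.0    # demand scaling constants

# Stage 1: closed-form dual solution (10) for the first market
den = a1 * b1 - b1 * g1 - 1.0
w1, w2, w3 = (g1 + 1 - a1) / den, (b1 * g1 - g1) / den, 1.0
lam = (a1 * b1 - a1) / den          # equals w1 + w2 + w3
d1 = ((u1 * k1 ** -b1 * lam / w1) ** w1 * (lam / w2) ** w2
      * (lam / k1 / w3) ** w3)      # w0 = 1, so the objective term is 1
pi1 = 1.0 / d1                      # min z^{-1} = d(pi_1) at the optimum

# Stage 2: grid search over (w01, w11) using the reductions (15)
def dual(w01, w11):
    """Dual objective of (14) for a candidate (w01, w11); None if infeasible."""
    w13, w02, w23 = w01, 1 - w01, 1 - w01
    w12 = (a1 * b1 - 1) * w11 + (a1 - 1) * w01
    w21 = ((a1 * b1 - b1 * g1 - 1) * w11 + (a1 - g1 - a2 + g2) * w01
           + (a2 - g2 - 1)) / (b2 * g2 + 1 - a2 * b2)
    w22 = (a2 - 1) * (1 - w01) + (a2 * b2 - 1) * w21
    lam1, lam2 = w11 + w12 + w13, w21 + w22 + w23
    if min(w02, w12, w21, w22, lam1, lam2) <= 0:
        return None
    return ((1 / pi1 / w01) ** w01 * (1 / w02) ** w02
            * (u1 * k1 ** -b1 * lam1 / w11) ** w11 * (lam1 / w12) ** w12
            * (pi1 / k1 * lam1 / w13) ** w13
            * (u2 * k2 ** -b2 * lam2 / w21) ** w21 * (lam2 / w22) ** w22
            * (lam2 / k2 / w23) ** w23)

best, N = None, 500
for i in range(1, N):
    for j in range(1, N):
        v = dual(i / N, j / N)
        if v is not None and (best is None or v > best[0]):
            best = (v, i / N, j / N)
print("pi1* =", round(pi1), " best (w01, w11) =", best[1:])
```

A finer grid (or a local refinement around the best cell) can be used once the neighborhood of the optimum is located.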
Once the optimal values of the dual variables are obtained, we may use the following relationships to determine the optimal primal variables P_1^*, P_2^* and M^*:

δ_ij^* = w_ij^* / λ_i^*,   λ_i^* ≠ 0,   i = 1, 2,   j = 1, 2, 3.   (17)
Once we determine the optimal values of δ_ij^*, we may use (13) in order to find the optimal values of P_1^*, P_2^* and M^*. In the next section, we demonstrate the implementation of the proposed method using a numerical example.
2.2 Numerical Example
In this section we present a numerical illustration of the proposed method. Suppose we have α_1 = 1.5, α_2 = 2.0, β_1 = 0.01, β_2 = 0.02, γ_1 = 0.1, γ_2 = 0.2, u_1 = 4, u_2 = 5, k_1 = k_2 = 1000000. This example is solved using the procedure explained in this section: the procedure first finds the optimal solution π_1^*, and in the second phase we find the optimal solution π_2^* subject to π_1 ≥ π_1^*. The optimal weights are calculated to be

w_01^* = 0.99, w_02^* = 0.01, w_11^* = 0.3997, w_12^* = 0.1013, w_13^* = 0.99, w_21^* = 0.0103, w_22^* = 9.8 × 10^{−5}, w_23^* = 0.01, λ_1^* = 1.491, λ_2^* = 0.0204,
and the optimal solution is summarized as follows: P_1^* = $13.5162, P_2^* = $8.1668, M^* = $0.9187, π_1^* = 1.9741 × 10^5, π_2^* = 4.6014 × 10^4, π^* = π_1^* + π_2^* = 2.4343 × 10^5. This example demonstrates the implementation of the proposed method. Since the resulting dual model can easily be re-solved when the input parameters change, one can practically use this method to determine the optimal production and marketing expenditure in the two markets. Although sensitivity analysis has been used extensively by others [6, 7, 8] for single-objective models, we believe that it would not necessarily be an applicable tool for the multi-objective model presented in this paper.
3 Conclusion
In this paper, we have presented a price discrimination model which determines the product's selling price and marketing expenditure in two markets. We have assumed that production is a function of price and marketing expenditure in the two markets, and that the production cost is a function of the production quantity in both markets. A Multi-Objective Decision Making (MODM) method called the Lexicograph Approach (LA) has been used to find an efficient solution between the two objectives, one for each market. Geometric Programming has been used to solve the resulting model. In our GP implementation, we have used a transformed dual problem in order to reduce the model to the optimization of a nonlinear concave function subject to linear constraints. The resulting model has been solved using a simple grid search.
References
1. Beightler, C. S., Phillips, D. T.: Applied Geometric Programming. New York: Wiley (1976)
2. Dembo, R. S.: Sensitivity analysis in geometric programming. Journal of Optimization Theory and Applications 37 (1982) 1–21
3. Duffin, R. J., Peterson, E. L., Zener, C.: Geometric Programming - Theory and Application. New York: John Wiley & Sons (1967)
4. Freeland, J. R.: Coordination strategies for production and marketing in a functionally decentralized firm. AIIE Transactions 12 (1982) 126–132
5. Jung, H., Klein, C. M.: Optimal inventory policies for an economic order quantity model with decreasing cost functions. European Journal of Operational Research 165 (2005) 108–126
6. Kim, D., Lee, W. J.: Optimal joint pricing and lot sizing with fixed and variable capacity. European Journal of Operational Research 109 (1998) 212–227
7. Lee, W. J.: Determining selling price and order quantity by geometric programming: optimal solution, bounds and sensitivity. Decision Sciences 24 (1993) 76–87
8. Lee, W. J., Kim, D.: Optimal and heuristic decision strategies for integrated production and marketing planning. Decision Sciences 24 (1993) 1203–1213
9. Lilien, G. L., Kotler, P., Moorthy, K. S.: Marketing Models. Englewood Cliffs, NJ: Prentice Hall (1992)
10. Sadjadi, S. J., Orougee, M., Aryanezhad, M. B.: Optimal Production and Marketing Planning. Computational Optimization and Applications 30(2) (2005)
11. Starr, M. K.: Managing Production and Operations. Englewood Cliffs, NJ: Prentice Hall (1989)
Hybrid Evolutionary Algorithms for the Rectilinear Steiner Tree Problem Using Fitness Estimation*

Byounghak Yang

Department of Industrial Engineering, Kyungwon University, San 65 Bockjung-dong, Sujung-gu, Seongnam-si, Kyunggi-do, Korea
[email protected]
Abstract. The rectilinear Steiner tree problem (RSTP) is to find a minimum-length rectilinear interconnection of a set of terminals in the plane. A key performance measure of algorithms for the RSTP is the reduction rate, the relative difference between the objective value of the RSTP solution and that of the minimum spanning tree without Steiner points. We introduce four evolutionary algorithms based upon fitness estimation and a hybrid operator. Experimental results show that the quality of the solution is improved by the hybrid operator and that the calculation time is reduced by the fitness estimation. The best evolutionary algorithm outperforms previously proposed heuristics; its solution is 99.4% of the optimal solution.
1 Introduction

Given a set V of n terminals in the plane, the rectilinear Steiner tree problem (RSTP) is to find a shortest network, a Steiner minimum tree, interconnecting V, possibly through an additional set S of points; the points in S are called Steiner points. We should find the optimal number of Steiner points and their locations in the rectilinear plane. It is well known that the solution to the RSTP is the minimal spanning tree (MST) on some set of points V∪S. Let MST(V) be the cost of the MST on the set of terminals V. Then the reduction rate R(V) = {MST(V) − MST(V∪S)} / MST(V) is used as a performance measure for algorithms on the Steiner tree problem. The RSTP is known to be NP-complete [8], and a polynomial-time algorithm for the optimal solution is unlikely to exist [7]. Warme, Winter and Zachariasen [17] presented an exact algorithm for the RSTP and showed that the average reduction rate is about 11.5% for the optimal solution. Lee, Bose and Hwang [14] presented an algorithm similar to Prim's algorithm for the minimum spanning tree. Beasley [3] presented standard instances for the Steiner tree problem and heuristic algorithms for the RSTP [4]. The best heuristics for the RSTP are the Batched Iterated 1-Steiner (BI1S) heuristic of Kahng [13] and the edge-based heuristic of Borah [5], with an average reduction rate of 11%. Genetic algorithms for the Euclidean Steiner tree problem were presented by Hesser et al. [10] and Barreiros [2], but they did not introduce an algorithm for the RSTP. Julstrom [12] introduced three kinds of genetic algorithms for the RSTP.
This research was supported by the Kyungwon University Research Fund in 2005.
In his research, the genetic algorithm based on lists of edges was relatively better than the other genetic algorithms, but the reduction rates of the genetic algorithms were worse than those of the other heuristics. Sock and Ahn [15] used a genetic algorithm to solve the degree-constrained minimum spanning tree problem. Yang [19] introduced an evolutionary algorithm for the RSTP; its 11.0% reduction rate was almost equal to that of the optimal solution, but the computational load for evaluating the cost function was quite heavy. The main motivation of this research is to develop another evolutionary algorithm that finds near-optimal solutions within a reasonable calculation time.
Fig. 1. A minimum rectilinear spanning tree (a) and a minimum rectilinear Steiner tree (b)
Hanan showed that for any instance, an optimal RST exists in which every Steiner point lies at the intersection of two orthogonal lines that contain terminals [9]. Hanan's theorem implies that a graph G, called the Hanan grid graph, is guaranteed to contain an optimal RST. The Hanan grid graph is constructed as follows: draw a horizontal and a vertical line through each terminal; the vertices of the graph correspond to the intersections of these lines. For the 3-terminal RST problem, we can find the optimal Steiner point easily. Let (x_i, y_i) be the coordinates of the given terminal T_i; France [6] proved that the Steiner point S is located at (x_m, y_m), where x_m and y_m are the medians of {x_i} and {y_i}.
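As an illustration of these two facts, the following short Python sketch (our own, not from the paper) computes the median Steiner point of three terminals and enumerates the Hanan grid vertices of a terminal set:

```python
from statistics import median

def steiner_point_3(t1, t2, t3):
    """Optimal Steiner point of three terminals under rectilinear distance:
    the coordinate-wise median (France's result)."""
    xs, ys = zip(t1, t2, t3)
    return (median(xs), median(ys))

def hanan_grid(terminals):
    """All intersections of the horizontal/vertical lines through terminals."""
    xs = sorted({x for x, _ in terminals})
    ys = sorted({y for _, y in terminals})
    return [(x, y) for x in xs for y in ys]

print(steiner_point_3((0, 0), (4, 1), (2, 5)))  # -> (2, 1)
```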
2 Evolutionary Algorithm

Evolutionary algorithms are based on models of organic evolution. They model the collective learning process within a population of individuals. The starting population is initialized by randomization or by some heuristic method, and evolves toward successively better regions of the search space by means of randomized processes of recombination, mutation, and selection. Each individual is assigned a fitness value, and the selection process favors individuals of higher quality, which reproduce more often than worse individuals. The recombination mechanism allows the mixing of parental information while passing it to descendants, and mutation introduces innovation into the population. Although simplistic from a biologist's viewpoint, these algorithms are sufficiently complex to provide robust and powerful adaptive search mechanisms [1]. In this research, a set of Steiner points on the Hanan grid becomes an individual, and the individual is evaluated by the minimum spanning tree over the terminals and the Steiner points in the individual.
For most evolutionary algorithms, a large number of fitness evaluations is needed, and fitness evaluation is not trivial. To reduce the evaluation complexity, we can employ fitness approximation. One form of fitness approximation is evolutionary approximation, in which fitness evaluations are shared by estimating the fitness value of a child from the fitness values of its parents [11].

2.1 General Evolutionary Algorithm

By Hanan's theorem, the optimal Steiner points should be on the Hanan grid, so an individual is introduced that represents the locations of Steiner points on the Hanan grid. Each vertical and horizontal line is numbered in increasing order: the leftmost vertical line is the first vertical line, and the bottommost horizontal line is the first horizontal line. Let v_i be the index of the vertical line of Steiner point S_i and h_i the index of its horizontal line; each individual then contains pairs (v_i, h_i), each giving the location of the i-th Steiner point S_i on the Hanan grid. The real location (x_i, y_i) of Steiner point S_i in the plane is calculated from (v_i, h_i). In this research, an individual is an assembly of the pairs (v_i, h_i) of a non-fixed number of Steiner points together with the number of Steiner points, as follows: {m, (v_1, h_1), (v_2, h_2), ..., (v_m, h_m)}, where m is the number of Steiner points.
Fig. 2. The individual for evolutionary algorithm
Each individual has a different length depending on its number of Steiner points, and represents one Steiner tree. The fitness of the individual corresponds to the length of the MST that can be constructed over the original terminals and the Steiner points of the individual using Prim's algorithm. The recombination, or crossover, operator exchanges parts of two individuals: we choose some Steiner points in each individual and exchange them with Steiner points of the other individual. Suppose we have two parents to cross over, P1 = {3, (1,3), (2,4), (3,2)} and P2 = {2, (6,4), (7,8)}. By a random number, we choose the number of points to exchange, which is no greater than the number of Steiner points in either parent. Then, again by random numbers, the Steiner points to exchange are chosen. In this case, we choose 2 as the exchange number; the first and third Steiner points are selected in P1, and both Steiner points are selected in P2. Therefore we obtain two children by performing the crossover: O1 = {3, (6,4), (2,4), (7,8)} and O2 = {2, (1,3), (3,2)}.
Fig. 3. The crossover operator
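A minimal Python sketch of this variable-length crossover (our own illustration; the list representation and names are assumptions, not the paper's code, and the count m is kept implicit as the list length):

```python
import random

def crossover(p1, p2):
    """Exchange a random subset of Steiner points between two parents.
    Each parent is a list of (v, h) Hanan grid indices."""
    k = random.randint(1, min(len(p1), len(p2)))   # number of points to exchange
    idx1 = random.sample(range(len(p1)), k)
    idx2 = random.sample(range(len(p2)), k)
    o1, o2 = p1[:], p2[:]
    for i, j in zip(idx1, idx2):
        o1[i], o2[j] = p2[j], p1[i]                # pairwise swap of chosen points
    return o1, o2

p1 = [(1, 3), (2, 4), (3, 2)]
p2 = [(6, 4), (7, 8)]
print(crossover(p1, p2))
```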
The mutation operator changes the locations of some selected Steiner points. Let (v_i, h_i) be the location of a selected Steiner point; then we obtain a new location (v_i + v, h_i + h), where v and h are random integers. We use tournament selection. The k-tournament selection method selects a single individual by choosing some number k of individuals randomly from the population and selecting the best individual from this group to survive to the next generation. The process is repeated as often as necessary to fill the new population; a common tournament size is k = 2. For the 3-terminal RST problem, the median of the terminals is the optimal Steiner point, and we assume that the median point of three adjacent terminals has a high probability of surviving in the optimal Steiner tree. To find adjacent 3-terminal groups conveniently, we build a minimum spanning tree over V and find every directly connected group of 3 terminals in the minimum spanning tree, as in Fig. 4(b). The Steiner point of each such connected 3-terminal group is calculated and placed in a Steiner point pool, as in Fig. 4(c), to be used as a candidate Steiner point. We make the initial population for the evolutionary algorithm by the following procedure (a code sketch follows Fig. 4):
Step 1: Make a minimum spanning tree (MST) for the terminals.
Step 2: Choose every connected group of three terminals in the MST; call them 3-neighbor terminals (3NT).
Step 3: Make a Steiner point for each 3NT and add it to the Steiner point pool.
Step 4: Choose some Steiner points from the Steiner point pool by a random function and make one individual. Repeat Step 4 until all individuals are created.
We make an initial population with 200 individuals.
Fig. 4. A Steiner points pool from minimum spanning tree
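The following Python sketch (our construction, assuming Prim's MST under rectilinear distance and France's median point, as described above) builds the candidate pool and an initial population:

```python
import random
from itertools import combinations
from statistics import median

def mst_edges(terminals):
    """Prim's algorithm under rectilinear distance; returns edges (i, j)."""
    n = len(terminals)
    dist = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        i, j = min(((i, j) for i in in_tree
                    for j in range(n) if j not in in_tree),
                   key=lambda e: dist(terminals[e[0]], terminals[e[1]]))
        in_tree.add(j)
        edges.append((i, j))
    return edges

def steiner_pool(terminals):
    """Median Steiner point of every directly connected 3-terminal group (3NT)."""
    adj = {i: set() for i in range(len(terminals))}
    for i, j in mst_edges(terminals):
        adj[i].add(j); adj[j].add(i)
    pool = set()
    for c in adj:                          # center terminal with two MST neighbors
        for a, b in combinations(adj[c], 2):
            pts = [terminals[a], terminals[b], terminals[c]]
            pool.add((median(p[0] for p in pts), median(p[1] for p in pts)))
    return list(pool)

def initial_population(terminals, pop_size=200):
    pool = steiner_pool(terminals)
    return [random.sample(pool, random.randint(1, len(pool)))
            for _ in range(pop_size)]
```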
2.2 Hybrid Operator

For searching new solutions in an evolutionary algorithm, local search can be used. In this research, an insertion operator, a deletion operator, and a moving operator are introduced. The mutation and crossover operators move the locations of existing Steiner points, but they do not search for new Steiner points. We need to introduce new Steiner points into the current individual to enforce the variability of the solution: for some selected individuals, a randomly generated Steiner point on the Hanan grid is inserted into the individual, increasing its number of Steiner points. On the other hand, some Steiner points in the tree are connected to only one or two other nodes; such Steiner points are obviously useless for reducing the tree cost. We therefore introduce a deletion operator to delete these poor Steiner points, which are connected to fewer than three other nodes in the Steiner tree. For a Steiner point connected to exactly 3 other nodes, we know the optimal location of the Steiner point from the result for the 3-terminal case. The moving operator traces each Steiner point S that is connected to exactly 3 other nodes (A, B, C), calculates the optimal location S* of the Steiner point for those 3 nodes, and moves S to S*.

2.3 Estimated Fitness Evaluation

We must solve an MST problem to evaluate each individual. In Fig. 5(a), we have an evaluated parent individual. By some evolutionary operator, such as mutation or crossover, some Steiner points are moved, as in Fig. 5(b). Since we know the tree structure of the parent, we can let the tree structure of the child be that of the parent; the tree cost of the child is then calculated simply by updating the costs of the arcs that are directly connected to the moved Steiner points, as in Fig. 5(c). If we solve the MST problem instead, we get the MST of Fig. 5(d), which has the exact (minimum) tree cost of the child; the tree cost of Fig. 5(c) is therefore an estimate of the child's fitness. The difference between the estimated value and the exact value grows as estimates are built on earlier estimates. Therefore, we perform an exact evaluation of all individuals to refresh their values if there is no improvement in the solution for some given number of iterations. We use this kind of estimated evaluation to reduce the evaluation time.
Fig. 5. The estimated evaluation for the child using the tree structure of parent
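A minimal sketch of this estimate (ours; it assumes the parent's tree is kept as an edge list and only recomputes the lengths of arcs incident to moved points):

```python
def rectilinear(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def estimated_cost(parent_cost, edges, points, moved):
    """Estimate a child's tree cost by reusing the parent's tree structure.

    edges  : list of (i, j) node-index pairs of the parent's MST
    points : dict node index -> coordinates, already updated for the child
    moved  : dict node index -> old coordinates of the moved Steiner points
    """
    cost = parent_cost
    for i, j in edges:
        if i in moved or j in moved:
            old_i = moved.get(i, points[i])
            old_j = moved.get(j, points[j])
            cost -= rectilinear(old_i, old_j)           # remove old arc length
            cost += rectilinear(points[i], points[j])   # add updated arc length
    return cost   # an upper bound; the exact MST is recomputed periodically
```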
2.4 Alternative Algorithms

We introduce four evolutionary algorithms. The first is the general evolutionary algorithm (EA), with neither the estimated evaluation nor the hybrid operator. The second
is the hybrid evolutionary algorithm (EA_H), which uses the hybrid operator. The third is the estimated evolutionary algorithm (EA_E), which uses the estimated evaluation. The last is the hybrid estimated evolutionary algorithm (EA_HE), which uses both the estimated evaluation and the hybrid operator.
3 Computational Experience

The computational study was performed on a Pentium IV processor, and the evolutionary algorithms were programmed in Visual Studio. Problem instances are from the OR-Library [3]: 15 instances for each problem size 10, 20, ..., 100, 250, 500, 1000. The first test compares the reduction rate and calculation time of the four evolutionary algorithms. In Fig. 6(a), the reduction rates of EA_H and EA_HE are better than those of EA and EA_E for every problem size; it is therefore better to use the hybrid operator for finding good solutions. In Fig. 6(b), the calculation times of EA_E and EA_HE are smaller than those of EA and EA_H; the average calculation time of EA_HE was 24% of that of EA_H. The estimated evaluation is therefore effective for reducing the calculation time. Considering both the reduction rate and the calculation time, EA_HE is the best of these four evolutionary algorithms. The second test compares EA_HE and EA_H with other heuristics and with the optimal solutions. For the RST problem, Beasley solved the same instances as ours [4], and the optimal solutions of the same instances are given by Warme [17]. The results of Borah's heuristic [5], Kahng's heuristic [13], and Julstrom's genetic algorithm [12] are taken from their papers; all results appear in Table 1. The evolutionary algorithms introduced in this paper give a larger percentage reduction (about 11.1%) on most instances than the other heuristics. Table 2 shows the tree cost of the solutions and the % gap, which is the gap in
Fig. 6. The reduction rate and the calculation time of EA, EA_H, EA_E and EA_HE
the tree cost between EA_HE, EA_H, and the optimal solution. The solution of the evolutionary algorithm is 99.4% of the optimal solution. In order to compare the evolutionary algorithm with optimal solutions, 46 further test problems were solved by the evolutionary algorithms. These 46 test problems were given by Soukup and Chow [16], and their optimal solutions are given on Warme's web site [18]. Table 3 shows the Steiner tree costs of the optimal solutions and of EA_H and EA_HE. The evolutionary algorithms found every optimal solution except for two instances, instances 43 and 44 in Table 3.

Table 1. The reduction rate (%) of EA_HE, EA_H, Beasley's, Kahng's, Borah's, and Julstrom's algorithms and the optimal solution

n     EA_HE   EA_H    Beasley  Kahng   Borah   Julstrom  Optimal
10    10.625  10.611  9.947    10.36   10.33   *         10.656
20    11.711  11.721  10.590   10.44   10.4    *         11.798
30    11.306  11.356  10.250   *       *       *         11.552
40    10.724  10.807  9.956    *       *       *         10.913
50    10.655  10.705  9.522    10.71   10.71   9.167     10.867
60    11.736  11.755  10.146   *       *       *         11.862
70    11.099  11.163  9.779    *       *       2.8       11.387
80    11.142  11.081  9.831    *       *       *         11.301
90    11.186  11.203  10.128   *       *       4.06      11.457
100   11.445  11.501  10.139   10.89   10.84   0.72      11.720
250   11.113  11.090  9.964    10.88   10.88   *         11.646
500   10.706  10.706  9.879    *       10.94   *         11.631
1000  10.483  10.414  9.888    *       11.04   *         11.653
mean  11.072  11.086  10.001   10.656  10.734  4.187     11.419
* : They didn’t show the results Table 2. The tree cost of EA_HE, EA_H and the optimal solution n 10 20 30 40 50 60 70 80 90 100 250 500 1000 mean
EA_HE 2.21948194 3.296306033 4.12676294 4.9060314 5.375625147 5.71908182 6.266763493 6.76866624 7.12752516 7.543638527 11.68976045 16.36012215 23.29998097 8.053826636
EA_H 2.219786293 3.295892153 4.124317633 4.901290273 5.372616413 5.717780633 6.262297327 6.773256053 7.126108713 7.53888388 11.69272585 16.34098039 23.30472768 8.051589484
Optimal 2.218712547 3.293057227 4.11490786 4.895676353 5.36284096 5.710776713 6.246542193 6.756550107 7.105630147 7.520270667 11.61949869 16.19058313 22.99475535 8.002292458
%gap of EA_HE %gap of EA_H 0.034677468 0.048395033 0.098656247 0.086087987 0.28810074 0.228675189 0.211514118 0.114670979 0.238384594 0.182281246 0.145428671 0.122643913 0.323719898 0.252221675 0.179324258 0.247255573 0.308136124 0.288201978 0.310731635 0.247507226 0.604688336 0.630209288 1.047145854 0.928918076 1.327370562 1.348013153 0.393682962 0.363467794
Table 3. The comparison of the tree costs of Soukup and Chow's problems

Instance  n   EA_HE  EA_H   Optimal solution
1         5   1.87   1.87   1.87
2         6   1.64   1.64   1.64
3         7   2.36   2.36   2.36
4         8   2.54   2.54   2.54
5         6   2.26   2.26   2.26
6         12  2.42   2.42   2.42
7         12  2.48   2.48   2.48
8         12  2.36   2.36   2.36
9         7   1.64   1.64   1.64
10        6   1.77   1.77   1.77
11        6   1.44   1.44   1.44
12        9   1.8    1.8    1.8
13        9   1.5    1.5    1.5
14        12  2.6    2.6    2.6
15        14  1.48   1.48   1.48
16        3   1.6    1.6    1.6
17        10  2      2      2
18        62  4.04   4.04   4.04
19        14  1.88   1.88   1.88
20        3   1.12   1.12   1.12
21        5   1.92   1.92   1.92
22        4   0.63   0.63   0.63
23        4   0.65   0.65   0.65
24        4   0.3    0.3    0.3
25        3   0.23   0.23   0.23
26        3   0.15   0.15   0.15
27        4   1.33   1.33   1.33
28        4   0.24   0.24   0.24
29        3   2      2      2
30        12  1.1    1.1    1.1
31        14  2.59   2.59   2.59
32        19  3.13   3.13   3.13
33        18  2.68   2.68   2.68
34        19  2.44   2.43   2.41
35        18  1.51   1.51   1.51
36        4   0.9    0.9    0.9
37        8   0.9    0.9    0.9
38        14  1.66   1.66   1.66
39        14  1.66   1.66   1.66
40        10  1.55   1.55   1.55
41        20  2.24   2.24   2.24
42        15  1.53   1.53   1.53
43        16  2.57*  2.57*  2.55
44        17  2.52   2.54*  2.52
45        19  2.2    2.2    2.2
46        16  1.5    1.5    1.5
4 Conclusion

In this paper, we introduced a hybrid operator and an estimated fitness evaluation, and suggested four evolutionary algorithms for the rectilinear Steiner tree problem. Computational
results showed that the hybrid operator improves the solution value. The reduction rate of EA_HE and EA_H is about 11.1%, which is close to the 11.4% of the optimal solutions. The estimated fitness evaluation reduces the calculation time by 76% without losing solution quality.
References
1. Bäck, T.: Evolutionary Algorithms in Theory and Practice. Oxford University Press (1996)
2. Barreiros, J.: An Hierarchic Genetic Algorithm for Computing (near) Optimal Euclidean Steiner Trees. Workshop on Application of Hybrid Evolutionary Algorithms to NP-Complete Problems, Chicago (2003)
3. Beasley, J. E.: OR-Library: distributing test problems by electronic mail. Journal of the Operational Research Society 41 (1990) 1069–1072
4. Beasley, J. E.: A heuristic for Euclidean and rectilinear Steiner problems. European Journal of Operational Research 58 (1992) 284–292
5. Borah, M., Owens, R. M.: An Edge-Based Heuristic for Steiner Routing. IEEE Trans. on Computer-Aided Design 13 (1994) 1563–1568
6. France, R.L.: A note on the optimum location of new machines in existing plant layouts. J. Industrial Engineering 14 (1963) 57–59
7. Ganley, J. L.: Computing optimal rectilinear Steiner trees: A survey and experimental evaluation. Discrete Applied Mathematics 90 (1999) 161–171
8. Garey, M. R., Johnson, D. S.: The rectilinear Steiner tree problem is NP-complete. SIAM Journal on Applied Mathematics 32 (1977) 826–834
9. Hanan, M.: On Steiner's problem with rectilinear distance. SIAM Journal on Applied Mathematics 14 (1966) 255–265
10. Hesser, J., Manner, R., Stucky, O.: Optimization of Steiner Trees using Genetic Algorithms. Proceedings of the Third International Conference on Genetic Algorithms (1989) 231–236
11. Jin, Y.: A Survey on Fitness Approximation in Evolutionary Computation. Journal of Soft Computing 9 (2005) 3–12
12. Julstrom, B.A.: Encoding Rectilinear Trees as Lists of Edges. Proceedings of the 16th ACM Symposium on Applied Computing (2001) 356–360
13. Kahng, A. B., Robins, G.: A New Class of Iterative Steiner Tree Heuristics with Good Performance. IEEE Trans. on Computer-Aided Design 11 (1992) 893–902
14. Lee, J. L., Bose, N. K., Hwang, F. K.: Use of Steiner's problem in suboptimal routing in rectilinear metric. IEEE Transactions on Circuits and Systems 23 (1976) 470–476
15. Sock, S.M., Ahn, B.H.: A New Tree Representation for Evolutionary Algorithms. Journal of the Korean Institute of Industrial Engineers 31 (2005) 10–19
16. Soukup, J., Chow, W.F.: Set of test problems for the minimum length connection networks. ACM/SIGMAP Newsletter 15 (1973) 48–51
17. Warme, D.M., Winter, P., Zachariasen, M.: Exact Algorithms for Plane Steiner Tree Problems: A Computational Study. In: Du, D.Z., Smith, J.M., Rubinstein, J.H. (eds.): Advances in Steiner Trees. Kluwer Academic Publishers (1998)
18. Warme, D.M.: http://www.group-w-inc.com/~warme/research
19. Yang, B.H.: An Evolution Algorithm for the Rectilinear Steiner Tree Problem. Lecture Notes in Computer Science 3483 (2005) 241–249
Data Reduction for Instance-Based Learning Using Entropy-Based Partitioning

Seung-Hyun Son and Jae-Yearn Kim

Department of Industrial Engineering, Hanyang University, 17 Haengdang-Dong, Sungdong-Ku, Seoul, 133-791, South Korea
[email protected], [email protected]
Abstract. Instance-based learning methods such as the nearest neighbor classifier have proven to perform well in pattern classification in several fields. Despite their high classification accuracy, they suffer from a high storage requirement, computational cost, and sensitivity to noise. In this paper, we present a data reduction method for instance-based learning, based on entropy-based partitioning and representative instances. Experimental results show that the new algorithm achieves a high data reduction rate as well as classification accuracy.
1 Introduction
As competition among corporations intensifies and awareness of the importance of information grows, data mining methods that can extract useful information from large amounts of data are receiving increased interest. Information discovered through data mining can facilitate informed decision-making. Among data mining's several methods, classification techniques create models that distinguish data classes. A model is used to predict the class of objects whose class label is unknown. For example, a classification model may be built to categorize bank loan applications as either safe or risky, and used in customer confidence assessments by credit card companies, as well as in many marketing fields. This paper presents a data reduction method for instance-based learning that is designed to improve classification accuracy through data reduction using entropy-based partitioning and representative instances. Through this method, the original data set is also purged of irrelevant attributes, and the number of instances is decreased. Several data reduction methods for classification purposes have been proposed. Liu, Hussain, Tan, and Dash introduce a data reduction method that differentiates between attribute values [1], and Cano, Herrera, and Lozano introduce a combination of stratification and evolutionary algorithms [2]. Datta and Kibler introduced the prototype learner, which finds representative instances in each partition after dividing by the attribute value of each class [3]; they also proposed a symbolic nearest mean classifier, which uses k-means clustering to group instances of the same class [4]. Wai Lam's prototype generation filtering (PGF)
algorithm operates by combining nearest instances and preferentially calculating the distance of each instance [5]. J.S. Sanchez's reduction by space partitioning (RSP) algorithms divide the training set into several subsets based on its diameter [6], where the diameter of a set is defined as the distance between its two farthest instances. Most existing research methods use clustering techniques to create several partitions that maintain the homogeneity of the data [3, 4, 5, 6]. These clustering techniques repeatedly calculate the distances of all instances to create clusters; if the size of the data increases, the computing time increases greatly. Also, these methods consider all attributes, including irrelevant ones, which further increases computing time. This paper presents an algorithm that accelerates partitioning compared with existing methods and can remove irrelevant attributes. In addition, this new method can find the representative instances of each partition more quickly. Section 2 of this paper introduces instance-based learning and measures of entropy and distance; Section 3 describes the procedure used by the proposed algorithm; Section 4 applies an example to explain the proposed algorithm; Section 5 presents experimental results; and finally Section 6 presents conclusions.
2 Preliminaries
This section introduces instance-based learning and the measures used for data partitioning and for finding center instances.

2.1 Instance-Based Learning
Instance-based learning is a machine learning technique that has proven to be successful over a wide range of classification problems. The instance-based knowledge representation uses the instances themselves to represent what is learned, rather than inferring a rule set or decision tree and storing it instead. The nearest neighbor algorithm is one of the most widely studied examples of instance-based learning methods [7, 8, 9]. This algorithm retains all of the training set and classifies unseen cases by finding the class labels of the instances that are closest to them. It learns very quickly, because it only needs to read a training set without much further processing, and it generalizes accurately for many applications. Despite its high classification accuracy, however, it has a relatively high storage requirement, and because it must search through all instances to classify unseen cases, it is slow to perform classification. Data reduction for instance-based learning can be used to obtain a reduced representation of the data while minimizing the loss of information content.

2.2 Entropy Measure
Let S be a set consisting of s data instances. Suppose the class label attribute has m distinct values defining m distinct classes C_i (for i = 1, ..., m), and let s_i be the number of instances of S in class C_i. The expected information needed to classify a given instance is given by

I(s_1, s_2, ..., s_m) = − Σ_{i=1}^{m} p_i log_2(p_i),   (1)

where p_i is the probability that an arbitrary instance belongs to class C_i, estimated by s_i/s [10]. Let attribute A1 have v distinct values {a_1, a_2, ..., a_v}. Attribute A1 can be used to partition S into v subsets {S_1, S_2, ..., S_v}, where S_j contains those instances of S that have value a_j of A1. Let s_ij be the number of instances of class C_i in subset S_j. The entropy based on the partitioning into subsets by A1 is given by

E(A1) = Σ_{j=1}^{v} ((s_{1j} + ... + s_{mj}) / s) · I(s_{1j}, ..., s_{mj}).   (2)

The term (s_{1j} + ... + s_{mj}) / s acts as the weight of the jth subset: the number of instances in the subset divided by the total number of instances in S. The smaller the entropy value, the greater the purity of the subset partitions.
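A small Python sketch of these two measures (our own; attribute values and class labels are given as parallel lists, and the class counts in the test at the end are taken from the worked example in Section 4, whose exact instance ordering is assumed):

```python
import math
from collections import Counter

def info(labels):
    """Expected information I(s1, ..., sm) of a list of class labels -- Eq. (1)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def entropy(attr_values, labels):
    """Entropy E(A) of partitioning by one attribute -- Eq. (2)."""
    total = len(labels)
    subsets = {}
    for v, y in zip(attr_values, labels):
        subsets.setdefault(v, []).append(y)
    return sum(len(part) / total * info(part) for part in subsets.values())

# Attribute A1 of the example in Section 4: E(A1) = 0.88
a1     = [0] * 8 + [1] * 8
labels = ['+'] * 6 + ['-'] * 2 + ['+'] * 3 + ['-'] * 5
print(round(entropy(a1, labels), 2))   # -> 0.88
```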
2.3 Distance Measure
The distance measure used to find the center instance in each partition is the Euclidean distance (ED), given by

ED(x, y) = √( Σ_{i=1}^{a} d(x_i, y_i)² ),   (3)

where x and y are two instances, a is the number of attributes, and x_i refers to the ith attribute value of instance x [10]. For numerical attributes, d(x_i, y_i) is defined as their absolute difference (i.e., |x_i − y_i|). For categorical attributes, the distance between two values is typically given by

d(x_i, y_i) = 0 if x_i = y_i, and 1 otherwise.   (4)

The center instance is based on the sum of the distances between instances in each partition, i.e., the center instance x_{ith} of each partition attains

min{ Σ_{k=1}^{n} ED(x_{1st}, y_{kth}), Σ_{k=1}^{n} ED(x_{2nd}, y_{kth}), ..., Σ_{k=1}^{n} ED(x_{nth}, y_{kth}) },   (5)

where n is the number of instances in each partition, and x_{ith} and y_{kth} are the ith and kth instances, respectively. The center instance is decided based on the least Euclidean distance. These formulas are used at the step that locates the center instance.
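A sketch of the center-instance computation (ours; mixed numeric/categorical instances are represented as tuples):

```python
import math

def d(xi, yi):
    """Per-attribute distance: absolute difference for numbers, 0/1 otherwise."""
    if isinstance(xi, (int, float)) and isinstance(yi, (int, float)):
        return abs(xi - yi)
    return 0 if xi == yi else 1

def ed(x, y):
    """Euclidean distance over all attributes -- Eq. (3)."""
    return math.sqrt(sum(d(a, b) ** 2 for a, b in zip(x, y)))

def center_instance(partition):
    """Instance minimizing the sum of distances to all others -- Eq. (5)."""
    return min(partition, key=lambda x: sum(ed(x, y) for y in partition))
```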
Fig. 1. Proposed algorithm process
3 Proposed Algorithm
The proposed algorithm consists of two parts: data partitioning and finding the representative instances. First, we calculate the entropy of each attribute; the data set is partitioned first via the attribute that has the smallest entropy. In this way, homogeneous data are gathered in the same partition, and each partition retains the characteristics of the original data set. Second, we locate the representative instances using the Euclidean distance; these instances represent the characteristics of each partition. The procedure of the proposed algorithm is as follows:
Steps 1–4: Data partitioning
Step 1. Calculate the entropy of all attributes using Equations (1) and (2). Select the attribute that has the lowest entropy and partition the data set via that attribute's values.
Step 2. In each partition, calculate the entropy of the remaining attributes and redivide the partition via the attribute that has the smallest entropy value in that partition.
Step 3. This partitioning process continues until all partitions are pure, meaning all the class values are the same, or until no further partitioning is possible.
Step 4. Several partition sets are composed.
Steps 5–8: Finding the representative instances
Step 5. Find the center instance of each partition, using Equations (3), (4), and (5). Not all attributes are considered when finding the center instance: attributes used at some partitioning step can be ignored, because they have the same value throughout the partition, and attributes not used in the data partitioning stage are regarded as irrelevant and are purged.
Step 6. In each partition, find the k nearest instances to the center instance; k is proportional to the number of instances in the partition.
Step 7. In each partition, find the representative instances. The representative instances consist of the union of the center instance and the k nearest instances to the center instance.
Fig. 2. Example data set
Step 8. The reduced data set consists of the representative instances of each partition, and is used for instance-based learning. The proposed algorithm process is shown in Figure 1.
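Putting steps 1–8 together, a compact recursive sketch of the whole procedure (ours, reusing the entropy, ed, and center_instance helpers sketched in Section 2; the ratio of nearest neighbors kept per partition is an assumed parameter):

```python
def partition_data(data, attrs):
    """Recursively split on the lowest-entropy attribute -- Steps 1-4.
    data: list of (features, label) pairs; attrs: attribute indices still usable."""
    labels = [y for _, y in data]
    if len(set(labels)) <= 1 or not attrs:      # pure, or nothing left to split on
        return [data]
    best = min(attrs, key=lambda a: entropy([x[a] for x, _ in data], labels))
    groups = {}
    for x, y in data:
        groups.setdefault(x[best], []).append((x, y))
    rest = [a for a in attrs if a != best]
    parts = []
    for g in groups.values():
        parts.extend(partition_data(g, rest))
    return parts

def reduce_data(data, keep_ratio=0.2):
    """Representative instances of every partition -- Steps 5-8."""
    reduced = []
    for part in partition_data(data, list(range(len(data[0][0])))):
        c = center_instance([x for x, _ in part])
        k = int(keep_ratio * len(part))         # k grows with partition size
        part_sorted = sorted(part, key=lambda p: ed(c, p[0]))
        reduced.extend(part_sorted[:k + 1])     # center plus its k nearest
    return reduced
```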
4 Example
The example data set consists of 16 instances, 6 attributes, and 1 class, and is presented in Figure 2.
Steps 1–4: Data partitioning
Step 1. Calculate the entropy of all attributes. To calculate the expected information of attribute A1, we use Equation (1) as follows:
For A1 = 0: s_11 = 6, s_21 = 2, I(s_11, s_21) = −(6/8) log_2(6/8) − (2/8) log_2(2/8) = 0.31 + 0.50 = 0.81.
For A1 = 1: s_12 = 3, s_22 = 5, I(s_12, s_22) = −(3/8) log_2(3/8) − (5/8) log_2(5/8) = 0.53 + 0.42 = 0.95.
Using Equation (2), the entropy needed to classify a given sample if the samples are partitioned according to A1 is

E(A1) = (8/16) I(s_11, s_21) + (8/16) I(s_12, s_22) = (8/16) × 0.81 + (8/16) × 0.95 = 0.88.
Similarly, we can compute E(A2) = 0.88, E(A3) = 0.88, E(A4) = 0.88, E(A5) = 0.82, and E(A6) = 0.60. Since A6 has the lowest entropy among the attributes, it is selected as the first attribute to use for partitioning. Next, divide the example data set by the values of attribute A6. The partitions are P 1 and P 2. After step 1, the result of partitioning is shown in Figure 3(a). The attribute values considered in the partition are marked in gray.
Fig. 3. (a) Partition sets after step 1; (b) partition sets after step 2
Fig. 4. Partition sets after data partitioning
Step 2. Partitions P1 and P2 are created. For each partition, calculate the entropy of the attributes not yet considered. Because the entropy of attribute A5 is the smallest in partition P1, partition P1 is divided using attribute A5; partition P2 is divided using attribute A1 for the same reason. The partitioning results after step 2 are shown in Figure 3(b): partitions P11 and P12 are formed in partition P1, and partitions P21 and P22 are formed in partition P2.
Step 3. This partitioning process continues until all partitions are pure or no further partitioning can be done.
Step 4. Seven partition sets are composed in the example: {P11, P121, P1221, P1222, P21, P221, P222}. The results after the data partitioning are shown in Figure 4.
Steps 5–8: Finding the representative instances
Step 5. Find the center instance of each partition using Equation (5).
When finding the center instance for partition P11, only attributes A1 and A2 are used. Therefore, this method has the advantage of locating the center instance faster, using only some of the attributes instead of all of them. The center instance is decided based on the least Euclidean distance. For example, calculating the sum of the distances between each instance and the remaining instances in partition P11 gives the following:
The sum of the distances for instance #4 = √1 + √1 + √2 + √2 = 2 + 2√2,
the sum for instance #8 = √1 + √2 + √1 + √1 = 3 + √2,
the sum for instance #12 = √1 + √2 + √1 + √1 = 3 + √2,
the sum for instance #13 = √2 + √1 + √1 + √0 = 2 + √2,
the sum for instance #16 = √2 + √1 + √1 + √0 = 2 + √2.
Therefore, the center instance in partition P11 becomes instance #13 or instance #16.
Step 6. In each partition, find the k instances nearest to the center instance; k is predefined at the data partitioning stage. For example, one instance is selected in partition P11 and one in P21, and no instance is selected in the remaining partitions because they contain only one or two instances.
Step 7. Find the representative instances in each partition. For example, two instances are selected in partition P11: one center instance and one nearest instance; the result is the same in partition P21. Only the center instance is selected as the representative instance in each of the remaining partitions.
Step 8. The reduced data set consists of the representative instances of each partition. The final reduced data set is shown in Figure 5.
Fig. 5. Final reduced data set
5 Experimental Results
This data reduction algorithm was empirically compared with the k-nearest neighbor (k-nn) algorithm and Wai Lam's PGF algorithm. Six data sets from
Table 1. Data sets

Data set   Number of instances  Number of attributes
Zoo        101                  16
Audiology  226                  63
Vote       435                  17
Soybean    683                  36
Credit     690                  5
Mushroom   8124                 22
the widely used UCI Database Repository were tested in the experiments [11]; they are shown in Table 1. Experiments were performed on a PC with a Pentium IV 3.0 GHz CPU and 512 MB of RAM, and the implementation was done using Visual C++. For each data set, ten-fold cross-validation was used to estimate the average classification accuracy [10]: we randomly partitioned the data into ten mutually exclusive subsets S_1, S_2, ..., S_10 of approximately equal size, and training and testing were performed 10 times. The classifier of the first iteration was trained on subsets S_2, S_3, ..., S_10 and tested on S_1; the classifier of the second iteration was trained on subsets S_1, S_3, ..., S_10 and tested on S_2; and so on. The classification accuracy estimate is the overall number of correct classifications from the ten iterations, divided by the total number of instances in the data set. The data reduction rate is the proportion of the original data removed: one minus the product of the number of instances and the number of attributes in the reduced data set divided by the corresponding product for the original data set. Note that higher classification accuracy and a higher data reduction rate imply better performance. The detailed performance of each algorithm on each individual data set can be found in Table 2. The symbol "—" indicates that the k-nn algorithm retains 100% of the original size (i.e., its data reduction rate is 0%). The proposed algorithm removes 93.64% of the original size on average, and its average accuracy is 89.42%. In the Zoo data set, the proposed algorithm removed 7 irrelevant attributes among the 16 attributes, and the reduced data set consisted of 14 representative instances. The data reduction rate of the Zoo data set is shown in Figure 6.
Fig. 6. Data reduction rate of the Zoo data set
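A sketch of this evaluation protocol (ours; it reuses the reduce_data and ed helpers sketched earlier and stands a 1-NN classifier in for the instance-based learner):

```python
import random

def nn_predict(train, x):
    """1-nearest-neighbor prediction over (features, label) pairs."""
    return min(train, key=lambda p: ed(p[0], x))[1]

def ten_fold_accuracy(data, folds=10):
    data = data[:]
    random.shuffle(data)
    correct = 0
    for f in range(folds):
        test = data[f::folds]
        train = [p for i, p in enumerate(data) if i % folds != f]
        reduced = reduce_data(train)                    # learn on reduced set
        correct += sum(nn_predict(reduced, x) == y for x, y in test)
    return correct / len(data)

def reduction_rate(n_orig, a_orig, n_red, a_red):
    """Fraction of the (instances x attributes) volume removed."""
    return 1 - (n_red * a_red) / (n_orig * a_orig)
```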
Table 2. Classification accuracy (Accuracy, %) and data reduction rate (Size, %) for each data set

Data set              k-nn    PGF     Proposed Algorithm
Zoo        Accuracy   97.00   90.00   92.10
           Size       —       91.10   92.20
Audiology  Accuracy   76.10   67.20   78.50
           Size       —       87.00   90.10
Vote       Accuracy   95.50   92.60   92.80
           Size       —       93.90   94.00
Soybean    Accuracy   90.80   89.10   90.50
           Size       —       87.90   88.35
Credit     Accuracy   80.70   84.50   86.70
           Size       —       97.70   97.80
Mushroom   Accuracy   99.90   99.60   95.50
           Size       —       99.10   99.40
Average    Accuracy   89.67   87.17   89.42
           Size       —       92.78   93.64
Also, only 2–7 of the 16 attributes were used to find the representative instances in each partition, and only 3.8 attributes were used on average. In the Audiology, Vote, and Soybean data sets, our proposed algorithm resulted in greater classification accuracy and better reduction than the PGF algorithm; in particular, it reduced the 63 attributes of the Audiology data set to 30. In the Credit data set, the data reduction rate was high (97.8%): an initial 690 instances were reduced to 15 instances. In the Mushroom data set, many attributes were regarded as irrelevant and removed: an initial 8124 instances were reduced to 178 instances, and 22 attributes were pared to 6 attributes. When the proposed algorithm was compared with the k-nn algorithm, classification accuracy demonstrated similar results; in the case of the Credit data set, the classification accuracy of the proposed algorithm was higher (86.7%). When compared with the PGF algorithm, both the classification accuracy and the data reduction rate were higher in most cases. Because entropy-based partitioning requires less computing time than agglomerative hierarchical clustering methods such as the PGF algorithm, data reduction was also achieved more quickly.
6 Conclusions
We have presented a new data reduction method for instance-based learning that integrates the strengths of instance partitioning and attribute selection. Reducing the amount of data for instance-based learning reduces data storage
requirements, lowers computational costs, minimizes noise, and facilitates a more rapid search. Using the proposed algorithm, the initial data set is segmented into several partitions; each partition is divided continuously based on entropy, and this partitioning process continues until all partitions are pure or no further partitioning can be done. In this way, the homogeneity of instances in the same partition is maintained. After dividing the initial data set, the attributes that were not used in the data partitioning stage are regarded as irrelevant and removed. Because irrelevant attributes can be removed, this method can find the representative instances of each partition more quickly than other methods. Experimental results show that the proposed algorithm achieves a high data reduction rate as well as high classification accuracy. The proposed algorithm can be employed to preprocess data for data mining as well as for instance-based learning.
References
1. Liu, H., Hussain, F., Tan, C.L., Dash, M.: Discretization: an enabling technique. Data Mining and Knowledge Discovery 6 (2002) 393–423
2. Cano, J.R., Herrera, F., Lozano, M.: On the combination of evolutionary algorithms and stratified strategies for training set selection in data mining. Applied Soft Computing, In Press, Corrected Proof (2005)
3. Datta, P., Kibler, D.: Learning prototypical concept descriptions. Proceedings of the 12th International Conference on Machine Learning (1995) 158–166
4. Datta, P., Kibler, D.: Symbolic nearest mean classifier. Proceedings of the 14th National Conference on Artificial Intelligence (1997) 82–87
5. Lam, W., Keung, C.L., Ling, C.X.: Learning good prototypes for classification using filtering and abstraction of instances. Pattern Recognition 35 (2002) 1491–1506
6. Sanchez, J.S.: High training set size reduction by space partitioning and prototype abstraction. Pattern Recognition 37 (2004) 1561–1564
7. Dasarathy, B.V.: Nearest Neighbor Norms: NN Pattern Classification Techniques. IEEE Computer Society Press, Los Alamitos, CA (1991)
8. Wilson, D.R., Martinez, T.R.: Reduction Techniques for instance-based learning algorithms. Machine Learning 38 (2000) 257–286
9. Cano, J.R., Herrera, F., Lozano, M.: Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study. IEEE Transactions on Evolutionary Computation 7(6) (2003) 561–575
10. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann (2001)
11. Merz, C.J., Murphy, P.M.: UCI Repository of Machine Learning Databases. http://www.ics.uci.edu/~mlearn/MLRepository.html
Coordinated Inventory Models with Compensation Policy in a Three Level Supply Chain

Jeong Hun Lee and Il Kyeong Moon*

Department of Industrial Engineering, Pusan National University, Busan, 609-735, Korea
Tel.: +82-51-510-2451; Fax: +82-51-512-7603
{jhlee, ikmoon}@pusan.ac.kr
Abstract. In this paper, we develop inventory models for a three level supply chain (one supplier, one warehouse, and one retailer) and consider the problem of determining the optimal integer multiple n and the optimal time interval between successive setups and orders in the coordinated inventory model. We consider three types of individual models (an independent model, a model from the retailer's point of view, and a model from the supplier's point of view). The focus of the coordinated model is the minimization of the coordinated total relevant cost; we then apply a compensation policy for the benefits and losses to our coordinated inventory model. The optimal solution procedure for the developed model is derived, and the effects of the compensation policy on the optimal results are studied with the help of numerical examples.
1 Introduction

While SCM is relatively new, the idea of a coordinated model is not. The study of multi-echelon inventory/distribution systems began as early as 1960 with Clark and Scarf [5]. Since that time, many researchers have investigated multi-echelon inventory and distribution systems. Much research has been aimed at coordinated models with two levels, while studies of three level models are fewer. Erengüc et al. [7] point out that though a dominant firm in the supply chain usually tends to optimize locally with no regard to its impact on the other members of the chain, there are cases of such firms capable of fostering more cooperative agreements in the chain. An empirical study on buyer-supplier relationships highlighted the importance of strong linkages for efficient JIT operations [3]; it called for replacing the traditional adversarial roles between buyers and sellers with mutual cooperation. Kang et al. [13] reviewed past and present supply chain models and analyzed them in view of environmental factors, operations, and solution approaches. Goyal [10] presented an integrated inventory model for a single supplier-single customer problem. Banerjee [1] presented a joint economic-lot-size model where a vendor produces to order for a purchaser on a lot-for-lot basis under deterministic conditions. Goyal [11] further generalized Banerjee's [1] model by relaxing the assumption of the
Corresponding author.
lot-for-lot policy of the vendor; as a result of using the approach suggested by Goyal [11], significant reductions in inventory cost can be achieved. Several researchers have shown that one partner's gain may exceed the other partners' loss in integrated models; thus, the net benefit can be shared by both parties in some equitable fashion [12]. Eum et al. [8] proposed a new allocation policy that considers buyers' demands using neural network theory. Douglas and Paul [6] defined three categories of operational coordination (buyer-vendor coordination, production-distribution coordination, and inventory-distribution coordination). The value of information sharing among supply chain players has received much attention from researchers [4, 14], who showed that using information on the outstanding orders of the products improves system performance in a two-product model. Bourland et al. [2] demonstrated the value of obtaining demand information at the retailers. Gavirneni et al. [9] captured the value of information flow in a two-echelon capacitated model. Recently, Lee et al. [14] addressed the issue of quantifying the benefits of sharing information and identifying the drivers of the magnitude of these benefits. Most of the works in the literature consider a two level supply chain, whereas our work considers a three level supply chain (one supplier, one warehouse, and one retailer). While they provided a joint analysis of a two level supply chain (the supplier and the buyer), our paper studies coordinated analysis among the supplier, the warehouse, and the retailer, and strategies to encourage coordination among supply chain partners. The rest of this paper is organized as follows. We introduce the problem in Section 2. In Section 3 we optimize the individual models. Section 4 establishes the coordinated model and develops the procedure for the minimization of the total cost in a three level supply chain. Section 5 develops compensation policies for benefits and losses using the developed models. We present numerical examples in Section 6 and conclude in Section 7.
2 Problem Definition

We consider a supply chain with three levels. A retailer periodically orders some quantity (Q) of an inventory item from a warehouse, while the warehouse periodically orders an integer multiple of the retailer's order quantity (n·Q) of the item from a supplier. Upon receipt of an order, the supplier produces the integer multiple quantity of the item, and the warehouse ships quantity Q to the retailer n times during its cycle. In addition to the deterministic conditions, we assume that there are no other warehouses or retailers for this item and that the supplier is the sole manufacturer. Fig. 1 shows the inventory time plots for n = 3. The retailer's time interval between successive orders is T_R, and the supplier's and warehouse's time intervals between successive setups and orders are T_S = 3T_R and T_W = 3T_R, respectively. At the end of its time interval, the supplier delivers the completed lot to the warehouse. At the beginning of its time interval, the warehouse directly delivers the retailer's order quantity (Q) to the retailer (similar to cross-docking, a process in which product is exchanged between trucks so that each truck going to a retail store carries products from different suppliers). In the remaining time interval, it delivers the same quantity two more times to the retailer.
Fig. 1. Graphical representation of the model (n = 3)
The following assumptions are made to develop the models:
(1) The demand is deterministic and constant.
(2) The supplier's, warehouse's, and retailer's lead times are either zero or replenishment is instantaneous.
(3) The holding costs satisfy hR > hW: because the warehouse takes charge of storage and distribution professionally, the warehouse's holding cost is less than the retailer's.
(4) The supplier's time interval between setups and the warehouse's time interval between orders are an integer (n > 1) multiple of the retailer's time interval between orders.
(5) Shortages and backlogs are not allowed.
(6) The supplier's and retailer's inventory policies can be described by the simple EOQ inventory model.
The following notations are used in developing the models:
: : : : : : : : : : :
annual demand for the item supplier’s setup cost per setup warehouse’s ordering cost per order retailer’s ordering cost per order time interval between successive setups at supplier time interval between successive orders at warehouse time interval between successive orders at retailer supplier’s holding cost per unit per unit time warehouse’s holding cost per unit per unit time retailer’s holding cost per unit per unit time positive integer number(n >1)
3 Individual Model

We consider three types of individual models. Firstly, we formulate an independent individual model, in which manufacturing and ordering policies are independent. Secondly, we develop an individual model from the retailer's point of view. In this model, the retailer is the decision-maker, so the other parties follow the retailer's ordering policy; for example, department stores decide their ordering policies regardless of the other parties, because they have power in the marketplace. Thirdly, we consider an individual model from the supplier's point of view, which is the opposite of the retailer's point of view model. Because the supplier is the decision-maker, the warehouse and the retailer decide their ordering policies according to the supplier's decision; for instance, high-technology products are made according to the supplier's decision regardless of the warehouse's and retailer's orders. Because the supplier's and retailer's inventory policies can be described by the simple EOQ, we can easily derive the optimal policies. The results of the individual optimizations are summarized in Table 1.

Table 1. Summary of total costs and individual optimal policies

Independent:
- Supplier: $TC_S(T_S) = \frac{S}{T_S} + \frac{h_S D T_S}{2}$; $T_S^* = \sqrt{2S/(Dh_S)}$; $TC_S(T_S^*) = \sqrt{2DSh_S}$
- Warehouse: $TC_W(n, T_W) = \frac{A_W}{T_W} + \frac{h_W (n-1) D T_W}{2}$; $n^* = 1$, $T_W^* = T_S^*$; $TC_W(n^*, T_W^*) = A_W / T_S^*$
- Retailer: $TC_R(T_R) = \frac{A_R}{T_R} + \frac{h_R D T_R}{2}$; $T_R^* = \sqrt{2A_R/(Dh_R)}$; $TC_R(T_R^*) = \sqrt{2DA_Rh_R}$

Retailer's point of view:
- Supplier: $TC_S(T_S) = \frac{S}{T_S} + \frac{h_S D T_S}{2}$; $T_S^* = T_R^*$; $TC_S(T_S^*) = \frac{S}{T_R^*} + \frac{h_S D T_R^*}{2}$
- Warehouse: $TC_W(n, T_W) = \frac{A_W}{T_W} + \frac{h_W (n-1) D T_W}{2}$; $n^* = 1$, $T_W^* = T_R^*$; $TC_W(n^*, T_W^*) = A_W / T_R^*$
- Retailer: $TC_R(T_R) = \frac{A_R}{T_R} + \frac{h_R D T_R}{2}$; $T_R^* = \sqrt{2A_R/(Dh_R)}$; $TC_R(T_R^*) = \sqrt{2DA_Rh_R}$

Supplier's point of view:
- Supplier: $TC_S(T_S) = \frac{S}{T_S} + \frac{h_S D T_S}{2}$; $T_S^* = \sqrt{2S/(Dh_S)}$; $TC_S(T_S^*) = \sqrt{2DSh_S}$
- Warehouse: $TC_W(n, T_W) = \frac{A_W}{T_W} + \frac{h_W (n-1) D T_W}{2}$; $n^* = 1$, $T_W^* = T_S^*$; $TC_W(n^*, T_W^*) = A_W / T_S^*$
- Retailer: $TC_R(T_R) = \frac{A_R}{T_R} + \frac{h_R D T_R}{2}$; $T_R^* = T_S^*$; $TC_R(T_R^*) = \frac{A_R}{T_S^*} + \frac{h_R D T_S^*}{2}$
The stock in the warehouse is depleted according to the demand and supply. If the warehouse is replenished at a time interval of TW and the quantity received can satisfy multiple orders, then the total cost per unit time is given by

$TC_W(n, T_W) = \frac{A_W}{T_W} + \frac{h_W (n-1) D T_W}{2}$   (3.1)
The objective is to find the optimal values of n and TW which minimize TCW(n, TW). Since n is a positive integer and TW is a real number, we can optimize the total cost per unit time as follows. For any given n (≥ 1), the second-order derivative of TCW(n, TW) with respect to TW is always positive, so the necessary condition for the minimum of TCW(n, TW) is

$\frac{\partial TC_W}{\partial T_W} = \frac{(n-1) D h_W}{2} - \frac{A_W}{T_W^2} = 0$   (3.2)

Solving equation (3.2) we get

$T_W^* = \sqrt{\frac{2 A_W}{(n-1) D h_W}}$   (3.3)

Substituting TW from equation (3.3) into equation (3.1), the total cost per unit time can be found for any given n. It is to be observed that there exists a unique optimal solution (n*, TW*) as TCW(n, TW) is convex for any given n:

$TC_W(n) = \sqrt{2 (n-1) D A_W h_W}$   (3.4)

Minimizing TCW(n) is equivalent to minimizing

$(TC_W(n))^2 = 2 (n-1) D A_W h_W$   (3.5)

We define Y(n), which makes our problem equivalent to the minimization of Y(n):

$Y(n) = 2 (n-1) D A_W h_W$   (3.6)

However, Y(n) is a linearly increasing function of n. Therefore, the optimal value of n is always 1. It means that the supplier directly delivers the order quantity to the retailer; the role of the warehouse is similar to that of a cross-docking (CD) system. Hence, the warehouse incurs only the ordering cost, and the optimal value of TW equals both $T_S^*$ and $T_R^*$.
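As a compact illustration of the individual optima in Table 1, the following sketch evaluates the EOQ-type formulas above; the function name and output layout are ours, and the printed values use the data of the numerical example in Section 6.

```python
from math import sqrt

def individual_optima(D, S, A_W, A_R, h_S, h_R):
    """Independent individual model of Table 1: supplier and retailer follow
    simple EOQ cycles; since Y(n) is increasing in n, the warehouse sets
    n* = 1 and T_W* = T_S*, paying only its ordering cost A_W per cycle."""
    T_S = sqrt(2 * S / (D * h_S))      # supplier's optimal setup interval (years)
    T_R = sqrt(2 * A_R / (D * h_R))    # retailer's optimal order interval (years)
    return {
        "T_S*": T_S,
        "T_R*": T_R,
        "TC_S": sqrt(2 * D * S * h_S),     # = S/T_S* + h_S*D*T_S*/2
        "TC_W": A_W / T_S,                 # n* = 1, T_W* = T_S*
        "TC_R": sqrt(2 * D * A_R * h_R),   # = A_R/T_R* + h_R*D*T_R*/2
    }

# Data of the numerical example in Section 6:
print(individual_optima(D=10_000, S=400, A_W=200, A_R=50, h_S=3, h_R=5))
# T_S* ~ 0.1633, T_R* ~ 0.0447, TC_S ~ 4898.98, TC_W ~ 1224.75, TC_R ~ 2236.07
```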
4 Coordinated Model
The relevant total cost of the coordinated model for the supplier, the warehouse and the retailer can be derived by adding the individual total costs per unit time from the previous section:

$CTC(n, T_R) = \frac{S}{n T_R} + \frac{A_W}{n T_R} + \frac{A_R}{T_R} + \frac{h_R D T_R}{2} + \frac{h_W (n-1) D T_R}{2} + \frac{h_S n D T_R}{2}$   (4.1)

where TW = n·TR and TS = n·TR. The optimal values of n and TR can be obtained using the following propositions.

Proposition 1: For any given n (≥ 1), the time interval between successive setups and reorders in the coordinated model can be determined uniquely.
Proof: Differentiating equation (4.1) with respect to TR, we get

$\frac{\partial CTC}{\partial T_R} = \frac{D (h_R + n h_W - h_W + n h_S)}{2} - \frac{n A_R + A_W + S}{n T_R^2}$   (4.2)

Differentiating equation (4.2) again with respect to TR, we get

$\frac{\partial^2 CTC}{\partial T_R^2} = \frac{2 (n A_R + A_W + S)}{n T_R^3} > 0, \quad \forall n \ge 1$   (4.3)

Hence CTC(n, TR) is convex in TR when n is given. Therefore, there exists a unique solution of the equation ∂CTC(n, TR)/∂TR = 0, which yields

$T_R^* = \sqrt{\frac{2 (n A_R + A_W + S)}{n D \{h_R + (n-1) h_W + n h_S\}}}$   (4.4)

Substituting $T_R^*$ into equation (4.1), we obtain the minimum total cost of the coordinated model as follows:

$CTC(n) = \sqrt{\frac{2 D \{h_R - h_W + n (h_W + h_S)\} (n A_R + A_W + S)}{n}}$   (4.5)
We can find the optimal value of n using the following proposition.

Proposition 2: The optimal value of n satisfies the following inequality:

$n^* (n^* - 1) \le \frac{(h_R - h_W)(S + A_W)}{A_R (h_S + h_W)} \le n^* (n^* + 1)$
Proof: Minimizing CTC(n) is equivalent to minimizing

$(CTC(n))^2 = 2 D \left\{ (h_R - h_W) \left( A_R + \frac{S + A_W}{n} \right) + (h_S + h_W)(n A_R + A_W + S) \right\}$   (4.6)

After ignoring the terms on the right-hand side of equation (4.6) which are independent of n, we define Z(n) as follows:

$Z(n) = \frac{(h_R - h_W)(S + A_W)}{n} + n A_R (h_S + h_W)$   (4.7)

The optimal value n = n* is obtained when

$Z(n^*) \le Z(n^* - 1)$ and $Z(n^*) \le Z(n^* + 1)$   (4.8)

We get the following inequalities from (4.8):

$\frac{(h_R - h_W)(S + A_W)}{n^*} + n^* A_R (h_S + h_W) \le \frac{(h_R - h_W)(S + A_W)}{n^* - 1} + (n^* - 1) A_R (h_S + h_W)$ and

$\frac{(h_R - h_W)(S + A_W)}{n^*} + n^* A_R (h_S + h_W) \le \frac{(h_R - h_W)(S + A_W)}{n^* + 1} + (n^* + 1) A_R (h_S + h_W)$   (4.9)
Accordingly, it follows that

$A_R (h_S + h_W) \le \frac{1}{n^* (n^* - 1)} (h_R - h_W)(S + A_W)$ and $A_R (h_S + h_W) \ge \frac{1}{n^* (n^* + 1)} (h_R - h_W)(S + A_W)$   (4.10)

The following condition is obtained from equation (4.10):

$n^* (n^* - 1) \le \frac{(h_R - h_W)(S + A_W)}{A_R (h_S + h_W)} \le n^* (n^* + 1)$   (4.11)
< Procedure for finding n* and TR* >
Step 1: Compute the ratio $\frac{(h_R - h_W)(S + A_W)}{A_R (h_S + h_W)}$ and find the integer n satisfying inequality (4.11). If this n is greater than 1, set n* = n; otherwise, set n* = 1. Go to Step 2.
Step 2: Determine the optimal value of TR using equation (4.4).
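The two-step procedure can be written out directly; the sketch below is ours and uses a closed-form inversion of inequality (4.11) instead of a search, which is equivalent because Z(n) is unimodal in n.

```python
from math import sqrt, floor

def coordinated_policy(D, S, A_W, A_R, h_S, h_W, h_R):
    """Step 1: n* from inequality (4.11); Step 2: T_R* from equation (4.4).
    Also returns the minimum coordinated cost CTC(n*) from equation (4.5)."""
    ratio = (h_R - h_W) * (S + A_W) / (A_R * (h_S + h_W))
    # n(n-1) <= ratio <= n(n+1)  <=>  n = floor((1 + sqrt(1 + 4*ratio)) / 2)
    n = max(1, floor((1 + sqrt(1 + 4 * ratio)) / 2))
    T_R = sqrt(2 * (n * A_R + A_W + S)
               / (n * D * (h_R + (n - 1) * h_W + n * h_S)))
    ctc = sqrt(2 * D * (h_R - h_W + n * (h_W + h_S)) * (n * A_R + A_W + S) / n)
    return n, T_R, ctc

# With the data of Section 6 this returns n* = 3, T_R* ~ 0.0527, CTC ~ 9486.83
print(coordinated_policy(D=10_000, S=400, A_W=200, A_R=50, h_S=3, h_W=2, h_R=5))
```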
5 Compensation Policy
Several researchers have shown that one partner's gain may exceed the other partner's loss in the integrated model [9, 14]. Thus, the net benefit should be shared among the parties (the supplier, the warehouse and the retailer) in some equitable fashion. We propose a compensation policy that shares benefits and losses according to the ratio of the individual models' total costs per unit time. This method extends the method of Goyal [10] to the three-level supply chain. Applying Goyal's method to our coordinated model, we get

$Z_S = \frac{TC_S(T_S^*)}{TC_S(T_S^*) + TC_W(n^*, T_W^*) + TC_R(T_R^*)}$   (5.1)

Cost of supplier = $Z_S \cdot CTC(n^*, T_R^*)$

$Z_W = \frac{TC_W(n^*, T_W^*)}{TC_S(T_S^*) + TC_W(n^*, T_W^*) + TC_R(T_R^*)}$   (5.2)

Cost of warehouse = $Z_W \cdot CTC(n^*, T_R^*)$

$Z_R = \frac{TC_R(T_R^*)}{TC_S(T_S^*) + TC_W(n^*, T_W^*) + TC_R(T_R^*)}$   (5.3)

Cost of retailer = $Z_R \cdot CTC(n^*, T_R^*)$

Note that $Z_S + Z_W + Z_R = 1$.
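The allocation itself is one line of arithmetic. In the sketch below (names ours), the individual costs of the relevant point-of-view model are fed in; with the retailer's point-of-view costs of Section 6 it reproduces the figures reported there.

```python
def compensation(tc_s, tc_w, tc_r, ctc):
    """Equations (5.1)-(5.3): split the coordinated cost CTC(n*, T_R*)
    in proportion to the parties' individual total costs per unit time."""
    total = tc_s + tc_w + tc_r
    return {name: (tc / total, tc / total * ctc)   # (Z_i, allocated cost)
            for name, tc in [("supplier", tc_s),
                             ("warehouse", tc_w),
                             ("retailer", tc_r)]}

# Retailer's point-of-view individual costs from Table 2:
print(compensation(tc_s=9615.34, tc_w=4472.17, tc_r=2236.07, ctc=9486.83))
# supplier: Z~0.589, ~$5,587.7; warehouse: Z~0.274, ~$2,599.4; retailer: Z~0.137, ~$1,299.7
```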
6 Numerical Examples
For the numerical examples, we use the following data:

D = 10,000 units/year, S = $400/setup, AW = $200/order, AR = $50/order, hS = $3/unit/year, hW = $2/unit/year, hR = $5/unit/year

The optimal values of n, TR and the total costs for the individual models and the coordinated model are summarized in Table 2.

Table 2. Summary of results (POV = point of view; the first three columns are the individual models)

                             Independent    Retailer's POV   Supplier's POV   Coordinated model
Supplier's setup interval    0.1633 year    0.0447 year      0.1633 year      0.1581 year
Warehouse's order interval   0.1633 (n=1)   0.0447 (n=1)     0.1633 (n=1)     0.1581 (n=3)
Retailer's order interval    0.0447         0.0447           0.1633           0.0527
Supplier's annual cost       $4,898.98      $9,615.34        $4,898.98        $4,901.53
Warehouse's annual cost      $1,224.75      $4,472.17        $1,224.75        $2,319.00
Retailer's annual cost       $2,236.07      $2,236.07        $4,388.68        $2,266.30
Total cost                   $8,359.80      $16,323.58       $10,512.41       $9,486.83
Fig. 2 shows that the total cost function CTC(n, TR) is convex in n and TR, and displays a typical configuration of the surface.
Fig. 2. Graphical Representation of CTC(n, TR)
If we coordinate the three-level supply chain, we can reduce the total cost by $6,836.75 (approximately 42%) against the retailer's point of view model. Therefore, we need to share the benefits. Applying the compensation policy using equations (5.1), (5.2) and (5.3), we get the following results:
$Z_S = 0.5890$, Cost of supplier = $5,587.74
$Z_W = 0.2740$, Cost of warehouse = $2,599.39
$Z_R = 0.1370$, Cost of retailer = $1,299.70
Fig. 3 summarizes the process of applying the compensation policy and information sharing.
Fig. 3. Graphical illustration of the solutions
7 Conclusions

We developed an inventory model for a three-level supply chain (one supplier, one warehouse, and one retailer) and proposed a procedure for determining the optimal values of n and TR for the coordinated model. The coordinated model with the compensation policy gives better results than the individual models in terms of the total cost per unit time; the total cost per unit time has been reduced significantly compared to the individual models. Other types of compensation policy (e.g. a price quantity discount policy) may be developed, and our model can be extended to the case with multiple suppliers, one warehouse, and multiple retailers. Finally, it would be an interesting extension to develop the model by relaxing the assumptions of deterministic demand and lead time.
Acknowledgements

This work was supported by the Regional Research Centers Program (Research Center for Logistics Information Technology), granted by the Korean Ministry of Education & Human Resources Development.
References

1. Banerjee, A., A joint economic-lot-size model for purchaser and vendor, Decision Sciences, 17 (1986) 292-311
2. Bourland, K., Powell, S. and Pyke, D., Exploring timely demand information to reduce inventories, European Journal of Operational Research, 92 (1996) 239-253
3. Chapman, S. N. and Carter, P. L., Supplier/customer inventory relationships under just in time, Decision Science, 21 (1990) 35-51
4. Chopra, S. and Meindl, P., Supply Chain Management: Strategy, Planning and Operation, Prentice-Hall, Inc (2001)
5. Clark, A. J. and Scarf, H., Optimal policies for a multi-echelon inventory problem, Management Science, 6 (1960) 475-490
6. Douglas J. T. and Paul, M. G., Coordinated supply chain management, European Journal of Operational Research, 94 (1996) 1-15
7. Erengüc, S. S., Simpson, N. C. and Vakharia, A. J., Integrated production/distribution planning in supply chains: an invited review, European Journal of Operational Research, 115 (1999) 219-236
8. Eum, S. C., Lee Y. H. and Jung J. W., A producer's allocation policy considering buyers' demands in the supply chain, Journal of the Korean Institute of Industrial Engineers, 31 (2005) 210-218
9. Gavirneni, S., Kapuscinski, R. and Tayur, S., Value of information in capacitated supply chains, Management Science, 45 (1999) 16-24
10. Goyal, S. K., An integrated inventory model for a single supplier-single customer problem, International Journal of Production Research, 15 (1976) 107-111
11. Goyal, S. K., A joint economic-lot-size model for purchaser and vendor: a comment, Decision Sciences, 19 (1988) 236-241
12. Goyal, S. K. and Gupta, Y. P., Integrated inventory models: the buyer-vendor coordination, European Journal of Operational Research, 41 (1989) 261-269
13. Kang, K. H., Lee B. K. and Lee Y. H., A review of current status and analysis in supply chain modeling, Journal of the Korean Institute of Industrial Engineers, 30 (2004) 224-240
14. Lee, H. L., So, K. C. and Tang, C. S., The value of information sharing in a two-level supply chain, Management Science, 46 (2000) 626-643
Using Constraint Satisfaction Approach to Solve the Capacity Allocation Problem for Photolithography Area

Shu-Hsing Chung¹, Chun-Ying Huang¹ and Amy Hsin-I Lee²

¹ Department of Industrial Engineering and Management, National Chiao Tung University, No. 1001, Ta Hsueh Road, Hsinchu, Taiwan, R.O.C. [email protected], [email protected]
² Department of Industrial Management, Chung Hua University, No. 707, Sec. 2, Wu Fu Road, Hsinchu, Taiwan, R.O.C. [email protected]
Abstract. This paper addresses the capacity allocation problem for the photolithography area (CAPPA) under an advanced technology environment. The CAPPA problem has two characteristics: process window and machine dedication. Process window means that a wafer needs to be processed on machines that can satisfy its process capability (process specification). Machine dedication means that after the first critical layer of a wafer lot is processed on a certain machine, subsequent critical layers of this lot must be processed on the same machine to ensure good quality of the final products. A production plan constructed without considering the above two characteristics is difficult to execute and to achieve its production targets. Thus, we model the CAPPA problem as a constraint satisfaction problem (CSP), which uses an efficient search algorithm to obtain a feasible solution. Additionally, we propose an estimation of the upper bound of load unbalance to reduce the search space of the CSP when searching for an optimal solution. Experimental results show that the proposed model is useful in solving the CAPPA problem in an efficient way.
1 Introduction

Due to its diverse characteristics, such as the reentrant process flow, time-constrained operations and different batch sizes for machines, wafer fabrication has received a lot of research attention, especially the photolithography process [13, 14, 7, 10]. The photolithography process uses masks to transfer circuit patterns onto a wafer, and the etching process forms tangible circuit patterns on the wafer chip. With the required number of photolithography passes, integrated circuitry products with preset functions are developed on the wafer. As wafer fabrication technology advances from the micrometer level to the nanometer level, more stringent machine selection restrictions, the so-called process window control and machine dedication control, are imposed on the production management of the photolithography area for wafer lots. The process window constraint, also called the equipment constraint, refers to the strict limitation on the choice of a machine to process higher-end fabrication technology in the process of a wafer lot, to meet increasingly narrower line widths, distances between lines, and tolerance limits. In other words, wafer lots can only be processed on
machines that meet a certain process capability (process recipe or process specification). In contrast, wafers that only need a lower-end fabrication technology have less stringent machine selection restrictions. Due to the difference in adjustable ability among photolithography machines regarding process recipes, the functions of various machines in fact vary to a certain extent even though they are grouped in the same workstation. Hence, some machines can handle more process capabilities (simultaneously handling higher- and lower-end fabrication technology) while other machines can handle fewer process capabilities (only lower-end fabrication technology). Some related studies are as follows. Leachman and Carmon [9] and Hung and Cheng [5] use linear programming to obtain a production plan that maximizes profit. Toktay and Uzsoy [12] transform the capacity allocation problem with machine capability constraints into a maximum flow problem; however, only a single product type is considered in their study. The machine dedication constraint concerns the layer-by-layer processing of wafers, in which the circuit patterns in the layers must be correctly connected in order to provide particular functions. If the electrical circuits among the layers cannot be aligned and connected, defective products result. The alignment precision provided by different machines varies to a certain extent, even for machines of the same model, due to small differences referred to as machine differences. It is therefore stipulated that when the first critical operation of a wafer lot is done on a particular machine, all of its subsequent critical operations must be processed on the very same machine to avoid an increase in the defective rate due to machine differences. A related study was done by Akçali et al. [1], who investigated the correlation between photolithography process characteristics and production cycle time using a simulation model, with the machine dedication policy as one of the experimental factors. Their experimental results indicate that a dedicated assignment policy has a remarkable impact on cycle time. With advanced fabrication technology, the impact of the process window and machine dedication constraints on wafer fabrication is increasingly evident. Capacity requirement planning is difficult because wafer fabrication has the special characteristics of reentrancy and long cycle times, and because the number of layers of products, the required process windows, and the number and distribution of critical layers differ. As a result, the effectiveness of the production planning and scheduling system is seriously impacted if the constraints of process window and machine dedication are not considered. Up to now, the CAPPA problem has not been tackled except by Chung et al. [3], in which a mixed integer-linear programming (MILP) model is devised; however, a practical scale-sized problem may take an exponential time. In this research, we adopt the efficient constraint satisfaction approach, which treats load unbalance among machines as one of the constraints and obtains the optimal solution by constantly narrowing down the upper bound of the load unbalance. Because a relatively large amount of settings and a long solving process may still be required, an estimation of the load unbalance to reduce the search space is also applied. In Section 2, the MILP model, the constraint satisfaction problem, and the load unbalance estimation are introduced.
Section 3 demonstrates the effectiveness of the proposed model. Section 4 uses a real-world case to show the applicability of the proposed model. In the last section, the research results are summarized.
2 Model Development

Indices:
i : order number, i = 1, …, I
j : layer number, j = 1, …, Ji
k : machine number in the photolithography area, k = 1, …, K
l : process capability number, l = 1, …, L
t : planning period, t = 1, …, T
r : ranking, r = 1, …, L

Parameters:
Akl : 1 if machine k has process capability l; 0 otherwise
ACkt : available capacity of machine k in planning period t
ALt : average capacity loading of the workstation in planning period t
CLij : 1 if layer j of order i is a critical layer; 0 otherwise
CRijl : 1 if layer j of order i has a load on process capability l; 0 otherwise
CSrt : cumulative available capacity from the first to the r-th ranked process capability
DClt : capacity requirement of process capability l in planning period t
DSlt : ratio of capacity requirement to available capacity of process capability l in planning period t
Ji : number of photolithography operations for order i
LTijt : 1 if layer j of order i has a load in planning period t; 0 otherwise
MLt : maximum loading level among machines in planning period t
pij : processing time of layer j of order i
SQ(r) : the process capability of the r-th rank
SClt : available capacity of process capability l in planning period t

Decision variables:
bik : 1 if the first critical layer of order i is assigned to machine k; 0 otherwise
$u_{kt}^{+}$ : positive difference between the utilization rate of machine k and the average utilization rate of the workstation it belongs to in planning period t
$u_{kt}^{-}$ : negative difference between the utilization rate of machine k and the average utilization rate of the workstation it belongs to in planning period t
xijk : 1 if layer j of order i is assigned to machine k; 0 otherwise

2.1 Mixed Integer-Linear Programming Model
To solve the CAPPA problem, we need to know the load occurrence time at the photolithography workstation for each layer of each order in the production system. Interviews with several semiconductor fabricators revealed that the X-factor, the ratio of the remaining time before delivery to the processing time of an order, is used as a reference for controlling production progress to make sure that orders are delivered on time (see also [8]). With the information of the X-factor, the processing time
of an order, the production plan and the WIP level, the loading occurrence time of each order (LTijt) can be estimated. The MILP model is constructed as follows:

Minimize

$\sum_{t=1}^{T} \sum_{k=1}^{K} (u_{kt}^{+} + u_{kt}^{-})$   (1)

Subject to

$\sum_t \sum_k \sum_l \sum_j x_{ijk} A_{kl} CR_{ijl} LT_{ijt} = \sum_t \sum_l \sum_j CR_{ijl} LT_{ijt}$, for all i   (2)

$\sum_k x_{ijk} = 1$, for all i, j   (3)

$\sum_t \sum_l \sum_j x_{ijk} CL_{ij} CR_{ijl} LT_{ijt} = b_{ik} \times \sum_t \sum_l \sum_j CL_{ij} CR_{ijl} LT_{ijt}$, for all i, k   (4)

$u_{kt}^{+} - u_{kt}^{-} = \sum_i \sum_j \sum_l (x_{ijk} \, p_{ij} A_{kl} CR_{ijl} LT_{ijt}) / AC_{kt} - AL_t$, for all t, k   (5)

$u_{kt}^{+} \le 1 - AL_t$, for all t, k   (6)

$u_{kt}^{-} \le AL_t$, for all t, k   (7)

$x_{ijk} \in \{0, 1\}$, for all i, j, k   (8)

$b_{ik} \in \{0, 1\}$, for all i, k   (9)

$u_{kt}^{+} \ge 0$, for all t, k   (10)

$u_{kt}^{-} \ge 0$, for all t, k   (11)
The objective function (1) balances the capacity utilization rates among the machines. Constraint (2) ensures that each layer of an order, including new release orders and WIP orders, is assigned to a machine k if it has a capacity request in the planning horizon; in this machine assignment, the process window constraint must be considered. Constraint (3) makes sure that each layer of an order is assigned to exactly one machine. Constraint (4) states the machine dedication control: if the first critical layer of order i is assigned to machine k for processing, bik is set to one. Note that the orders in a planning horizon can either be orders planned to release or WIP orders that were released to the shop floor in the previous planning horizon. Therefore, bik is a decision variable if the order is a planned-to-release order or a WIP order whose first critical layer has not yet been assigned to a particular machine, and is a known parameter if the order is a WIP order whose first critical layer has already been assigned to machine k, but is unfinished, from the previous planning horizon. Constraint (5) calculates the difference between the utilization rate of machine k and the average utilization rate of the entire workstation in each period of the planning horizon; the detailed definition of ALt is given in equations (14) and (15) in Section 2.3. Constraints (6) and (7) limit the upper values of $u_{kt}^{+}$ and $u_{kt}^{-}$, respectively.
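As an illustration only, the core of constraints (2)-(7) can be written compactly with an off-the-shelf MILP modeler. The sketch below uses PuLP with tiny made-up index sets and data (all placeholder values are ours, not the paper's instance), and it enforces the process-window restriction in the equivalent form "forbid assignment to incapable machines".

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum

# Toy index sets and data -- placeholders, not the paper's instance
I, J, K, L, T = range(2), range(2), range(2), range(2), range(2)
A  = {(k, l): 0 if (k, l) == (0, 1) else 1 for k in K for l in L}  # A_kl
CR = {(i, j, l): int(l == j) for i in I for j in J for l in L}     # CR_ijl
CL = {(i, j): int(j == 0) for i in I for j in J}                   # first layer critical
LT = {(i, j, t): int(t == 0) for i in I for j in J for t in T}     # LT_ijt
p  = {(i, j): 10 for i in I for j in J}                            # p_ij
AC = {(k, t): 100 for k in K for t in T}                           # AC_kt
AL = {t: 0.2 for t in T}                                           # AL_t

m  = LpProblem("CAPPA", LpMinimize)
x  = LpVariable.dicts("x", (I, J, K), cat="Binary")
b  = LpVariable.dicts("b", (I, K), cat="Binary")
up = LpVariable.dicts("up", (K, T), lowBound=0)
un = LpVariable.dicts("un", (K, T), lowBound=0)

m += lpSum(up[k][t] + un[k][t] for k in K for t in T)              # objective (1)
for i in I:
    for j in J:
        m += lpSum(x[i][j][k] for k in K) == 1                     # constraint (3)
        for k in K:                                                # process window,
            for l in L:                                            # equivalent to (2)
                if CR[i, j, l] and not A[k, l]:
                    m += x[i][j][k] == 0
for i in I:                                                        # dedication (4)
    for k in K:
        tot = sum(CL[i, j] * CR[i, j, l] * LT[i, j, t]
                  for j in J for l in L for t in T)
        m += lpSum(x[i][j][k] * CL[i, j] * CR[i, j, l] * LT[i, j, t]
                   for j in J for l in L for t in T) == b[i][k] * tot
for k in K:                                                        # (5)-(7), with (5)
    for t in T:                                                    # scaled by AC_kt
        load = lpSum(x[i][j][k] * p[i, j] * A[k, l] * CR[i, j, l] * LT[i, j, t]
                     for i in I for j in J for l in L)
        m += AC[k, t] * (up[k][t] - un[k][t]) == load - AL[t] * AC[k, t]
        m += up[k][t] <= 1 - AL[t]
        m += un[k][t] <= AL[t]
m.solve()   # requires an installed MILP solver such as CBC
```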
2.2 Constraint Satisfaction Problem (CSP)

A constraint satisfaction problem (CSP) searches for a feasible solution which satisfies all constraints under a finite domain of variables. CSP originated from artificial intelligence (AI) in computer science. Through consistency checking techniques, constraint propagation and intelligent search algorithms, CSP has a relatively high solving efficiency in combinatorial optimization problems, and it has been widely
applied in many research fields, such as vehicle routing, production scheduling, facility layout and resource allocation [2, 11, 4]. Although a CSP algorithm primarily aims to derive a feasible solution, it can be adjusted to search for an optimal solution. A feasible solution is generated by CSP first; this solution is then set as the upper bound of the objective function (for a minimization problem), and this relationship is treated as a constraint in a new CSP. By continually lowering the upper bound of the objective function, the optimal solution is finally obtained as the solution of the previous CSP when the current CSP can no longer be solved [2]. In other words, by deleting the objective function and solving only the constraints, we convert the problem into a CSP. If the objective function is added to the constraints by setting its upper bound, an optimal solution for the CAPPA problem can be obtained by constantly reducing the upper bound of the objective function (E), as shown in equation (12):
$\sum_t \sum_k (u_{kt}^{+} + u_{kt}^{-}) \le E$   (12)
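The bound-tightening loop just described is generic and can be phrased solver-independently. In the sketch below, `solve_csp` and `tighten` are placeholders (ours): the former stands for any CSP solver that returns a feasible assignment whose objective is at most the given bound, or None if none exists.

```python
def optimize_via_csp(solve_csp, tighten):
    """Turn a feasibility solver into an optimizer, as described above:
    solve once without a bound (I-CSP), then repeatedly re-solve with the
    objective constrained below the last value (O-CSP).  The previous
    solution is optimal once the tightened CSP becomes infeasible.

    solve_csp(bound) -> (solution, objective_value) or None   [placeholder]
    tighten(value)   -> next, strictly smaller upper bound     [placeholder]
    """
    best = solve_csp(None)            # I-CSP: any feasible solution
    if best is None:
        return None                   # the problem itself is infeasible
    while True:
        candidate = solve_csp(tighten(best[1]))
        if candidate is None:         # nothing below the bound:
            return best               # the last solution is optimal
        best = candidate
```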
Chung et al. [3] stated that loading balance is a critical factor for maintaining the stability of the production cycle time. Thus, we believe that balancing the loading among the machines of a workstation, machine by machine, is more suitable than merely minimizing the sum of the differences among the machine utilization rates. As a result, constraint (13) replaces constraint (12) when solving the CAPPA by CSP. For convenience, we refer to a CSP that searches for a feasible solution as the I-CSP model, and to a CSP that searches for an optimal solution as the O-CSP model.

$u_{kt}^{+} + u_{kt}^{-} \le E_t$, for each k and each t   (13)
When CSP is applied to generate an optimal solution for the CAPPA problem (i.e. the O-CSP model), the number of iterations on the upper bound of load unbalance is difficult to estimate. In consequence, the expectation of a fast and efficient CSP algorithm may not be attained. Hence, we present a heuristic method to estimate the value of Et (the upper bound of load unbalance, UBLU). With a good estimation of the UBLU, the search space of the O-CSP can be reduced, and the efficiency and the solution quality of CSP in solving the CAPPA problem can be increased.

2.3 Upper Bound of Load Unbalance (UBLU) Estimation
Conventionally, the average load level (ALt) of a workstation in a planning period is obtained by dividing the total load by the number of machines, and the maximum loading level (MLt) among the machines is assumed equal to the average loading level (ALt). This calculation is based on the assumption that all machines are identical, that is, that machines have identical process capabilities. Since the types and amount of process capabilities are not exactly the same in the CAPPA problem, the maximum loading level may not equal the average loading level. Therefore, we propose a two-phase capacity demand-supply assessment, which includes an independent and a dependent assessment, to estimate the maximum loading level among the machines. The results are utilized as the basis for setting the UBLU in equation (13). The concept is described as follows:
Phase I. Independent capacity supply assessment
The independent assessment examines whether the capacity requirement is less than the capacity supply for each process capability l. Capacity supply is the sum of the maximum loading levels (MLt) of the machines that can handle this process capability, and the initial value of MLt is set to the average capacity loading (ALt) of the workstation. If capacity demand is less than supply, the independent assessment is passed, and we can go to the dependent assessment. Otherwise, the maximum loading level of all machines needs to be raised to satisfy the capacity requirement of process capability l.

Phase II. Dependent capacity supply assessment
Since Phase I evaluates the capacity demand-supply without considering the fact that machines may possess several process capabilities, this phase uses an iterative calculation to assess whether the overall capacity supply is sufficient, based on the maximum loading level obtained from Phase I. First, the ratios (DSlt) of capacity requirement to capacity supply of each process capability are ranked from large to small, and the resulting sequence is SQ(r). Then, whether the cumulative capacity requirement is less than the cumulative capacity supply is examined according to the ranking of DSlt (that is, SQ(r)). If the answer is affirmative, the capacity supply is sufficient to meet the capacity requirement with consideration of the types and amount of process capabilities of each machine. Otherwise, a further adjustment of the maximum loading level is required to meet the cumulative capacity demand. The maximum loading level among the machines obtained after the two-phase capacity supply assessment is used as the basis for setting the UBLU. The detailed computation steps are as follows:

Capacity requirement of each process capability
Step 1: Calculate the capacity requirement (DClt) of each process capability in each planning period within the planning horizon:

$DC_{lt} = \sum_i \sum_j p_{ij} CR_{ijl} LT_{ijt}$, for each l and each t   (14)

Step 2: Calculate the average capacity loading (ALt) of the machines in the photolithography workstation in each planning period:

$AL_t = \sum_l DC_{lt} / K$, for each t   (15)

Phase I. Independent capacity supply assessment
Step 1: Set t = 1, l = 1.
Step 2: Set MLt = ALt.
Step 3: Verify whether the independent capacity supply of process capability l is sufficient in planning period t. If yes, go to step 5; else go to step 4.

$DC_{lt} \le SC_{lt}$, where $SC_{lt} = ML_t \times \sum_k A_{kl}$   (16)

Step 4: Adjust the maximum loading level (MLt) to satisfy the capacity requirement of process capability l:

$ML_t = ML_t + (DC_{lt} - SC_{lt}) / \sum_k A_{kl}$   (17)

Step 5: Check if l = L. If yes, go to step 6; else let l = l + 1 and go to step 3.
Step 6: Check if t = T. If yes, Phase I ends; else let t = t + 1, l = 1, and go to step 2.

Phase II. Dependent capacity supply assessment
Step 1: Set t = 1.
Step 2: Calculate the ratio (DSlt) of each process capability:

$DS_{lt} = DC_{lt} / (ML_t \times \sum_k A_{kl})$, for each l   (18)

Step 3: Rank the values of all DSlt in planning period t from large to small. Use r to represent the rank and SQ(r) to represent the r-th ranked process capability.
Step 4: Set r = 1.
Step 5: Check whether the dependent capacity supply is sufficient. If inequality (19) is satisfied, go to step 7; else go to step 6.

$\sum_{r'=1}^{r} DC_{SQ(r'),t} \le CS_{rt}$, where $CS_{rt} = ML_t \times \sum_k \min\left\{1, \max\left\{\sum_{r'=1}^{r} A_{k,SQ(r')}, 0\right\}\right\}$   (19)

Step 6: Adjust the maximum loading level (MLt) to satisfy the cumulative capacity requirement, then go to step 7:

$ML_t = ML_t + \left(\sum_{r'=1}^{r} DC_{SQ(r'),t} - CS_{rt}\right) / CS_{rt}$   (20)

Step 7: Check if r = L. If yes, go to step 8; else let r = r + 1 and go to step 5.
Step 8: Check if t = T. If yes, Phase II ends; else let t = t + 1 and go to step 2.

Setting the upper bound of load unbalance (UBLU)
After the two-phase assessment above, the upper bound of load unbalance can be set to (MLt − ALt)/ALt. However, since the processing times of the layers of the products are not identical and are not divisible, it is revised as follows:

$E_t = \max\left\{ (ML_t - AL_t) / AL_t, \; \min\{p_{ij}, \forall i, j\} / AL_t \right\}$   (21)
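The two-phase assessment for a single planning period is easy to prototype. The following sketch (ours) follows the steps above literally, including the adjustment (20) exactly as printed, and assumes every process capability is supported by at least one machine.

```python
def ubl_estimate(DC, A, p_min):
    """Two-phase UBLU estimation of Section 2.3 for one planning period.
    DC[l]   : capacity requirement of process capability l   (eq. 14)
    A[k][l] : 1 if machine k has process capability l
    p_min   : smallest layer processing time min{p_ij}
    Returns E_t from equation (21)."""
    K, L = len(A), len(DC)
    AL = sum(DC) / K                                          # eq. (15)
    ML = AL
    cap = [sum(A[k][l] for k in range(K)) for l in range(L)]  # capable machines
    # Phase I: one capability at a time (eqs. 16-17)
    for l in range(L):
        SC = ML * cap[l]
        if DC[l] > SC:
            ML += (DC[l] - SC) / cap[l]
    # Phase II: cumulative check in decreasing order of DS_lt (eqs. 18-20)
    SQ = sorted(range(L), key=lambda l: -DC[l] / (ML * cap[l]))
    cum = 0.0
    for r in range(L):
        cum += DC[SQ[r]]
        # machines covering at least one of the first r+1 ranked capabilities
        n_cov = sum(min(1, sum(A[k][SQ[rr]] for rr in range(r + 1)))
                    for k in range(K))
        CS = ML * n_cov                                       # eq. (19)
        if cum > CS:
            ML += (cum - CS) / CS                             # eq. (20), as printed
    return max((ML - AL) / AL, p_min / AL)                    # eq. (21)
```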
3 Comparisons Among the MILP, I-CSP, and O-CSP Models

A simple example is presented here to compare the performance of the MILP, I-CSP, and O-CSP models. Consider a production environment with three machines, M1, M2 and M3, where each machine possesses different process capabilities, as shown in Table 1.
Table 1. Process capability of machines (1 if the machine has the process capability; 0 otherwise)

Process capability   Machine 1   Machine 2   Machine 3
1                    1           1           0
2                    0           1           1
3                    0           1           1
4                    0           0           1
Table 2. Processing time, loading occurrence time, process window constraint and critical operation of the orders. Each entry gives: processing time (hr), load occurrence time (week), required process capability, and whether the operation is critical (1: critical; 0: non-critical).

Order 1: (12,1,1,0) (15,1,3,1) (19,2,2,0) (12,2,3,1) (14,3,2,0)
Order 2: (11,1,1,0) (16,1,3,1) (18,2,2,0) (11,2,3,1) (9,3,2,0) (15,3,3,1) (17,3,1,0)
Order 3: (13,1,2,0) (15,1,3,0) (10,2,4,1) (13,2,2,0) (20,3,4,1) (12,3,3,0)
Order 4: (12,1,2,0) (14,2,4,1) (14,2,2,0) (13,2,4,1) (12,3,3,0) (15,3,2,0)
Order 5: (13,1,2,0) (13,1,3,0) (19,1,4,1) (12,2,3,0) (13,2,4,1) (12,3,2,1) (16,3,2,0)
Table 3. Performance of the different solving models. Row (3) = (1) + (2); row (4) is the maximum of (3) over the machines in each period t; row (5) = (4) × available capacity (168 hours/period). The objective value of I-CSP and O-CSP is the sum of $(u_{kt}^{+} + u_{kt}^{-})$; the computation time of O-CSP is the sum of the solving times of the 15 iterations in Table 4.

MILP (objective 0.0357, time 0.28 s):
[k,t]:          [1,1]  [2,1]  [3,1]  [1,2]  [2,2]  [3,2]  [1,3]  [2,3]  [3,3]
(1) $u_{kt}^{+}$: 0.0099 0.0000 0.0000 0.0019 0.0000 0.0019 0.0000 0.0039 0.0000
(2) $u_{kt}^{-}$: 0.0000 0.0019 0.0079 0.0000 0.0039 0.0000 0.0019 0.0000 0.0019
(3):            0.0099 0.0019 0.0079 0.0019 0.0039 0.0019 0.0019 0.0039 0.0019
(4):            0.0099 (t=1), 0.0039 (t=2), 0.0039 (t=3)
(5):            1.6632, 0.6552, 0.6552

I-CSP (objective 2.7694, time 0.00 s):
(1) $u_{kt}^{+}$: 0.4623 0.0000 0.0000 0.3591 0.0000 0.0000 0.5634 0.0000 0.0000
(2) $u_{kt}^{-}$: 0.0000 0.1865 0.2757 0.0000 0.0634 0.2956 0.0000 0.2817 0.2817
(3):            0.4623 0.1865 0.2757 0.3591 0.0634 0.2956 0.5634 0.2817 0.2817
(4):            0.4623 (t=1), 0.3591 (t=2), 0.5634 (t=3)
(5):            77.6664, 60.3288, 94.6512

O-CSP (objective 0.0357, time 0.30 s):
(1) $u_{kt}^{+}$: 0.0099 0.0000 0.0000 0.0019 0.0000 0.0019 0.0000 0.0039 0.0000
(2) $u_{kt}^{-}$: 0.0000 0.0019 0.0079 0.0000 0.0039 0.0000 0.0019 0.0000 0.0019
(3):            0.0099 0.0019 0.0079 0.0019 0.0039 0.0019 0.0019 0.0039 0.0019
(4):            0.0099 (t=1), 0.0039 (t=2), 0.0039 (t=3)
(5):            1.6632, 0.6552, 0.6552
There are five orders to be released; the information on processing times (hrs), loading occurrence times (weeks), required process capabilities and critical layers is shown in Table 2. The commercial software ILOG OPL 3.5 [6] is utilized to solve this simplified example with the three models: the MILP model, the I-CSP model and the O-CSP model. The results are shown in Table 3. Even though the I-CSP model uses the least solving time among the three models, its objective function value (2.7694) and workstation utilization rate difference (a maximum difference of 0.5634) are the worst. As an extension of the I-CSP model,
Table 4. Computation process of O-CSP in solving the simple example. The maximum difference is the maximum value of $(u_{kt}^{+} + u_{kt}^{-})$.

Iteration   UBLU     Objective value        Solving time (s)   Maximum difference
1           -        2.7694                 0.00               0.5634
2           0.5634   1.6782                 0.02               0.3432
3           0.3432   1.3051                 0.02               0.2599
⋮           ⋮        ⋮                      ⋮                  ⋮
6           0.2480   0.8924                 0.03               0.1865
7           0.1865   0.6502                 0.02               0.1448
8           0.1448   0.6146                 0.02               0.0932
⋮           ⋮        ⋮                      ⋮                  ⋮
14          0.0277   0.1304                 0.02               0.0218
15          0.0218   0.0357                 0.05               0.0099
16          0.0099   no feasible solution
the O-CSP model constantly adjusts the UBLU using equation (13) (see Table 4 for the computation process) to eventually arrive at an optimal solution (the same objective function value as the MILP model). One drawback is that the required number of bound adjustments cannot be estimated in advance. With the upper bound of load unbalance estimation introduced in Section 2.3, we can set the upper bound of load unbalance to 0.1811. Compared with the data in Table 4, 0.1811 corresponds to the upper bound of the 7th iteration of the O-CSP model. In consequence, adopting the UBLU setting in the O-CSP model can effectively reduce the number of iterations.
4 A Real-World Application

To verify the applicability of the proposed model, a real-world case investigated in [3] is examined here. In this wafer fab, there are ten steppers and five different process capabilities. Five types of products, A, B, C, D and E, are manufactured, and the products require 17, 19, 16, 20 and 19 photolithography operations, respectively. The total required photolithography operation time for a product lies between 597 and 723 minutes. Products A and B require a fabrication technology of 0.17 μm, while products C, D and E adopt a 0.14 μm fabrication technology. The production planning and control department sets the planning horizon to 28 days and the planning period to 7 days. In the planning horizon, 474 lots are expected to be released, and the manufacturing execution system (MES) reveals that there are currently 204 lots of WIP on the floor. The CAPPA problem is solved by O-CSP using the software ILOG OPL 3.5 [6], and the results are shown in Table 5. Through the upper bound of load unbalance estimation, the CAPPA problem has a fairly balanced capacity allocation result (i.e. MLt = ALt), and the UBLU is set to 0.0014 (= 15/10800, where the minimum processing time is 15 minutes and the available capacity of a machine in a planning period is 10,800 minutes). Table 5 shows that the objective function value of the CAPPA problem is 0.0205 and the required solving time is 475.22 s. This result is superior to that generated by Chung et al. [3], where the objective function value and the required solving time were 0.0291 and 5.3878 hours, respectively.
Table 5. Objective function value, maximum difference and solving time under different UBLU. The maximum difference is the maximum value of $(u_{kt}^{+} + u_{kt}^{-})$.

UBLU     Objective value   Maximum difference   Solving time (s)
-        9.4434            0.9994               125.94
⋮        ⋮                 ⋮                    ⋮
0.0014   0.0205            0.0013               475.22
0.0013   0.0167            0.0010               576.53
0.0010   0.0141            0.0008               851.02
0.0008   0.0116            0.0007               903.43
0.0007   0.0107            0.0006               973.98
0.0006   0.0084            0.0005               3477.64
0.0005   no feasible solution
When the UBLU (0.0014) is used as the basis of O-CSP for solving the CAPPA problem, the required number of iterations is only six. This indicates that the setting of the UBLU can effectively reduce the search space of O-CSP. Such a solving process produces high-quality solutions and has application value in real practice.
5 Conclusion

In this paper, we consider the capacity allocation problem with the process window and machine dedication constraints that are apparent in wafer fabrication. We model the CAPPA problem as a constraint satisfaction problem (CSP), which uses an efficient search algorithm to obtain a feasible solution. A relatively large amount of setting and calculation is required in CSP because it treats the objective function as one of the constraints when searching for an optimal solution, while the bound of the objective function is narrowed down through an iterative process. Hence, we propose a method for setting the upper bound of load unbalance among machines, with which the search space and the number of computations can be decreased effectively for the CAPPA problem. The results show that a very good solution can be obtained in a reasonable time and can serve as a reference for wafer lot release and the dispatching of photolithography machines; the model is thus valuable in real-world applications.
Acknowledgements

This paper is partially supported by Grant NSC94-2213-E-009-086.
References

1. Akçali, E., Nemoto, K., Uzsoy, R.: Cycle-Time Improvements for Photolithography Process in Semiconductor Manufacturing. IEEE Transactions on Semiconductor Manufacturing. 14(1) (2001) 48-56
2. Brailsford, S.C., Potts, C.N., Smith, B.M.: Constraint Satisfaction Problems: Algorithms and Applications. European Journal of Operational Research. 119(3) (1999) 557-581
3. Chung, S.H., Huang, C.Y., Lee, A.H.I.: Capacity Allocation Model for Photolithography Workstation with the Constraints of Process Window and Machine Dedication. Production Planning and Control. Accepted. (2006)
4. Freuder, E.C., Wallace, R.J.: Constraint Programming and Large Scale Discrete Optimization. American Mathematical Society. (2001)
5. Hung, Y.F., Cheng, G.J.: Hybrid Capacity Modeling for Alternative Machine Types in Linear Programming Production Planning. IIE Transactions. 34(2) (2002) 157-165
6. ILOG Inc.: ILOG OPL Studio 3.5. ILOG Inc., France. (2001)
7. Kim, S., Yea, S.H., Kim, B.: Shift Scheduling for Steppers in the Semiconductor Wafer Fabrication Process. IIE Transactions. 34(2) (2002) 167-177
8. Kishimoto, M., Ozawa, K., Watanabe, K., Martin, D.: Optimized Operations by Extended X-Factor Theory Including Unit Hours Concept. IEEE Transactions on Semiconductor Manufacturing. 14(3) (2001) 187-195
9. Leachman, R.C., Carmon, T.F.: On Capacity Modeling for Production Planning with Alternative Machine Types. IIE Transactions. 24(4) (1992) 62-72
10. Lee, Y.H., Park, J., Kim, S.: Experimental Study on Input and Bottleneck Scheduling for a Semiconductor Fabrication Line. IIE Transactions. 34(2) (2002) 179-190
11. Lustig, I.J., Puget, J.-F.P.: Program Does Not Equal Program: Constraint Programming and Its Relationship to Mathematical Programming. Interfaces. 31(6) (2001) 29-53
12. Toktay, L.B., Uzsoy, R.: A Capacity Allocation Problem with Integer Side Constraints. European Journal of Operational Research. 109(1) (1998) 170-182
13. Uzsoy, R., Lee, C.-Y., Martin-Vega, L.A.: A Review of Production Planning and Scheduling Models in the Semiconductor Industry (I): System Characteristics, Performance Evaluation and Production Planning. IIE Transactions. 24(4) (1992) 47-60
14. Uzsoy, R., Lee, C.-Y., Martin-Vega, L.A.: A Review of Production Planning and Scheduling Models in the Semiconductor Industry (II): Shop-Floor Control. IIE Transactions. 26(5) (1994) 44-55
Scheduling an R&D Project with Quality-Dependent Time Slots

Mario Vanhoucke¹,²

¹ Faculty of Economics and Business Administration, Ghent University, Gent, Belgium
² Operations & Technology Management Centre, Vlerick Leuven Gent Management School, Gent, Belgium
[email protected]
Abstract. In this paper we introduce the concept of quality-dependent time slots in the project scheduling literature. Quality-dependent time slots refer to pre-defined time windows in which certain activities can be executed under ideal circumstances (an optimal level of quality). Outside these time windows, there is a loss of quality due to detrimental effects. The purpose is to select a quality-dependent time slot for each activity, resulting in a minimal loss of quality. The contribution of this paper is threefold. First, we show that an R&D project from the bio-technology sector can be transformed into a resource-constrained project scheduling problem (RCPSP). Secondly, we propose an exact search procedure for scheduling this project with the aforementioned quality restrictions. Finally, we test the performance of our procedure on a randomly generated problem set.
1 Introduction

In the last decades, research in resource-constrained project scheduling has been investigated from different angles and under different assumptions. The main focus on project time minimization has shifted towards other objectives (e.g. net present value maximization), extensions such as multi-mode scheduling and/or preemption, and many other facets (for the most recent overview, see [1]). However, the literature on project scheduling algorithms in which quality considerations are taken into account is virtually void. [9] maximize the quality in the resource-constrained project scheduling problem by taking the rework time and rework cost into account. They argue that their work is a logical extension of the classical resource-constrained project scheduling efforts, referring to a previous study by the same authors which indicated that over 90% of project managers take the maximization of the quality of projects and their outcomes as their primary objective. Given that emphasis on quality management and its implementation in project management, and the need to develop new tools and techniques for scheduling decisions, we elaborate on that issue based on a real-life project in the bio-technology sector. The motivation of this research paper lies in the effort we put into the scheduling of a project with genetically manipulated plants. In this project, several activities need to be scheduled in the presence of limited resources and severe quality restrictions. More
precisely, some activities preferably need to be executed within certain pre-defined periods, referred to as quality-dependent time slots. Although execution is also possible outside these pre-defined intervals, it is less desirable since it leads to a decrease in quality. The concept of pre-defined time windows for activity execution is not new in the project scheduling literature. [2] criticize the traditional models in which it is assumed that an activity can start at any time after the finishing of all its predecessors. To that purpose, they consider two improvements over the traditional activity networks by including two types of time constraints: time-window constraints assume that an activity can only start within a specified time interval, while time-schedule constraints assume that an activity can only begin at one of an ordered schedule of beginning times. [15] elaborate on these time constraints and argue that time can be treated as a repeating cycle where each cycle consists of two categories: (i) some pairs of rest and work windows and (ii) a leading number specifying the maximal number of times each pair should iterate. By incorporating these so-called time-switch constraints, activities are forced to start in a specific time interval and to be down in some specified rest interval. Quality-dependent time slots refer to pre-defined time windows in which certain activities can be executed under ideal circumstances (an optimal level of quality); outside these time windows, there is a loss of quality due to detrimental effects. Unlike the time-switch constraints, the quality-dependent time slots allow the execution of an activity outside the pre-defined window, leading to an extra cost or a decrease in quality. Consequently, the quality-dependent time slots are the soft-constraint counterparts of the time-switch constraints: the latter are hard constraints (execution is only possible within the interval), while the former are soft constraints (execution is preferable within the interval but also possible outside it) that can be violated at a certain penalty cost. Our project setting assumes that each activity has several pre-defined quality-dependent time slots, from which one has to be selected. The selection of a time slot must be done before the start of the project (in the planning phase). Given a fixed set of time slots per activity, the target is then to select a time slot and to schedule the project such that the loss in quality is minimized.
2 Description of the Problem

The project under study can be represented by an activity-on-the-node network where the set of nodes, N, represents activities and the set of arcs, A, represents finish-start precedence relations with a time lag of zero. The activities are numbered from the dummy start activity 1 to the dummy end activity n and are topologically ordered, i.e. each successor of an activity has a larger activity number than the activity itself. Each activity has a duration $d_i$ (1 ≤ i ≤ n) and a number of quality-dependent time windows nr(i). Each window l of activity i (1 ≤ i ≤ n and 1 ≤ l ≤ nr(i)) is characterized by a time interval $[q_{il}^{-}, q_{il}^{+}]$ of equal quality, while deviations outside that interval result in a loss of quality. Note that the time slot $[q_{il}^{-}, q_{il}^{+}]$ refers to a window with optimal quality and can be either an interval or a single point in time. The quality deviation of each activity i can be computed as $Q_i^{loss} = \max\{q_{il}^{-} - s_i;\; s_i - q_{il}^{+};\; 0\}$ and depends on the selection of the time window l, with $s_i$ the starting time of activity i. To that purpose, we introduce a binary decision variable into our conceptual model which determines the selection of a specific time interval for each activity i: $y_{il} = 1$ if time interval l has been selected for activity i, and $y_{il} = 0$ otherwise.
We use $q_{il}^{opt}$ to denote the minimal activity cost associated with a fixed and optimal level of quality for each time window l of activity i, and $q_{il}^{extra}$ to denote the loss in quality per unit of time deviation from the time interval. Consequently, the total cost of quality equals $\sum_{i=1}^{n} \sum_{l=1}^{nr(i)} (q_{il}^{opt} + q_{il}^{extra} Q_i^{loss}) y_{il}$. Note that nr(1) = nr(n) = 1, since nodes 1 and n are dummy activities with $q_{11}^{-} = q_{11}^{+}$ and $q_{n1}^{-} = q_{n1}^{+}$. Moreover, we set $q_{11}^{extra} = \infty$ to force the dummy start activity to start at time instance zero. The project needs to be finished before a negotiated project deadline $\delta_n$, i.e. $q_{n1}^{-} = q_{n1}^{+} = \delta_n$. Consequently, setting $q_{n1}^{extra} = \infty$ denotes that the project deadline cannot be exceeded (a hard constraint), while $q_{n1}^{extra} < \infty$ means that the project deadline can be exceeded at a certain penalty cost (a soft constraint). There are K renewable resources with $a_k$ (1 ≤ k ≤ K) the availability of resource type k and $r_{ik}$ (1 ≤ i ≤ n, 1 ≤ k ≤ K) the resource requirement of activity i with respect to resource type k. The project with renewable resources and quality-dependent time windows can be conceptually formulated as follows:

Minimize

$\sum_{i=1}^{n} \sum_{l=1}^{nr(i)} (q_{il}^{opt} + q_{il}^{extra} Q_i^{loss}) y_{il}$   (1)

Subject to

$s_i + d_i \le s_j$, $\forall (i, j) \in A$   (2)

$\sum_{i \in S(t)} r_{ik} \le a_k$, k = 1, …, K and t = 1, …, T   (3)

$Q_i^{loss} \ge \sum_{l=1}^{nr(i)} q_{il}^{-} y_{il} - s_i$, i = 1, …, n   (4)

$Q_i^{loss} \ge s_i - \sum_{l=1}^{nr(i)} q_{il}^{+} y_{il}$, i = 1, …, n   (5)

$\sum_{l=1}^{nr(i)} y_{il} = 1$, i = 1, …, n   (6)

$s_1 = 0$   (7)

$s_i \in \mathbb{Z}^{+}$, $Q_i^{loss} \in \mathbb{Z}^{+}$, i = 1, …, n   (8)

$y_{il} \in \{0, 1\}$, i = 1, …, n and l = 1, …, nr(i)   (9)
where S(t) denotes the set of activities in progress in period ]t−1, t]. The objective in Eq. (1) minimizes the total quality cost of the project (i.e. the fixed cost within the selected time window plus the extra cost of the quality loss due to deviations from that interval). The constraint set in Eq. (2) maintains the finish-start precedence relations among the activities. Eq. (3) represents the renewable resource constraints, and the constraint sets in Eq. (4) and Eq. (5) compute the deviation from the selected time window of each activity. Eq. (6) represents the time window selection and forces the selection of a single time window for each activity. Eq. (7) forces the dummy start activity to start at time zero, and Eq. (8) ensures that the activity starting times as well as the time window deviations assume nonnegative integer values. Eq. (9) ensures that the time window selection variable is a binary (0/1) variable. Remark that the quality loss function measuring the quality decrease due to a deviation from the ideal time window l can be of any form (such as stepwise functions, convex functions, etc.). However, we assume in Eqs. (1)-(9), without loss of generality, a linear quality deviation function.

Although our first acquaintance with this problem type was the scheduling of a genetically manipulated plants project, we believe that there are numerous other examples where pre-defined time windows need to be selected before the execution of the project. The following four examples illustrate the possible generalization of multiple quality-dependent time windows to other project environments:

Perishable Items. The project of this paper, which motivated us to elaborate on this issue, is a typical example where items (i.e. plants) are perishable. Many project activities consist of tests on growing plants where the quality is time-dependent since there is an optimal time interval of consumption. Earlier consumption is possible, at the cost of a loss in quality, since the plants are still in their ripening process. Later consumption results in a loss of quality due to detrimental effects.

State-of-Nature Dependencies. In many projects, the performance of some activities might depend on the state of nature. In this case, a pre-defined set of possible starting times depending on the state of nature is linked with possible execution times of the activity, and a deviation from these time windows is less desirable (resulting in higher costs or quality loss) or even completely intolerable.

Multiple Activity Milestones. The project scheduling literature with due dates (milestones) has been restricted to considering projects with pre-assigned due dates (see e.g. [11] and [12]). In reality, milestones are the result of negotiations rather than simply dictated by the client of the project. Therefore, we advocate that due dates, including earliness and tardiness penalty costs for possible deviations, are agreed upon by the client and the contractor (and possibly some subcontractors). This results in a set of possible due dates for each activity, rather than a single pre-defined due date. The objective is then to select a due date for each activity such that the total earliness/tardiness penalty cost is minimized.
Time-Dependent Resource Cost. In many projects, the cost of (renewable) resources heavily depends on the time of usage. The aforementioned time-switch constraints are a typical and extreme example of time-dependent resource costs, since they restrict the execution of activities to pre-defined time intervals (work periods) without any possibility to deviate. However, if we allow the activities to deviate from their original work periods (e.g. by adding more (expensive) resources to an activity in the pre-defined rest period), the work periods can be considered as the quality-dependent time slots, while the rest periods are periods outside these slots in which the activity can be executed at an additional cost.
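For a given schedule and slot selection, objective (1) of the formulation above is straightforward to evaluate. The sketch below (names and data layout ours) computes the linear quality-loss variant.

```python
def total_quality_cost(starts, slots, selected):
    """Objective (1) with a linear quality deviation function.
    starts[i]   : starting time s_i of activity i
    slots[i][l] : (q_opt, q_minus, q_plus, q_extra) of time slot l
    selected[i] : index l of the slot chosen for activity i (y_il = 1)"""
    cost = 0.0
    for i, s in starts.items():
        q_opt, q_minus, q_plus, q_extra = slots[i][selected[i]]
        q_loss = max(q_minus - s, s - q_plus, 0)   # Q_i^loss
        cost += q_opt + q_extra * q_loss
    return cost

# An ASAP activity is encoded as a single-point slot at its earliest start es_i,
# e.g. slots[i] = [(0, es_i, es_i, penalty)]; an ALAP activity analogously at ls_i.
```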
3 The Algorithm

The problem type under study requires the selection of a quality-dependent time window from a possible set of windows such that the total quality loss is minimized. A closer look at the problem formulation (1)-(9) reveals the following observations:

- When K = 0, i.e. there are no resources with limited capacity, the problem given in (1)-(9) reduces to an unconstrained project scheduling problem with non-regular measures of performance. Due to the special structure of the quality loss functions and the multiple time windows, the problem reduces to a separable nonconvex programming problem (see the "solution algorithm for problem (1)-(2), (4)-(9)" described below).
- When K > 0, i.e. there is at least one renewable resource with limited capacity, resource conflicts can arise during the scheduling of the project. Therefore, this problem type can be solved to optimality by any branch-and-bound enumeration scheme for project scheduling problems with non-regular measures of performance (see the "solution algorithm for problem (1)-(9)" described below).
In this section we describe a double branch-and-bound algorithm for the problem type under study. The first branch-and-bound procedure ignores the renewable resource constraints and searches for an exact solution of the unconstrained project scheduling problem. The second branch-and-bound procedure aims at resolving resource conflicts and needs the previously mentioned solution as an input in every node of the tree.

Solution Algorithm for Problem (1)-(2), (4)-(9): The basic idea of this solution approach relies on the approach used by [6] and [7]. Their problem is to find the vector x = (x1, …, xn) which minimizes

$\varphi(x) = \sum_{i=1}^{n} \varphi_i(x_i)$ subject to $x \in G$ and $l \le x \le L$   (10)

for which it is assumed that G is closed and that each $\varphi_i$ is lower semi-continuous, possibly nonconvex, on the interval $[l_i, L_i]$. In their paper, they have presented an algorithm for separable nonconvex programming problems. To that purpose, they solve a sequence of problems in a branch-and-bound approach in which the objective
function is convex. These problems correspond to successive partitions of the feasible set. This approach has been successfully applied to different optimization problems. [10] have shown that this problem type is a special case of irregular starting-time costs project scheduling, which can be formulated as a maximum-flow problem and hence can be solved using any maximum-flow algorithm. Due to the special structure of the convex envelope, we rely on an adapted procedure of [14], developed for an unconstrained project scheduling problem with activity-based cash flows which depend on the time of occurrence. This branch-and-bound procedure basically runs as follows. At each node, we calculate a lower bound for the total quality cost $q_{il}^{opt} + q_{il}^{-}Q_i^{-} + q_{il}^{+}Q_i^{+}$ for each activity i. To that purpose, we construct the convex envelope of the total quality cost profile over the whole time window [es_i, lf_i] for each activity i (es_i = earliest start and lf_i = latest finish). The convex envelope of a function $F = q_{il}^{opt} + q_{il}^{-}Q_i^{-} + q_{il}^{+}Q_i^{+}$ (l = 1, …, nr(i)) taken over C = [es_i, lf_i] is defined as the highest convex function which fits below F. If the reported solution is not feasible for the original problem, the algorithm starts to branch. The algorithm calculates two new convex envelopes for these subsets and solves two new problems at each node. Branching continues from the node with the lowest lower bound. If all activities at a particular node of the branch-and-bound tree are feasible, then we update the upper bound of the project (initially set to ∞) and explore the second node at that level of the branch-and-bound tree. Backtracking occurs when the calculated lower bound is larger than or equal to the current upper bound. The algorithm stops when we backtrack to the initial level of the branch-and-bound tree and reports the optimal lower bound. This lower bound can be calculated at each node of the branch-and-bound algorithm described hereunder.

Solution Algorithm for Problem (1)-(9): In order to take the renewable resource constraints into account (i.e. equation (3)), we rely on a classical branch-and-bound approach that uses the unconstrained solutions as lower bounds at each node of the tree. Since the previous branch-and-bound procedure searches for an optimal solution for the unconstrained project and consequently ignores the limited availability of the renewable resources, the second branching strategy boils down to resolving resource conflicts. If resource conflicts occur, we need to branch. A resource conflict occurs when there is at least one period ]t − 1, t] for which

$$\exists\, k \le K:\ \sum_{i \in S(t)} r_{ik} > a_k$$
To resolve these conflicts, we rely on the branch-and-bound approach of [8] for the resource-constrained project scheduling problem with discounted cash flows. This is an adapted version of the branching scheme developed by [3] for the resource-constrained project scheduling problem and is further enhanced by [5], [12] and [13]. In order to prune certain nodes of the branch-and-bound tree, we have implemented the so-called subset dominance rule. This dominance rule has originally been developed by [5] and has been applied in the branch-and-bound procedures of [12] and [13].
Fig. 1. An example project with quality-dependent time slots (activity-on-the-node network; the node annotations [di ri1] and [(q_il^opt, q_il^- = q_il^+, q_il^extra)] are omitted here)
This dominance rule can be applied when the set of added precedence constraints (to resolve resource conflicts) of a previously examined node in the tree is a subset of the set of precedence constraints of the current node.

Example: We illustrate the project scheduling problem with limited resources and quality-dependent time slots by means of the example project of figure 1. The two numbers above each node denote the activity duration d_i and its requirement r_i1 for a single renewable resource with availability a_1 = 10. The numbers below the node denote $(q_{il}^{opt}, q_{il}^{-} = q_{il}^{+}, q_{il}^{extra})$ for each quality-dependent time slot l. For the sake of clarity, we assume that $q_{il}^{-} = q_{il}^{+}$, i.e. the quality-dependent time slots are a single point in time. Moreover, we assume $q_{il}^{extra}$ is equal for each interval l. Each activity of the example project belongs to one of the following categories. An activity can be subject to a single (activities 12 and 17) or multiple quality-dependent time slots (activities 3, 5, 8, 18 and 20). Activities can also have the requirement to be scheduled as-soon-as-possible (ASAP; activities 2, 4, 6, 7, 9, 10, 11 and 13) or as-late-as-possible (ALAP; activities 14, 15, 16, 19 and 21). This can be incorporated in the network by adding a single quality-dependent time slot with $q_{i1}^{-} = q_{i1}^{+} = es_i$ (ASAP) or $q_{i1}^{-} = q_{i1}^{+} = ls_i$ (ALAP), with es_i (ls_i) the earliest (latest) start time of activity i. Deviations from these requirements will be penalized by $q_{i1}^{extra}$ per time unit. We assume that the project deadline T equals 44 time units.

Figure 2 displays the schedules found by solving the RCPSP without (i.e. minimization of project time) and with the quality-dependent time slots. The activities highlighted in dark grey are the activities that are scheduled at a different time instance between the two schedules. Activities 3, 6, 8, 10, 12, 13, 14, 17, 18 and 20 have been scheduled later than in the classical RCPSP schedule, while activities 5 and 7 have been scheduled earlier.
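The conversion of an ASAP or ALAP requirement into a single quality-dependent time slot can be sketched as follows (Python; the triple layout mirrors the $(q_{il}^{opt}, q_{il}^{-} = q_{il}^{+}, q_{il}^{extra})$ notation above, while the function and field names are ours).

```python
def asap_alap_slot(es, ls, penalty, mode, q_opt=0):
    """Encode an ASAP/ALAP requirement as one point-in-time slot:
    q- = q+ = es (ASAP) or ls (ALAP); deviations from this point
    cost `penalty` per time unit. Names are illustrative."""
    point = es if mode == "ASAP" else ls
    return (q_opt, point, penalty)

slot = asap_alap_slot(es=4, ls=12, penalty=6, mode="ALAP")  # (0, 12, 6)
```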
Fig. 2. The RCPSP schedule with minimal time (left) and quality-dependent time slots (right) (Gantt-chart data omitted)
4 Computational Experience

In order to validate the efficiency, we have coded our double B&B procedure in Visual C++ Version 6.0 under Windows NT 4.0 on a Compaq personal computer (Pentium 500 MHz processor). We have generated a problem set by the RanGen network generator of [4] with 10, 20 and 30 activities. In order to test the effect of the presence of renewable resources on the performance of our B&B procedure, we have extended each network instance with renewable resources under a pre-defined design. To that purpose, we rely on the resource use RU and the resource-constrainedness RC, which can be generated by RanGen. More precisely, we use 4 settings for the RU (1, 2, 3 or 4) and 4 settings for the RC (0.25, 0.50, 0.75 or 1). In order to generate data for the quality-dependent time slots, we need to generate values for the number of time slots nr(i), the start and finishing time instance per time slot l ($q_{il}^{-}$ and $q_{il}^{+}$) and the quality deviation per time unit for each time slot l ($q_{il}^{extra}$). We used 4 different settings to generate the number of time slots per activity, i.e. nr(i) equals 5, 10, 15 or 20. The start and finishing times of each time slot have been carefully selected between the earliest start time and the latest finish time of each activity, such that the 'spread' of the time slots has been generated under 3 settings, i.e. low (time slots close to each other), average and high (time slots far from each other). The $q_{il}^{extra}$ values have been randomly generated. Using 30 instances for each setting, we have created 17,280 problem instances.

In table 1 we report the computational results for the project scheduling problem with renewable resource constraints and quality-dependent time slots. To that purpose, we display the average CPU-time in seconds (Avg.CPU), the number of problems solved to optimality within 100 seconds CPU-time (#Solved), the average number of created nodes in the main branch-and-bound tree (Avg.CN) to resolve resource conflicts, the average number of branched nodes for this B&B tree (Avg.BN) and the average number of created nodes (Avg.CN2) in the unconstrained branch-and-bound tree (lower bound calculation) at each node of the main B&B tree. The row labelled 'all instances' gives the average results over all 17,280 problem instances and illustrates the efficiency of our double branch-and-bound procedure. In the remaining rows we show more detailed results for the different parameters of our full factorial experiment.
Table 1. Computational results for the RCPSP with quality-dependent time slots

                              Avg.CPU   #Solved   Avg.CN   Avg.BN   Avg.CN2
All instances                  43.517    10,212   78.924   71.517    22.881
Number of activities   10       0.099     5,760   57.378   44.173     7.825
                       20      53.149     3,012   86.021   80.557    26.672
                       30      77.304     1,440   93.372   89.821    34.145
RU                      1      30.462     3,123   59.043   49.905    28.101
                        2      43.544     2,530   80.558   72.422    24.646
                        3      48.318     2,348   87.304   80.869    20.121
                        4      51.745     2,211   88.791   82.872    18.654
RC                   0.25      24.602     3,328   49.583   36.353    25.489
                     0.50      48.772     2,338   85.655   79.643    25.258
                     0.75      50.547     2,270   89.723   84.356    20.758
                     1.00      50.149     2,276   90.735   85.716    20.017
nr(i)                   5      31.851     3,069   76.684   70.006    10.714
                       10      45.553     2,476   79.643   72.077    20.211
                       15      48.146     2,339   79.423   71.860    28.533
                       20      48.519     2,328   79.947   72.125    32.064
Spread                low      39.394     3,640   78.472   71.500    15.385
                      mid      44.644     3,333   79.704   72.422    24.841
                     high      46.515     3,239   78.596   70.629    28.416
As expected, the RU and the RC are positively correlated with the problem complexity. Indeed, the more resources in the project and the tighter their constrainedness, the higher the probability for a resource conflict to occur. These effects are completely in line with literature (see e.g. [13]). The number of time slots is positively correlated with problem complexity. The spread of the time slots is positively correlated with the problem complexity. When time slots are close to each other, the selection of an activity time slot is rather straightforward since only one (or a few) are relevant.
5 Conclusions and Areas for Future Research

In this paper we presented a double branch-and-bound procedure for the resource-constrained project scheduling problem with quality-dependent time slots. The depth-first branch-and-bound strategy to solve renewable resource conflicts makes use of another branch-and-bound procedure to calculate the lower bounds at each node. The branching scheme has been extended with the subset dominance rule to prune the search tree considerably. The incorporation of quality-dependent time slots in project scheduling is, to the best of our knowledge, completely new.

It is in our future intentions to broaden the research efforts of quality-dependent time slot selection in project scheduling. More precisely, the introduction of dynamic quality-dependent time slots should open the door to many other applications. In this case, each activity can be executed several times, preferably within the pre-defined time slots. Moreover, the different time slots per activity depend on each other, in the sense that the time interval $[q_{il}^{-}, q_{il}^{+}]$ depends on the finishing time of the activity in or around the previous time slot $[q_{i,l-1}^{-}, q_{i,l-1}^{+}]$. A typical example is a maintenance
operation that needs to be done within certain time limits. A second maintenance operation depends on the execution of the first one, resulting in a second time slot that depends on the starting time of the activity around the first time slot. The development of heuristic solution methods to solve larger, real-life problem instances also lies within our future research intentions.
References

1. Brücker, P., Drexl, A., Möhring, R., Neumann, K., Pesch, E.: Resource-constrained project scheduling: notation, classification, models and methods. European Journal of Operational Research 112 (1999) 3-41
2. Chen, Y.L., Rinks, D., Tang, K.: Critical path in an activity network with time constraints. European Journal of Operational Research 100 (1997) 122-133
3. Demeulemeester, E., Herroelen, W.: A branch-and-bound procedure for the multiple resource-constrained project scheduling problem. Management Science 38 (1992) 1803-1818
4. Demeulemeester, E., Vanhoucke, M., Herroelen, W.: A random network generator for activity-on-the-node networks. Journal of Scheduling 6 (2003) 13-34
5. De Reyck, B., Herroelen, W.: An optimal procedure for the resource-constrained project scheduling problem with discounted cash flows and generalized precedence relations. Computers and Operations Research 25 (1998) 1-17
6. Falk, J.E., Soland, R.M.: An algorithm for separable nonconvex programming problems. Management Science 15 (1969) 550-569
7. Horst, R.: Deterministic methods in constrained global optimization: Some recent advances and new fields of application. Naval Research Logistics 37 (1990) 433-471
8. Icmeli, O., Erengüç, S.S.: A branch-and-bound procedure for the resource-constrained project scheduling problem with discounted cash flows. Management Science 42 (1996) 1395-1408
9. Icmeli-Tukel, O., Rom, W.O.: Ensuring quality in resource-constrained project scheduling. European Journal of Operational Research 103 (1997) 483-496
10. Möhring, R.H., Schulz, A.S., Stork, F., Uetz, M.: On project scheduling with irregular starting time costs. Operations Research Letters 28 (2001) 149-154
11. Schwindt, C.: Minimizing earliness-tardiness costs of resource-constrained projects. In: Inderfurth, K., Schwoediauer, G., Domschke, W., Juhnke, F., Kleinschmidt, P., Waescher, G. (eds.): Operations Research Proceedings. Springer (1999) 402-407
12. Vanhoucke, M., Demeulemeester, E., Herroelen, W.: An exact procedure for the resource-constrained weighted earliness-tardiness project scheduling problem. Annals of Operations Research 102 (2000) 179-196
13. Vanhoucke, M., Demeulemeester, E., Herroelen, W.: On maximizing the net present value of a project under renewable resource constraints. Management Science 47 (2001) 1113-1121
14. Vanhoucke, M., Demeulemeester, E., Herroelen, W.: Progress payments in project scheduling problems. European Journal of Operational Research 148 (2003) 604-620
15. Yang, H.H., Chen, Y.L.: Finding the critical path in an activity network with time-switch constraints. European Journal of Operational Research 120 (2000) 603-613
The Bottleneck Tree Alignment Problems

Yen Hung Chen¹ and Chuan Yi Tang²

¹ Department of Applied Mathematics, National University of Kaohsiung, Kaohsiung 811, Taiwan, R.O.C. [email protected]
² Department of Computer Science, National Tsing Hua University, Hsinchu 300, Taiwan, R.O.C. [email protected]
Abstract. Given a set W of k sequences (strings) and a tree structure T with k leaves, each of which is labeled with a unique sequence in W, a tree alignment is to label a sequence to each internal node of T. The weight of an edge of the tree alignment is the distance, such as Hamming distance, Levenshtein (edit) distance or reversal distance, between the two sequences labeled to the two ends of the edge. The bottleneck tree alignment problem is to find a tree alignment such that the weight of the largest edge is minimized. A lifted tree alignment is a tree alignment in which each internal node v is labeled with one of the sequences that was labeled to the children of v. The bottleneck lifted tree alignment problem is to find a lifted tree alignment such that the weight of the largest edge is minimized. In this paper, we show that the bottleneck tree alignment problem is NP-complete even when the tree structure is a binary tree and the weight function is metric. For special cases, we present an exact algorithm to solve the bottleneck lifted tree alignment problem in polynomial time. If the weight function is ultrametric, we show that any lifted tree alignment is an optimal bottleneck tree alignment. Keywords: Edit distance, bottleneck tree alignment, metric, ultrametric, NP-complete.
1 Introduction
Tree alignment [3, 9] is one of the fundamental problems in computational biology. Given a set W of k sequences (e.g., DNA sequences of species) and a tree structure T with k leaves, each of which is labeled with a unique sequence, a tree alignment is to label a sequence to each internal node of T. The weight of an edge of the tree is defined as the distance, such as Hamming distance, Levenshtein (edit) distance [1] or reversal distance [7], between the two sequences labeled to the two ends of the edge. The tree alignment problem (TAP for short) is to find a tree alignment such that the sum of the weights of all its edges is minimized. This problem was shown to be NP-complete [12] even when the tree structure is a binary tree, and some polynomial time approximation
Corresponding author.
schemes (PTASs) have been proposed [8, 13, 14, 15]. When the tree structure is a star (also called star alignment or the median string problem), this problem was also shown to be NP-complete [6, 9] when the weight function is metric (i.e., the weights of edges satisfy the triangle inequality). This problem was also shown to be MAX SNP-hard [12], but there the weight function is not metric. Ravi and Kececioglu [8] posed a variant of the TAP with a bottleneck objective function: the bottleneck tree alignment problem (BTAP for short). This problem is to find a tree alignment such that the weight of the largest edge is minimized. They gave an O(log k)-approximation algorithm that runs in linear time. However, whether this problem is NP-complete remained open. A lifted tree alignment [5, 13, 14] is a tree alignment in which each internal node v is labeled with one of the sequences that was labeled to the children of v. The lifted tree alignment problem is to find a lifted tree alignment such that the sum of the weights of all its edges is minimized; a polynomial time algorithm has been proposed [5]. An optimal solution of the lifted tree alignment problem is a good approximate solution for the TAP [8, 13, 14, 15]. The bottleneck lifted tree alignment problem (BLTAP for short) is to find a lifted tree alignment such that the weight of the largest edge is minimized. In this paper, we use d(u, v) to denote the weight (equivalently, the distance) of the two ends (equivalently, the two sequences) u and v of an edge. The rest of this paper is organized as follows. In section 2, we show that the BTAP is NP-complete even when the tree structure is a binary tree and the weight function is metric. In section 3, we give an O(k²n² + k³) time algorithm to optimally solve the BLTAP, where n is the maximum length of the sequences in W. In section 4, we show that any lifted tree alignment is an optimal solution of the BTAP when the weight function is ultrametric. An ultrametric [2, 5, 11, 17] is a metric [2, 5, 11] that additionally satisfies the following condition: for any three sequences x, y, and z, max{d(x, y), d(y, z)} ≥ d(x, z). Finally, we draw conclusions in section 5.
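As a side note, both conditions are easy to check on a concrete distance matrix; the following Python sketch (ours, not from the paper) tests the metric triangle inequality and the stronger ultrametric condition.

```python
from itertools import permutations

def is_metric(d):
    """d: symmetric matrix (list of lists) with zero diagonal.
    Checks the triangle inequality d(x,z) <= d(x,y) + d(y,z)."""
    n = len(d)
    return all(d[x][z] <= d[x][y] + d[y][z]
               for x, y, z in permutations(range(n), 3))

def is_ultrametric(d):
    """Checks the stronger condition d(x,z) <= max(d(x,y), d(y,z))."""
    n = len(d)
    return all(d[x][z] <= max(d[x][y], d[y][z])
               for x, y, z in permutations(range(n), 3))
```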
2 The Bottleneck Tree Alignment Problem Is NP-Complete
In this section, the distance function of two sequences is the edit distance, which is metric. Edit distance has been thoroughly studied (see [1] for instance). The edit distance of two sequences u and v is defined as the minimum number of operations needed to transform u into v, where an operation is a single-symbol deletion, a single-symbol insertion, or a single-symbol substitution. The edit distance of two sequences u and v can be computed in O(|u| · |v|) time via dynamic programming [10, 16], where |u| and |v| are the lengths of u and v.

Lemma 1. [6] Given two sequences u and v, if |u| ≥ |v| then d(u, v) ≥ |u| − |v|.

Lemma 2. [6] Given two sequences u and v, if u and v have no symbol in common then d(u, v) = max(|u|, |v|).
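For concreteness, the O(|u| · |v|) dynamic program mentioned above can be sketched as follows; this is the classical Wagner-Fischer recurrence [10, 16], written here as our own illustration.

```python
def edit_distance(u, v):
    """Levenshtein distance via dynamic programming, O(|u|*|v|) time.
    D[i][j] = distance between the prefixes u[:i] and v[:j]."""
    m, n = len(u), len(v)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i                      # i deletions
    for j in range(n + 1):
        D[0][j] = j                      # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if u[i - 1] == v[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # delete u[i-1]
                          D[i][j - 1] + 1,        # insert v[j-1]
                          D[i - 1][j - 1] + sub)  # substitute/match
    return D[m][n]
```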
A sequence is a string over some alphabet Σ, and the set of all finite-length sequences of symbols from Σ is denoted by Σ*. Now, we define the decision version of the BTAP as follows.

Bottleneck Tree Alignment Problem (Decision Version)
Instance: A positive integer m, a set W of k distinct sequences {w1, w2, …, wk} over an alphabet Σ, and a tree structure T with k leaves, each of which is labeled with a unique sequence in W.
Question: Do there exist sequences in Σ* which can be labeled to all internal nodes of T such that the weight of the largest edge in T is less than or equal to m, where the weight of an edge is the (edit) distance between the two sequences labeled to the two ends of the edge?

Given a finite set S of sequences over an alphabet Σ, the longest common subsequence problem [4] is to find a longest-length sequence s such that s is a subsequence of each sequence in S. Bonizzoni and Vedova [6] defined the longest common subsequence0 (LCS0) problem. Given a positive integer m and a finite set S of k distinct sequences over an alphabet Σ0, in which the length of each sequence in S is 2m, the LCS0 problem is to find a common subsequence s of each sequence in S such that |s| ≥ m. This problem was shown to be NP-complete [6].

LCS0 Problem
Instance: A positive integer m and a set S of k distinct sequences s1, s2, …, sk over an alphabet Σ0, in which the length of each sequence is 2m.
Question: Does there exist a sequence s with |s| ≥ m such that s is a common subsequence of each sequence si ∈ S?

Theorem 1. The bottleneck tree alignment problem is NP-complete even when the tree structure is a binary tree and the weight function is metric.

Proof. First, it is not hard to see that the BTAP is in NP. Now, we show the reduction: the transformation from the LCS0 problem to the BTAP. Let a set of sequences S = {s1, s2, …, sk} over an alphabet Σ0 and a positive constant m be the instance of LCS0. The length of each sequence in S is 2m. Now, we construct a binary tree T as in Figure 1a, where each Ti is a subtree shown in Figure 1b. Then, we pick two different symbols b and c that do not belong to Σ0 and an alphabet Σa = {a1, a2, …, ak} of k different symbols that do not belong to Σ0 ∪ {b, c}. Each leaf in T is labeled by a sequence over Σ = Σ0 ∪ Σa ∪ {b, c}. Let the sequences b^{2m} and c^{2m} be b and c repeated 2m times, respectively. The leaves {y1, y2, …, yk, z1, z2, …, zk} of T are labeled as follows. If i is odd, yi and zi are labeled by si b^{2m} and ai b^{2m}, respectively. Otherwise, yi and zi are labeled by si c^{2m} and ai c^{2m}, respectively. We will show that there is a common subsequence s of S with |s| = m if and only if there is a tree alignment of T such that the weight of the largest edge is m.

(Only if) Assume that there is a common subsequence s of each sequence in S with |s| = m. We label the sequences s b^{2m} and s b^{m} to xi and ui, respectively, if i is odd. We also label the sequences s c^{2m} and s c^{m} to xi and ui, respectively, if i is even. By lemma 2, the weight of the largest edge of T is m.
(If) Suppose that there exist sequences which are labeled to the internal nodes of T such that the weight of the largest edge is m. For each 1 ≤ i ≤ k, let ti be the sequence which is labeled to xi. By lemma 2, we have d(yi, zi) = 2m. Hence, d(yi, xi) and d(xi, zi) cannot both be less than m, by the triangle inequality. However, since the weight of the largest edge in T is equal to m, we have d(yi, xi) = m and d(xi, zi) = m. Hence, ti must contain b^{2m} (respectively, c^{2m}) and cannot contain the symbol ai if i is odd (respectively, even). Now ti can be written as t′i b^{2m} (respectively, t′i c^{2m}) for some sequence t′i if i is odd (respectively, even). Moreover, t′i must be a subsequence of si with |t′i| = m by lemma 1 and lemma 2. Furthermore, each sequence which is labeled to ui must contain b^{m} (respectively, c^{m}) if i is odd (respectively, even), otherwise d(xi, ui) > m. Because b ≠ c, for 1 ≤ i < k, each edge (ui, ui+1) has weight greater than or equal to m by lemma 2. However, d(ui, ui+1) cannot exceed m, thus each sequence which is labeled to ui must be t′i b^{m} (respectively, t′i c^{m}) if i is odd (respectively, even). Moreover, t′i = s, for 1 ≤ i ≤ k. Therefore, s must be a common subsequence of each sequence in S and the length of s is m.
Fig. 1. a. The tree T. b. The subtree Ti.
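The construction of the leaf labels in this reduction can be sketched in a few lines of Python (illustrative only; the fresh single symbols a_i are modeled here as tagged strings).

```python
def btap_leaf_labels(S, m):
    """Given an LCS0 instance (S, m) with |s_i| = 2m, build the leaf
    labels y_i, z_i of the reduction. The symbols b, c and a_i must
    lie outside the alphabet of S; "<a{i}>" stands in for one fresh
    symbol a_i."""
    labels = {}
    for i, s in enumerate(S, start=1):
        pad = "b" if i % 2 == 1 else "c"      # odd -> b, even -> c
        a_i = f"<a{i}>"                        # fresh symbol for z_i
        labels[f"y{i}"] = s + pad * (2 * m)    # s_i b^2m or s_i c^2m
        labels[f"z{i}"] = a_i + pad * (2 * m)  # a_i b^2m or a_i c^2m
    return labels

labels = btap_leaf_labels(["ACGT", "AGCT"], m=2)
```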
3 The Bottleneck Lifted Tree Alignment Problem
In this section, we present an exact algorithm for solving the BLTAP via dynamic programming. The distance function of two sequences is again the edit distance. From now on, let the tree T rooted at r and a set of sequences W = {w1, w2, …, wk} (i.e., the labeling sequences of the leaves of T) be the instance of the BLTAP. First, let the level Lj of T be the set of nodes in which every node has the same height j in T, and let H be the height of T. Hence, LH is the set of all leaves of T and L1 contains only the root r. Then, for a node u in T, let Tu be the subtree of T rooted at u and let the sequence set des(u) denote the labeling sequences of the leaves of Tu. We also let the node set chi(u) denote the children of u in Tu. For each node u and wi ∈ W, let B[u, wi] denote the minimum weight among the largest edges of all possible lifted tree alignments of Tu such that u is labeled by the sequence wi. For convenience, we let the node vα be the child of u with wi ∈ des(vα). For each u ∈ Lj, 1 ≤ j ≤ H, if we have
all B[v, wt], in which each v ∈ Lj+1 is a child of u and wt ∈ des(v), then B[u, wi] can be computed as follows:

$$B[u, w_i] = \max\Big\{\, B[v_\alpha, w_i],\ \max_{\substack{v \in chi(u) \\ v \neq v_\alpha}}\ \min_{w_t \in des(v)} \max\big(d(w_i, w_t),\, B[v, w_t]\big) \Big\} \qquad (1)$$

It is clear that the weight of the largest edge of the optimal bottleneck lifted tree alignment is min_{wi∈W} B[r, wi]. One can obtain B[r, wi] by iterating the above process for each B[u, w] of a node u with w ∈ des(u) bottom-up (from the leaf level to the root). Then, it is not hard to find the optimal solution for the BLTAP via backtracking. For technical reasons, for each leaf v, B[v, wi] = 0 if the sequence wi is the labeling sequence of v. For clarity, we list the exact algorithm for the BLTAP as follows.

Algorithm Exact-BLTAP
Input: A tree T rooted at r with k leaves, each of which is labeled by one of k distinct sequences W = {w1, w2, …, wk}.
Output: A lifted tree alignment of T.
1. For each sequence pair (wi, wj) ∈ W, compute the edit distance of wi and wj.
2. For each u ∈ LH, let B[u, w] = 0 if the sequence w is the labeling sequence of u.
3. /* recursively compute B[u, wi] */ For j = H − 1 down to 1, for each node u ∈ Lj with w ∈ des(u), compute B[u, w] by equation (1).
4. Select a sequence w ∈ W such that B[r, w] is minimized among all sequences in W.
5. /* backtracking */ Backtrack to find the optimal solution of the BLTAP.

Now, we analyze the time complexity of this algorithm: step 1 takes O(k²n²) time and step 2 takes O(k) time. Computing each B[u, wi] with wi ∈ des(u) takes O(k) time, and then we need O(k²) time for each level. Hence, step 3 runs in O(k³) time. Clearly, steps 4 and 5 run in O(k) time. The total time complexity of Algorithm Exact-BLTAP is O(k²n² + k³).

Theorem 2. Algorithm Exact-BLTAP is an O(k²n² + k³) time algorithm for solving the BLTAP.
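A compact Python sketch of this dynamic program follows. It recurses over the tree instead of iterating level by level, which leaves the recurrence unchanged; the tree encoding, the precomputed distance dictionary and all names are our assumptions, and des(u) is recomputed for clarity rather than cached (memoize it to match the stated bounds).

```python
def bltap(tree, root, leaf_label, dist):
    """Bottleneck lifted tree alignment.
    tree:       dict node -> list of children ([] for leaves)
    leaf_label: dict leaf -> its sequence
    dist:       dict (seq, seq) -> precomputed edit distance (both orders)
    Returns (bottleneck value, dict node -> chosen sequence)."""
    B, choice = {}, {}

    def des(u):  # labeling sequences of the leaves below u
        if not tree[u]:
            return {leaf_label[u]}
        return set().union(*(des(v) for v in tree[u]))

    def solve(u):
        if not tree[u]:
            B[(u, leaf_label[u])] = 0
            return
        for v in tree[u]:
            solve(v)
        for w in des(u):
            # v_alpha: the unique child from which w is lifted
            v_alpha = next(v for v in tree[u] if w in des(v))
            best = B[(v_alpha, w)]
            for v in tree[u]:
                if v is v_alpha:
                    continue
                # cheapest label for the other child v -- equation (1)
                best = max(best, min(max(dist[(w, wt)], B[(v, wt)])
                                     for wt in des(v)))
            B[(u, w)] = best

    def backtrack(u, w):
        choice[u] = w
        for v in tree[u]:
            if w in des(v):
                backtrack(v, w)
            else:
                wt = min(des(v), key=lambda x: max(dist[(w, x)], B[(v, x)]))
                backtrack(v, wt)

    solve(root)
    w_star = min(des(root), key=lambda w: B[(root, w)])
    backtrack(root, w_star)
    return B[(root, w_star)], choice
```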
4 The Score Is Ultrametric
In this section, we show that any lifted tree alignment is an optimal solution of the BTAP when the weight function d is ultrametric. As before, let the tree T and a set of sequences W = {w1, w2, …, wk} (i.e., the labeling sequences of the leaves of T) be the instance of the BTAP.

Theorem 3. Any lifted tree alignment is an optimal bottleneck tree alignment when the weight function is ultrametric.
Proof. Let To denote an optimal solution of the BTAP. We first show that any d(wi, wj) for {wi, wj} ⊆ W is less than or equal to the weight of the largest edge of To when the weight function is ultrametric. Let vi be the leaf labeled by the sequence wi and vj the leaf labeled by the sequence wj, and let the sequences (wi, wi1, wi2, …, wip, wj) be the labels along the path Pi,j = (vi, vi1, vi2, …, vip, vj) of To. Because the weight function is ultrametric, we have

d(wi, wj) ≤ max{d(wi, wip), d(wip, wj)},
d(wi, wip) ≤ max{d(wi, wi(p−1)), d(wi(p−1), wip)},
…
d(wi, wi2) ≤ max{d(wi, wi1), d(wi1, wi2)}.

Hence, d(wi, wj) is less than or equal to the weight of the largest edge in the path Pi,j of To, and therefore any d(wi, wj) is less than or equal to the weight of the largest edge in To. Clearly, the weight of the largest edge of any lifted tree alignment is one of the d(wi, wj) for {wi, wj} ⊆ W, and the result follows immediately.
5 Conclusion
In this paper, we have shown that the bottleneck tree alignment problem is NP-complete even when the tree structure is a binary tree and the weight function is metric. We have also solved this problem in two special cases. It would be interesting to find a constant-factor approximation algorithm for the BTAP, or to determine whether the BTAP is MAX SNP-hard.
References

1. Aho, A.V.: Algorithms for finding patterns in strings. Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity (1990) 290-300
2. Bonizzoni, P., Vedova, G.D.: The complexity of multiple sequence alignment with SP-score that is a metric. Theoretical Computer Science 259 (2001) 63-79
3. Chan, S.C., Wong, A.K.C., Chiu, D.K.T.: A survey of multiple sequence comparison methods. Bulletin of Mathematical Biology 54 (1992) 563-598
4. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. 2nd edn. MIT Press, Cambridge (2001)
5. Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge, UK (1997)
6. Higuera, C., Casacuberta, F.: Topology of strings: median string is NP-complete. Theoretical Computer Science 230 (2000) 39-48
7. Pevzner, P.A.: Computational Molecular Biology: An Algorithmic Approach. The MIT Press, Cambridge, MA (2000)
8. Ravi, R., Kececioglu, J.: Approximation algorithms for multiple sequence alignment under a fixed evolutionary tree. Discrete Applied Mathematics 88 (1998) 355-366
9. Sankoff, D.: Minimal mutation trees of sequences. SIAM Journal on Applied Mathematics 28 (1975) 35-42
10. Sankoff, D., Kruskal, J.: Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, Reading, MA (1983)
11. Setubal, J., Meidanis, J.: Introduction to Computational Molecular Biology. PWS Publishing Company (1997)
12. Wang, L., Jiang, T.: On the complexity of multiple sequence alignment. Journal of Computational Biology 1 (1994) 337-348
13. Wang, L., Jiang, T., Lawler, E.L.: Approximation algorithms for tree alignment with a given phylogeny. Algorithmica 16 (1996) 302-315
14. Wang, L., Gusfield, D.: Improved approximation algorithms for tree alignment. Journal of Algorithms 25 (1997) 255-273
15. Wang, L., Jiang, T., Gusfield, D.: A more efficient approximation scheme for tree alignment. SIAM Journal on Computing 30 (2000) 283-299
16. Wagner, R., Fisher, M.: The string-to-string correction problem. Journal of the ACM 21 (1974) 168-178
17. Wu, B.Y., Chao, K.M., Tang, C.Y.: Approximation and exact algorithms for constructing minimum ultrametric trees from distance matrices. Journal of Combinatorial Optimization 3 (1999) 199-211
Performance Study of a Genetic Algorithm for Sequencing in Mixed Model Non-permutation Flowshops Using Constrained Buffers

Gerrit Färber and Anna M. Coves Moreno

Instituto de Organización y Control de Sistemas Industriales (IOC), Universidad Politécnica de Cataluña (UPC), Barcelona, Spain
Gerrit [email protected], [email protected]
Abstract. This paper presents the performance study of a Genetic Algorithm applied to a mixed model non-permutation flowshop production line. Resequencing is permitted where stations have access to intermittent or centralized resequencing buffers. The access to the buffers is restricted by the number of available buffer places and the physical size of the products. Characteristics such as the difference between the intermittent and the centralized case, the number of buffer places and the distribution of the buffer places are analyzed. Improvements that come with the introduction of constrained resequencing buffers are highlighted.
1 Introduction
In the classical production line, only products with the same options were processed at once. Products of different models, providing distinct options, were either processed on a different line or major equipment modifications were necessary. For today's production lines this is no longer desirable, and increasingly the necessity arises of manufacturing a variety of different models on the same line, motivated by offering a larger variety of products to the client. Furthermore, the stock of finished products is reduced considerably with respect to a production with batches, and so are the expenses derived from it. Mixed model production lines consider more than one model being processed on the same production line in an arbitrary sequence. Nevertheless, the majority of publications in this area are limited to solutions which determine the job sequence before the jobs enter the line and maintain it, without interchanging jobs, until the end of the production line, which is known as a permutation flowshop. Exact approaches for makespan minimization can be found in [1], [2], and [3], among others. Two recent reviews [4], [5] present heuristic methods for sequencing problems. In the case of more than three stations and with the objective function to minimize the makespan, a unique permutation is no longer optimal. In [6] a study of
This work is partially supported by the Ministry of Science and Technology, and the funding for regional research DPI2004-03472.
the benefit of using non-permutation flowshops is presented. Furthermore, there exist various designs of production lines which permit resequencing of jobs: using large buffers (Automatic-Storage-and-Retrieval-System) which decouple one part of the line from the rest of the line [7]; buffers which are located off-line [8]; hybrid or flexible lines [9]; and, more seldom, the interchange of job attributes instead of physically changing the position of a job within the sequence [10]. Resequencing of jobs on the line is even more relevant in the presence of an additional cost or time, incurred when the succeeding job at a station is of another model, known as setup-cost and setup-time [11]. The present work considers a flowshop with the possibility to resequence jobs between consecutive stations. The buffers are located off-line, either accessible from a single station (intermittent case) or from various stations (centralized case). In both cases, it is considered that a job may not be able to be stored in a buffer place, due to its extended physical size, see figure 1.

Fig. 1. Scheme of the considered flowshop. The jobs Jj pass consecutively through the stations Ii. The buffer Bi permits to temporarily store a job with the objective of reinserting it at a later position in the sequence. a) Job J2 can pass through any of the two buffer places Bi,1 or Bi,2 of buffer Bi. b) Job J3 can pass only through buffer place Bi,2, due to its physical size.
The considered problem is relevant to various flowshop applications, such as chemical production dealing with client orders of different volumes and different-sized resequencing tanks; productions where split-lots are used for engineering purposes, such as the semiconductor industry; and even the production of prefabricated houses with, e.g., large and small walls passing through consecutive stations where electrical circuits, sewerage, doors, windows and isolation are applied. In what follows, the problem is formulated in more detail and the applied Genetic Algorithm is described. Thereafter, the accomplished performance study is presented and finally conclusions are drawn which are already useful at the time a production line is being designed.
2 Problem Definition
The realized work is based on the classical flowshop in which the jobs (J1, J2, …, Jj, …, Jn) pass consecutively through the stations (I1, I2, …, Ii, …, Im). Furthermore, after determined stations, off-line buffers Bi permit resequencing of jobs. A buffer provides various buffer places (Bi,1, Bi,2, …) and each buffer place is restricted by the physical size of the jobs to be stored. As can be seen in figure 1a, job J2 can be stored in buffer place Bi,1 as well as in Bi,2, whereas the next job J3 can be stored only in buffer place Bi,2, because the physical size of the job exceeds the physical size of buffer place Bi,1, see figure 1b. In a first step, the resequencing buffers are located intermittently, between two consecutive stations. In this case the buffer is assigned to the precedent station and may be accessed only by this station. Then, for an additional benefit, a single resequencing buffer is used, with access from various stations, while the limitations on the physical size of the buffer places are maintained.
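A minimal data model for such size-constrained buffer places might look as follows (a Python sketch under our own naming, not the authors' implementation): a job fits a buffer place only if the place is free and large enough.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BufferPlace:
    capacity: int                   # largest job size this place can hold
    occupant: Optional[int] = None  # job id currently stored, or None

class Buffer:
    """Off-line resequencing buffer with size-constrained places."""
    def __init__(self, capacities):
        self.places = [BufferPlace(c) for c in capacities]

    def try_store(self, job_id, job_size):
        """Store the job in the first free, large-enough place;
        return False if the job must stay on the line."""
        for p in self.places:
            if p.occupant is None and job_size <= p.capacity:
                p.occupant = job_id
                return True
        return False

b = Buffer([50, 100])          # e.g. two tanks of 50 and 100 liters
assert b.try_store(1, 80)      # fits only the 100-liter place
assert not b.try_store(2, 80)  # a second large order is blocked
```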
3 Genetic Algorithm
The concept of the Genetic Algorithm was first formulated by [12] and [13] and can be understood as the application of the principles of evolutionary biology, also known as the survival of the fittest, to computer science. Genetic algorithms are typically implemented as a computer simulation in which a population of chromosomes, each of which represents a solution of the optimization problem, evolves toward better solutions. The evolution starts from an initial population which may be determined randomly. In each generation, the fitness of the whole population is evaluated and multiple individuals are stochastically selected from the current population, based on their fitness, and modified to form a new population. The alterations are biologically derived techniques, commonly achieved by inheritance, mutation and crossover. Multiple genetic algorithms have been designed for mixed model assembly lines, such as [14], [15], [16] and [17]. The heuristic used here is a variation of the Genetic Algorithm explained in [18]. The genes represent the jobs which are to be sequenced. The chromosomes υ, determined by a series of genes, represent a sequence of jobs. A generation is formed by R chromosomes and the total number of generations is G. In the permutation case, the size of a chromosome is determined by the number of jobs, the fraction Π. In the non-permutation case, the chromosomes are L + 1 times larger, resulting in the fractions Π1, …, ΠL+1, with L being the number of resequencing possibilities. In both cases, special attention is required when forming the chromosomes, because for each part of the production line every job has to be sequenced exactly one time. The relevant information for each chromosome is its fitness value (objective function), the number of job changes and the indicator specifying whether the chromosome represents a feasible solution. A chromosome is marked unfeasible and is imposed with a penalty if a job has to be taken off the line and no free buffer place is available or the physical size of the job exceeds the size limitation of the
available buffer places. When two solutions result in the same fitness, the one with fewer job changes is preferred.
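A sketch of this chromosome encoding (our own illustration in Python): a non-permutation chromosome holds one job permutation per fraction, and validity requires every job to appear exactly once in each fraction.

```python
import random

def random_chromosome(n_jobs, L):
    """Non-permutation chromosome: L+1 fractions, one job sequence
    per part of the line delimited by the resequencing points."""
    return [random.sample(range(n_jobs), n_jobs) for _ in range(L + 1)]

def is_valid(chromosome, n_jobs):
    # every job must appear exactly once in each fraction
    return all(sorted(frac) == list(range(n_jobs)) for frac in chromosome)

chrom = random_chromosome(n_jobs=7, L=2)  # fractions Pi_1 .. Pi_3
assert is_valid(chrom, 7)
```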
3.1 Genetic Operators
The genetic operators specify in which way the subsequent population is generated by reproduction of the present population, taking into account that "fitter" solutions are more promising and therefore are more likely to reproduce. Even an unfeasible solution is able to reproduce, because it may generate valuable and feasible solutions in one of the succeeding generations. The used genetic operators are inheritance, crossover and mutation. The value pX is the percentage with which a genetic operator X is applied to a chromosome.

Inheritance: This operator is determined by two parameters. The parameter pBS determines the percentage of the best solutions which will be copied directly to the next generation, called the cluster of promising chromosomes, and ensures that promising chromosomes do not go extinct. Then, in order to not remain in a local minimum, the parameter pb determines the percentage of chromosomes which are removed from this cluster.

Crossover: This operator specifies the operation of interchanging information of two chromosomes. Two crossover operations are applied, crossover-I (figure 2a,b) and crossover-II (figure 2c,d). The probabilities with which these operations are applied to a chromosome are pc-I and pc-II, and the crossover points are defined by the random number pos, and the pair pos1 and pos2, respectively.

Fig. 2. Operators crossover-I and crossover-II. a) and c) In the simple case the crossing takes place between two main fractions of the chromosome. After the crossover point the chromosomes are completely crossed over. b) and d) In the more complex case it has to be assured that each job is sequenced exactly one time for each fraction of the chromosome.
Fig. 3. Operators mutation-I and mutation-II. a) The job at position pos1 is taken off the line and reinserted to the line at position pos2. b) The two jobs at position pos1 and pos2 are interchanged.
If the crossover point (pos, or pos1 and pos2) is a multiple of the number of jobs to be sequenced, the crossover operation is simple and takes place between two main fractions of the chromosome, i.e. after the crossover point the chromosomes are completely crossed over. Whereas, in the complex case the crossover points are located within a main fraction of the chromosome and it has to be assured explicitly that each job is sequenced exactly one time for each fraction of the chromosome (a sketch of this repair follows at the end of this subsection).

Mutation: This operator specifies the operation of relocating the job at position pos1 to position pos2 within the same fraction of a chromosome. Two mutation operators are applied, mutation-I and mutation-II (figure 3). Furthermore, there exist two cases for mutation-I: forward mutation, where pos1 < pos2; and backward mutation, where pos1 > pos2. In the first case, a single job has to be taken off the line, and in the second case, in order to let a single job pass, a group of succeeding jobs has to be taken off the line, resulting in a larger effort to realize. The probabilities of these operators are pm-I(f), pm-I(b) and pm-II.
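The fraction-respecting crossover announced above could be realized as follows; this is a Python sketch of our reading of crossover-I, where the tail of the cut fraction is repaired using the job order of the second parent. For pos = 10 on the example chromosomes of figure 2 it reproduces the fraction 1-3-4-5-6-7-2.

```python
def crossover_I(parent_a, parent_b, pos):
    """One-point crossover on fraction-structured chromosomes.
    pos is a global gene index; the fraction containing pos is
    repaired so each job still appears exactly once per fraction."""
    n = len(parent_a[0])                  # jobs per fraction
    f, cut = divmod(pos, n)               # fraction index, cut inside it
    child = [list(frac) for frac in parent_a[:f]]
    # fraction f: head from parent_a, tail = missing jobs in b's order
    head = parent_a[f][:cut]
    tail = [j for j in parent_b[f] if j not in head]
    child.append(head + tail)
    # fractions after the crossover point come entirely from parent_b
    child.extend(list(frac) for frac in parent_b[f + 1:])
    return child

a = [[1, 3, 4, 6, 2, 5, 7], [1, 3, 4, 6, 2, 5, 7]]
b = [[3, 4, 5, 6, 1, 7, 2], [3, 4, 5, 6, 1, 7, 2]]
print(crossover_I(a, b, pos=10))  # cut inside the second fraction
```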
3.2 Cascading
In order to further improve the Genetic Algorithm, it is partitioned into two steps. In the first step, the possibility of resequencing jobs within the production line is ignored; only permutation sequences are considered as possible solutions and the chromosome size is reduced to the number of jobs. The last generation, together with the best found solution, forms the initial generation for the next cascade, where the resequencing possibilities, provided by stations with access to resequencing buffers, are taken into account (see the sketch below).
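A sketch of the two-cascade driver follows. This is heavily hypothetical: run_ga and its signature are our assumptions about an interface the paper does not spell out.

```python
def two_cascade_ga(L, run_ga):
    """Two-step (cascaded) GA driver. run_ga is assumed to take an
    initial population and a flag allowing resequencing, and to
    return (final_population, best_solution)."""
    # Cascade 1: permutation-only search (chromosome = one fraction)
    pop1, best1 = run_ga(initial=None, allow_resequencing=False)
    # Expand each permutation to L+1 identical fractions and seed
    # cascade 2 with the final population plus the best solution found
    seed = [[list(p) for _ in range(L + 1)] for p in pop1 + [best1]]
    pop2, best2 = run_ga(initial=seed, allow_resequencing=True)
    return best2
```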
3.3 Parameter Adjustment
The adjustment of the values of the genetic operators, which are used for the two cascades of the Genetic Algorithm, is determined by extended experimentation and consists of three steps.

Rough Adjustment: In order to adjust the parameters in a robust manner, different sets of parameters are applied to a series of 14 differently sized problems, varying the number of jobs to be sequenced, the number of stations and
the number of resequencing possibilities. During the rough adjustment only one unique seed is used for the random number generation in the Genetic Algorithm. The sets of parameters are summarized and the 300 most promising, which show good performance on all problem sizes, are used for further adjustment.

Repeatability: The use of only one seed in the rough adjustment requires determining, amongst the promising sets of parameters, which set achieves good results for a multitude of seeds. The fact that a set of parameters achieves good results for different seeds indicates that the same set of parameters also performs well for different plant setups. The promising sets of parameters are verified with 16 different seeds for the 14 differently sized problems. Once the sets of promising parameters are examined with respect to repeatability, one set is used for the fine adjustment, determined by grouping into clusters [19].

Fine Adjustment: Due to the fact that in the previous analysis predetermined discrete values for the parameters are used, a fine adjustment follows. The genetic operators are subject to an adjustment of 0.1 around the previously determined sets of parameters and are revised with 16 seeds for the 14 differently sized problems used for the repeatability.

The experimentation is first performed on the permutation case (first cascade), and then on the non-permutation case (second cascade). Table 1 lists the resulting values of the genetic operators used in the following performance study.

Table 1. Characteristic values of the Genetic Algorithm. The first cascade is applied to determine a generation with only permutation solutions, which is then used as an initial generation for the second cascade.

Cascade    R     G       pBS    pb    pc-I   pc-II   pm-I(f)   pm-I(b)   pm-II
Step 1    100   1000     0.05   0.1   0.3    0.6     0.25      0.25      0.25
Step 2    100   10000    0.05   0.4   0.5    0.35    0.45      0.1       0.1

4 Performance Study
For the performance study, a flowshop consisting of 5 stations is considered. The range of the production times is [0...20], such that for some jobs a zero processing time exists at some stations; the range of the setup-cost is [2...8] and that of the setup-time is [1...5]. The number of jobs is varied from 5 to 100 with increments of 5. The objective function is the weighted sum of the makespan (factor of 1.0) and the setup-cost (factor of 0.3); the setup-time is not given a weight of its own but is indirectly included in the calculation of the makespan.
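Stated as code, the fitness evaluation is a one-liner; this is a trivial sketch with our own function and argument names.

```python
def objective(makespan, total_setup_cost,
              w_makespan=1.0, w_setup=0.3):
    """Weighted objective used in the performance study; setup-time
    enters only through the makespan, so it carries no weight here."""
    return w_makespan * makespan + w_setup * total_setup_cost

print(objective(makespan=180, total_setup_cost=42))  # 192.6
```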
4.1 Intermittent Versus Centralized Location
Replacing the intermittent resequencing buffer places with centralized resequencing buffer places has two benefits.
Fig. 4. Comparison of three cases: one centralized resequencing buffer with four buffer places (C4); two intermittent resequencing buffers, each providing two buffer places (I22); one centralized resequencing buffer with three buffer places (C3).
On the one hand, for the case of the same number of buffer places, the objective function of the final solution is expected to be at least as good. This is caused by the fact that at some instances of time, all buffer places of a certain intermittent resequencing buffer may be occupied and do not allow an additional job to be removed from the line, while buffer places of other intermittent resequencing buffers are not accessible. In the case of a centralized buffer, by contrast, blocking only appears when all buffer places are occupied. On the other hand, the number of buffer places may be reduced in order to obtain values of the objective function similar to the case of the intermittent resequencing buffer. Depending on the number of buffer places which are saved, this reduction in area is significant. Figure 4 shows the comparison of the intermittent and the centralized case. After the second station and after the third station there exists access to resequencing buffers. The compared cases are: (I22) two intermittent resequencing buffers, where each buffer provides two buffer places; (C3, C4) one centralized resequencing buffer providing three and four buffer places, respectively. On the one hand, better solutions are obtained by arranging the buffers centralized; on the other hand, the reduction from four to three buffer places leads to solutions nearly as good as in the intermittent case.
4.2 Number of Buffer Places
The increase in the number of buffer places makes the limitations less strict and as already seen in the previous case, solutions are expected to improve. Figure 5 shows the centralized case without physical size limitations. Jobs leaving the second or the third station have access to the centralized buffer, provided with 2, 3 or 4 buffer places. Providing more buffer places results in better solutions together with an elevated number of job changes. Figure 6 illustrates an intermittent case. In I22 the second and the third station have 2 buffer places each, in I20 the buffer after the second station has two buffer places and in I02 the buffer after the third station has two buffer places.
Fig. 5. Variation of the number of buffer places for the centralized case
Fig. 6. Variation of the number of buffer places for the intermittent case
Providing two more buffer places results in slightly better solutions together with an elevated number of job changes.

4.3 Difference in Physical Size of Buffer Places
Introducing limitations on the physical size of the buffer places on one side restricts possible solutions, but on the other side minimizes the necessary buffer area. This limitation arises, for example, in a chemical production. The arrangement of two tanks which are located off the line, accessible after a certain station, equals an intermittent resequencing buffer with two buffer places. With tank capacities of 50 and 100 liters, a client order of 80 liters can be stored only in the larger of the two tanks, which is capable of storing this volume, whereas a client order of 50 liters can be stored in either of the tanks. A close look at the local conditions may justify trading an increase in the objective function for a reduction of investment with respect to tank size and gained area. As a concrete example, three differently sized buffer places (large, medium, small) are available and the ratio of jobs is 3/10 large, 3/10 medium and 4/10 small. As in the previous section, the second and the third station have access to the resequencing buffers, and table 2 shows the allocation of the buffer places to the buffers, considering eight scenarios. "300" represents 3 large, 0 medium and 0 small buffer places. In the intermittent case the first buffer is provided with 1 and the second buffer with 2 large buffer places. In the centralized case the same two stations have access to a single centralized buffer, containing the three buffer places.
Table 2. Allocation of the buffer places to the buffers. In the intermittent case the allocation is done to two different buffers (shown as buffer 1 / buffer 2).

                 Intermittent               Centralized
Case (Size)      l      m      s            l     m     s
(300)            1/2    0/0    0/0          3     0     0
(111)            0/1    1/0    0/1          1     1     1
(102)            0/1    0/0    1/1          1     0     2
(012)            0/0    0/1    1/1          0     1     2
Fig. 7. Influence of the variation of the physical size of the buffer places. "102" represents 1 large, 0 medium and 2 small buffer places. In the intermittent case, the buffer places are divided between two buffers, each with access from a designated station. In the centralized case, the same two stations have simultaneous access to the buffer, containing all three buffer places. The ratio of jobs is 3/10 large, 3/10 medium and 4/10 small. a) Intermittent case. b) Centralized case.
Figure 7 shows the influence of the limitation of the physical size. The variation of the size of the buffer places towards smaller buffer places on the one hand decreases the benefit achieved by the possibility of resequencing jobs. On the other hand, it may amortize when taking into account the reduction of investment with respect to tank size and gained area.
5 Conclusions
This paper has presented a performance study of a genetic algorithm applied to a mixed model non-permutation flowshop. The algorithm uses the genetic operators inheritance, crossover and mutation, and is designed to consider intermittent or centralized resequencing buffers. Furthermore, the buffer access is restricted by the number of buffer places and the physical size of jobs. The realized performance study demonstrates the effectiveness of resequencing by examining certain characteristics. The results of the simulation
experiments reveal the benefits that come with a centralized buffer location, compared to the intermittent buffer location: it either improves the solution or leads to the utilization of fewer resequencing buffer places. An increased number of buffer places clearly improves the objective function, and including buffers constrained by the physical size of the jobs to be stored on one side limits the solutions but on the other side minimizes the necessary buffer area. In order to take full advantage of the possibilities of resequencing jobs in a mixed model flowshop, additional installations, such as buffers, may need to be mounted, and extra efforts in terms of logistics complexity may arise. The additional effort is reasonable if it pays off the necessary investment. Due to the dependency on local conditions, a general validation is not simple and was not part of this work.
References

[1] Ignall, E., Schrage, L.: Application of the branch and bound technique to some flow-shop problems. Operations Research 13(3) (1965) 400-412
[2] Potts, C.: An adaptive branching rule for the permutation flowshop problem. European Journal of Operational Research 5(2) (1980) 19-25
[3] Carlier, J., Rebai, I.: Two branch and bound algorithms for the permutation flowshop problem. European Journal of Operational Research 90(2) (1996) 238-251
[4] Framinan, J., Gupta, J., Leisten, R.: A review and classification of heuristics for permutation flowshop scheduling with makespan objective. Technical Report OI/PPC-2001/02 (2002) Version 1.2
[5] Framinan, J., Leisten, R.: Comparison of heuristics for flowtime minimisation in permutation flowshops. Technical Report IO-2003/01 (2003) Version 0.5
[6] Potts, C., Shmoys, D., Williamson, D.: Permutation vs. non-permutation flow shop schedules. Operations Research Letters 10(5) (1991) 281-284
[7] Lee, H., Schaefer, S.: Sequencing methods for automated storage and retrieval systems with dedicated storage. Computers and Industrial Engineering 32(2) (1997) 351-362
[8] Lahmar, M., Ergan, H., Benjaafar, S.: Resequencing and feature assignment on an automated assembly line. IEEE Transactions on Robotics and Automation 19(1) (2003) 89-102
[9] Engström, T., Jonsson, D., Johansson, B.: Alternatives to line assembly: Some Swedish examples. International Journal of Industrial Ergonomics 17(3) (1996) 235-245
[10] Rachakonda, P., Nagane, S.: Simulation study of paint batching problem in automobile industry. http://sweb.uky.edy/~pkrach0/Projects/MFS605Project.pdf (2000) consulted 14.07.2004
[11] Bolat, A.: Sequencing jobs on an automobile assembly line: objectives and procedures. International Journal of Production Research 32(5) (1994) 1219-1236
[12] Holland, J.: Genetic algorithms and the optimal allocation of trials. SIAM Journal on Computing 2(2) (1973) 88-105
[13] Holland, J.: Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor (1975)
[14] Bolat, A., Al-Harkan, I., Al-Harbi, B.: Flow-shop scheduling for three serial stations with the last two duplicate. Computers & Operations Research 32(3) (2005) 647-667
[15] Levitin, G., Rubinovitz, J., Shnits, B.: A genetic algorithm for robotic assembly line balancing. European Journal of Operational Research 168 (2006) 811-825
[16] Wang, L., Zhang, L., Zheng, D.: An effective hybrid genetic algorithm for flow shop scheduling with limited buffers. Computers & Operations Research (2006) Article in press
[17] Rubén, R., Concepción, M.: A genetic algorithm for hybrid flowshops with sequence dependent setup times and machine eligibility. European Journal of Operational Research 169(3) (2006) 781-800
[18] Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs. 3rd edn. Springer Verlag (1996)
[19] Balasko, B., Abonyi, J., Feil, B.: Fuzzy clustering and data analysis toolbox; for use with Matlab (2005)
Optimizing Relative Weights of Alternatives with Fuzzy Comparative Judgment

Chung-Hsing Yeh¹,² and Yu-Hern Chang²

¹ Clayton School of Information Technology, Faculty of Information Technology, Monash University, Clayton, Victoria 3800, Australia [email protected]
² Department of Transportation and Communications Management, National Cheng Kung University, Tainan, 701, Taiwan [email protected]

Abstract. This paper presents an optimal weighting approach for maximizing the overall preference value of decision alternatives based on a given set of weights and performance ratings. In policy analysis settings, relative weights for policy alternatives are subjectively assessed by a group of experts or stakeholders via surveys using comparative judgment. A hierarchical pairwise comparison process is developed to help make comparative judgment among a large number of alternatives with fuzzy ratio values. Performance ratings for policy alternatives are obtained from objective measurement or subjective judgement. The preference value of an expert on a policy alternative is obtained by multiplying the weight of the alternative by its performance rating. Two optimization models are developed to determine the optimal weights that maximize the overall preference value of all experts or stakeholders. An empirical study of evaluating Taiwan's air cargo development strategies is conducted to illustrate the approach.
1 Introduction

Policy analysis and decision making in the public and private sectors often involve the evaluation and ranking of available alternatives or courses of action such as alternative plans, strategies, options or issues. These alternatives are to be evaluated in terms of their weight (relative importance) and performance rating with respect to a single criterion or multiple criteria. In a new decision setting, the weights of the alternatives can only be obtained by expert surveys, while the performance rating can be objectively measured or subjectively assessed by experts. To obtain the evaluation data by expert surveys, two types of judgment are often required: comparative judgment and absolute judgment [2, 14]. The comparative judgment is required for assessing the relative weight of the alternatives, for which a pairwise comparison process is to be used. The absolute judgment can be used to rate the performance of the alternatives independently. To better reflect the inherent subjectiveness and imprecision involved in the survey process, the concept of fuzzy sets [19] is used for representing the assessment results. Modeling using fuzzy numbers has proven to be an effective way for formulating decision problems where the information available is subjective and imprecise [5]. To maximize the overall
preference value assessed by the experts or stakeholders, holding different beliefs and interests about the evaluation problem, we develop an optimal weighting approach. In subsequent sections, we first present a hierarchical pairwise comparison process for facilitating comparative judgments among a large number of alternatives with fuzzy ratio values. We then develop two optimal weighting models for determining the optimal weights of alternatives for maximizing their overall preference value. Finally we conduct an empirical study to illustrate the optimal weighting approach.
2 Hierarchical Pairwise Comparison for Alternative Weights

To make comparative judgment on the weight (relative importance) of the alternatives, each alternative needs to be compared with all other alternatives. As there are limitations on the amount of information that humans can effectively handle [12], a pairwise comparison approach is advisable to help the experts make comparative judgment. The concept of pairwise comparisons has been popularly implemented in the analytic hierarchy process (AHP) of Saaty [13]. In the AHP, the 1-9 ratio scale is used to compare two alternatives for indicating the strength of their weight. Applying this procedure to all m alternatives will result in a positive m × m reciprocal matrix with all its elements x_ij = 1/x_ji (i = 1, 2, …, m; j = 1, 2, …, m). In this study, we use the 1-9 ratio scale in pairwise comparisons, as it has proven to be an effective measurement scale for reflecting the qualitative information of a decision problem [13]. To reflect the subjectivity and vagueness involved, the ratio value given by the experts is represented by a corresponding triangular fuzzy number. Table 1 illustrates how a triangular fuzzy number is generated to represent the fuzzy assessment from a numeric ratio value assessed by experts. If the ratio value given is 5 ("Strongly more important"), the fuzzy assessment represented as a triangular fuzzy number is (3, 5, 7).

Table 1. Ratio value fuzzification of pairwise comparisons

  Equally important:             1
  Moderately more important:     2, 3
  Strongly more important:       4, 5
  Very strongly more important:  6, 7
  Extremely more important:      8, 9

Each ratio value is fuzzified into a triangular fuzzy number (a1, a2, a3).
In solving a fuzzy positive reciprocal matrix resulting from pairwise comparisons using fuzzy ratios, Buckley [3, 4] uses the geometric mean method to calculate the relative fuzzy values for all the alternatives. This method possesses a number of desirable properties and can be easily extended to fuzzy positive reciprocal matrices. The method guarantees a unique solution to the fuzzy positive reciprocal matrix and can be easily applied to situations where multiple experts are involved in the assessment process [3]. Given a fuzzy positive reciprocal matrix R = [x_ij] (i = 1, 2, …, m; j = 1, 2, …, m), the method first computes the geometric mean of each row as
r_i = ( \prod_{j=1}^{m} x_{ij} )^{1/m}    (1)
The fuzzy values w_i for m alternatives A_i (i = 1, 2, …, m) are then computed as

w_i = r_i / \sum_{j=1}^{m} r_j    (2)
With the use of triangular fuzzy numbers, the arithmetic operations on fuzzy numbers are based on interval arithmetic [9]. Buckley's method requires the fuzzy positive reciprocal matrix to be reasonably consistent [4, 11]. This pairwise comparison process requires each expert/respondent to make m(m-1)/2 comparisons for each attribute. It would be tedious and impractical if the number of alternatives to be compared is large. For example, in the empirical study to be presented in Section 4, 153 (= 18(18-1)/2) comparisons are required for each expert to apply the pairwise comparison process directly to assess the relative importance of all 18 alternative strategies. To facilitate the pairwise comparison process for assessing a large number of alternatives, we suggest a hierarchical approach, if these alternatives can be grouped into their higher-level goals or functional areas. In this case, instead of comparing between all alternatives, the pairwise comparison process is conducted (a) between groups (higher-level goals or functional areas), and (b) between alternatives within each group. As such, the number of comparisons required is greatly reduced. For example, when applying this hierarchical pairwise comparison approach to the empirical study, only 36 (= 15+10+1+1+3+3+3) comparisons are required. To obtain the relative fuzzy values of all alternatives across all groups, the relative fuzzy values of alternatives under each group are normalized by making the relative fuzzy value of the corresponding group (relative to other groups) their mean value. If the relative value of a group A_i (i = 1, 2, …, I) is w_{A_i} and the relative values of its N_i associated alternatives within the group A_i are w^i_{A_{ih}} (h = 1, 2, …, N_i), then the relative values of these N_i alternatives among all alternatives are

w_{A_{ih}} = w_{A_i} × w^i_{A_{ih}} × ( N_i / \sum_{h=1}^{N_i} w^i_{A_{ih}} )    (3)
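To make the geometric-mean computation of Eqs. (1)-(2) concrete, the following is a minimal sketch that applies it to a matrix of triangular fuzzy numbers stored as (l, m, u) triples; fuzzy products and quotients are approximated component-wise, a common convention for triangular numbers. The function name and the 2 × 2 example matrix are illustrative assumptions, not part of the original study.

import math

def geometric_mean_weights(matrix):
    """Approximate Buckley's geometric-mean method (Eqs. 1-2) for a fuzzy
    positive reciprocal matrix whose entries are triangular fuzzy numbers
    given as (lower, modal, upper) triples."""
    m = len(matrix)
    # Eq. (1): r_i = (prod_j x_ij)^(1/m), computed on each component.
    r = []
    for row in matrix:
        comps = []
        for k in range(3):                       # lower, modal, upper
            prod = math.prod(x[k] for x in row)
            comps.append(prod ** (1.0 / m))
        r.append(tuple(comps))
    # Eq. (2): w_i = r_i / sum_j r_j; for triangular numbers the quotient
    # (l, m, u) / (L, M, U) is approximated by (l/U, m/M, u/L).
    total = tuple(sum(ri[k] for ri in r) for k in range(3))
    return [(ri[0] / total[2], ri[1] / total[1], ri[2] / total[0]) for ri in r]

# Example: a 2x2 fuzzy reciprocal matrix ("5" fuzzified as (3, 5, 7)).
M = [[(1, 1, 1), (3, 5, 7)],
     [(1 / 7, 1 / 5, 1 / 3), (1, 1, 1)]]
print(geometric_mean_weights(M))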
To defuzzify the fuzzy assessments, we use the concept of α-cut in fuzzy set theory [10]. By using an α-cut on a triangular fuzzy number, a value interval [x_l^α, x_r^α] can be derived, where 0 ≤ α ≤ 1. For a given α, x_l^α and x_r^α are the averages of the lower bounds and upper bounds, respectively, of the crisp intervals resulting from all the α-cuts whose alpha values are equal to or greater than the specified value of α. The value of α represents the decision maker's degree of confidence in the fuzzy assessments of experts/respondents. To reflect the decision maker's relative preference between x_l^α and x_r^α, an attitude index λ in the range of 0 and 1 can be incorporated. As a result, a crisp value can be obtained as
x_{αλ} = λ x_r^α + (1 − λ) x_l^α ,  0 ≤ λ ≤ 1.    (4)
The value of λ can be used to reflect the decision maker’s attitude towards risk, which may be optimistic, pessimistic or somewhere in between [16, 17]. In actual decision settings, λ = 1, λ = 0.5, or λ = 0 can be used to indicate that the decision maker has an optimistic, moderate, or pessimistic view respectively on fuzzy assessment results.
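As an illustration of Eq. (4), the sketch below defuzzifies a triangular fuzzy number. The closed forms for x_l^α and x_r^α assume the averaging interpretation described above (averages of the cut bounds over all cuts at or above α); the function name and example values are hypothetical.

def defuzzify(tfn, alpha=0.0, lam=0.5):
    """Crisp value of a triangular fuzzy number (a1, a2, a3) via Eq. (4)."""
    a1, a2, a3 = tfn
    mid = (1.0 + alpha) / 2.0
    x_l = a1 + mid * (a2 - a1)   # average lower bound of cuts in [alpha, 1]
    x_r = a3 - mid * (a3 - a2)   # average upper bound
    return lam * x_r + (1.0 - lam) * x_l   # Eq. (4)

# alpha = 0 and lambda = 0.5, as used in the empirical study of Section 4:
print(defuzzify((3, 5, 7)))   # -> 5.0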
3 Optimal Weighting of Alternatives

The weights of alternatives given by the experts or stakeholders may not necessarily be the optimal weights for the evaluation problem as a whole, as these experts may hold different, often conflicting beliefs and interests. In this section, we develop an optimal weighting approach in order to obtain the optimal weights that maximize the overall preference value of all alternatives as a whole. This approach is based on our notion that the interests of the experts or stakeholders on the alternatives are reflected by their preference for these alternatives through their assessments on the weights and performance ratings of alternatives. The preference value of an alternative is obtained by multiplying its weight by its performance rating. If the alternative performance is to be assessed with respect to multiple attributes, a multiattribute decision making method (e.g. [6, 7, 8, 15]) can be used to obtain an overall performance value. To maximize the total preference value of all alternatives A_i (i = 1, 2, …, m) for individual experts or stakeholders E_j (j = 1, 2, …, n), we can use the following model:

Objective:
Maximize  P_j = \sum_{i=1}^{m} w_i x_{ij}    (5)

Subject to:
L_i ≤ w_i ≤ U_i    (6)
\sum_{i=1}^{m} w_i = 1    (7)
w_i ≥ 0 ;  i = 1, 2, …, m;  j = 1, 2, …, n.
where Pj = the total preference value of expert Ej (j = 1, 2, …, n); wi = the best weights of alternative Ai (i = 1, 2, …, m); xij = the performance rating of alternative Ai (i = 1, 2, …, m) given by expert Ej (j = 1, 2, …, n); Li = the smallest of weights of alternative Ai (i = 1, 2, …, m) given by all n experts; Ui = the largest of weights of alternative Ai (i = 1, 2, …, m) given by all n experts. The objective function (5) is to maximize individual experts’ total preference value, which is represented by multiplying the best importance weights of alternatives (decision variables) by their assessed performance rating. Constraints (6) impose that the best weights generated must lie within the weight ranges assessed by all experts. Constraints (7) state that the best weights obtained for individual experts are to be normalized to sum
to 1. In practical applications, cardinal weights are usually normalized to sum to 1, in order to allow the weight value to be interpreted as the percentage of the total importance weight [1]. Solving Model (5-7) will obtain the best total preference value P_j^* of all alternatives for each individual expert E_j (j = 1, 2, …, n).
The best weights generated from Model (5-7) reflect the best interests of individual experts or stakeholders, not necessarily the best interests of all experts or stakeholders as a whole. As such, the optimal weights of alternatives should maximize the overall preference value of all experts or stakeholders. In other words, the difference between the best total preference value of individual experts generated by Model (5-7) and the overall preference value generated by the optimal weights should be minimized. This can be achieved by solving the following model:

Objective:
Minimize  \sum_{j=1}^{n} ( P_j^* − \sum_{i=1}^{m} W_i x_{ij} )^2    (8)

Subject to:
L_i ≤ W_i ≤ U_i    (9)
\sum_{i=1}^{m} W_i = 1    (10)
W_i ≥ 0 ;  i = 1, 2, …, m;  j = 1, 2, …, n.

where P_j^* = the best total preference value of expert E_j (j = 1, 2, …, n) generated by Model (5-7); W_i = the optimal weight of alternative A_i (i = 1, 2, …, m); x_ij = the performance rating of alternative A_i given by expert E_j; L_i = the smallest of the weights of alternative A_i given by all n experts; U_i = the largest of the weights of alternative A_i given by all n experts. The objective function (8) is to minimize the total preference difference between individual experts' best preference values and the overall preference value of all experts, where W_i (i = 1, 2, …, m) are the decision variables. Constraints (9) impose that the optimal weights generated for all experts as a whole must lie within the weight ranges assessed by all experts. Constraints (10) state that the optimal weights obtained are to be normalized to sum to 1. Solving Model (8-10) will obtain the optimal weights W_i for all alternatives A_i (i = 1, 2, …, m).
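The two models can be reproduced with any LP/QP solver; the empirical study in Section 4 uses LINDO and LINGO. The sketch below is an illustrative, non-authoritative SciPy equivalent (function names, the SLSQP choice and the uniform starting point are assumptions): Model (5-7) is solved once per expert as a linear program, and Model (8-10) as a constrained least-squares problem.

import numpy as np
from scipy.optimize import linprog, minimize

def best_weights_per_expert(X, L, U):
    """Model (5)-(7): for each expert j, maximize sum_i w_i * X[i, j]
    subject to L <= w <= U and sum(w) = 1, where X is the m x n matrix
    of performance ratings.  linprog minimizes, so the objective is
    negated.  Returns the best total preference values P*_j."""
    m, n = X.shape
    P_star = []
    for j in range(n):
        res = linprog(-X[:, j], A_eq=np.ones((1, m)), b_eq=[1.0],
                      bounds=list(zip(L, U)))
        P_star.append(-res.fun)
    return np.array(P_star)

def optimal_weights(X, L, U, P_star):
    """Model (8)-(10): minimize sum_j (P*_j - sum_i W_i * X[i, j])^2
    over the same feasible region."""
    m, _ = X.shape
    obj = lambda W: float(np.sum((P_star - W @ X) ** 2))
    cons = [{"type": "eq", "fun": lambda W: np.sum(W) - 1.0}]
    res = minimize(obj, x0=np.full(m, 1.0 / m), bounds=list(zip(L, U)),
                   constraints=cons, method="SLSQP")
    return res.x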
4 Empirical Study

To improve the competitiveness of Taiwan's air cargo industry in the region, and to further develop Taiwan as a regional and international air cargo hub, Taiwan's government has been establishing the national air cargo development policy. Through workshops and extensive discussions among experts in relevant public and private
sectors, the policy making process has identified 18 alternative development strategies needed for enhancing Taiwan’s air cargo industry in order to ensure its sustainable development. Table 2 shows these 18 development strategies, together with their corresponding development goal. With limited resources available, it is of strategic importance for the government to evaluate and prioritize alternative development strategies in terms of their weight (relative importance) and achievability. Table 2. Development goals and strategies for Taiwan’s air cargo industry
Development goal                                      Development strategy
A1 Promoting airport competitiveness                  A11 Enhancing airport capacity
                                                      A12 Applying competitive airport charges
                                                      A13 Improving operational efficiency
                                                      A14 Enhancing airport management
                                                      A15 Enhancing airport transit functions
A2 Enhancing integration of transportation systems    A21 Improving inland transportation and logistics infrastructure
                                                      A22 Establishing information platform among transportation systems
A3 Expanding air cargo routes                         A31 Applying flexible route allocation
                                                      A32 Lifting carriers' restrictions on the choice of air cargo routes
A4 Improving air cargo management systems             A41 Amplifying air cargo related rules and regulations
                                                      A42 Strengthening management of hazardous goods
                                                      A43 Encouraging regional strategic alliance among carriers
A5 Developing air cargo operation center              A51 Expediting the establishment of free trade zones
                                                      A52 Promoting computerization of air cargo logistics services
                                                      A53 Fostering internationally qualified professionals
A6 Expediting direct air cargo links between          A61 Planning facility requirements of direct air cargo services
   Taiwan and China                                   A62 Coordinating technical issues of direct air cargo services
                                                      A63 Establishing rules and regulations of direct air cargo services
A survey questionnaire was designed to ask the experts in three stakeholder groups (including the government, the general public, and the air cargo industry) to assess (a) the relative weight of 6 development goals and 18 strategies respectively using the
hierarchical pairwise comparison approach presented in Section 2, and (b) the achievability level of these strategies, using a set of self-definable linguistic terms. 37 questionnaire forms were distributed to air cargo experts in Taiwan, and 29 effective responses (including 9 government aviation officials, 11 academic researchers, and 9 air cargo practitioners) were received. It is noteworthy that all these experts are from different organizations, and in particular the 9 practitioners are from different sectors of the air cargo industry. Table 3 shows the survey result of applying Equations (1)-(3), which is the average of the fuzzy assessments given by all the experts.

Table 3. Fuzzy weights and achievability level of development strategies
Development strategy   Relative weight       Achievability level
A11                    (0.04, 0.14, 0.50)    (56.5, 66.1, 75.1)
A12                    (0.05, 0.17, 0.61)    (60.1, 71.0, 78.3)
A13                    (0.08, 0.32, 1.00)    (61.3, 73.7, 80.7)
A14                    (0.06, 0.22, 0.75)    (56.9, 68.3, 76.7)
A15                    (0.05, 0.18, 0.63)    (59.6, 71.1, 79.7)
A21                    (0.03, 0.08, 0.30)    (52.1, 63.2, 72.6)
A22                    (0.03, 0.09, 0.33)    (53.4, 64.3, 73.5)
A31                    (0.05, 0.18, 0.62)    (49.8, 59.1, 69.0)
A32                    (0.04, 0.15, 0.52)    (50.5, 60.7, 70.2)
A41                    (0.04, 0.12, 0.43)    (55.0, 65.1, 75.2)
A42                    (0.02, 0.05, 0.23)    (50.9, 60.3, 70.2)
A43                    (0.04, 0.12, 0.44)    (54.5, 65.9, 74.7)
A51                    (0.08, 0.29, 0.95)    (64.6, 75.1, 83.2)
A52                    (0.04, 0.14, 0.52)    (59.8, 70.0, 77.9)
A53                    (0.05, 0.16, 0.58)    (56.5, 67.6, 75.8)
A61                    (0.07, 0.23, 0.73)    (54.8, 66.4, 74.6)
A62                    (0.07, 0.24, 0.77)    (55.0, 65.0, 73.8)
A63                    (0.08, 0.30, 0.86)    (52.8, 63.3, 72.7)
To ensure the effectiveness and acceptability of the evaluation result, we use Models (5-7) and (8-10) for determining the optimal weights of 18 strategies from the perspective of the air cargo industry in consideration of the views of the government and the general public. As such, the objective function of Model (5-7) is to maximize the interests of the air cargo industry about 18 development strategies, which are represented by the total preference values of 9 practitioner experts (as individual carriers) for these strategies. The preference value of an alternative assessed by an expert in this empirical study is obtained by multiplying the relative weight of the alternative by the achievability level (as the performance rating) of the alternative assessed by the expert. The constraints of Model (5-7) are that the best weights generated must lie within the importance weight ranges assessed by the government and academic experts. Solving the model will obtain the best total preference value Pj* of all strategies for individual carriers j (j = 1, 2, …, n).
As a national development policy, the 18 strategies should ideally be prioritized based on the best interests of the air cargo industry as a whole, and not just those of any particular carriers. As such, the objective function of Model (8-10) is to minimize the total preference difference between individual carriers' best preference values and the overall preference value of all carriers. The constraints of the model are that the optimal weights generated for all carriers as a whole must lie within the weight ranges assessed by the government and academic experts. To apply the fuzzy survey results to Models (5-7) and (8-10), we use Equation (4) with α = 0 and λ = 0.5 to reflect that we have no particular preference for the fuzzy assessment results made by the experts. α = 0 implies that we use the mean value of a fuzzy number [18]. λ = 0.5 indicates that we weight all the values derived from fuzzy assessments equally. With the crisp values of relative weights and achievability levels of the 18 strategies assessed by 9 government officials, 11 academic researchers, and 9 practitioners (acting as 9 individual carriers), Model (5-7) is solved using the LINDO software system. The best total preference values ( P_j^* ) for the 9 individual carriers are 54.329, 55.939, 60.654, 69.657, 64.723, 52.944, 77.298, 51.978, and 83.536 respectively. These data are then used to solve Model (8-10) using the LINGO software system. Column 2 of Table 4 shows the optimal weights of the 18 strategies. With these optimal weights and the average achievability level assessed by all experts, we can obtain the optimal overall preference value and ranking of the 18 strategies as shown in Table 4.

Table 4. Optimal relative weight and preference value of development strategies
Development strategy   Optimal relative weight   Achievability level   Optimal preference value (ranking)
A11                    0.051                     65.908                3.361 (10)
A12                    0.063                     69.805                4.398 (8)
A13                    0.096                     71.885                6.901 (1)
A14                    0.078                     67.293                5.249 (3)
A15                    0.064                     70.132                4.488 (6)
A21                    0.029                     62.615                1.816 (16)
A22                    0.027                     63.753                1.721 (17)
A31                    0.053                     59.299                3.143 (11)
A32                    0.033                     60.477                1.996 (15)
A41                    0.043                     65.103                2.799 (14)
A42                    0.021                     60.460                1.270 (18)
A43                    0.044                     65.017                2.861 (12)
A51                    0.083                     74.299                6.167 (2)
A52                    0.063                     69.241                4.362 (7)
A53                    0.043                     66.655                2.866 (13)
A61                    0.058                     65.282                3.786 (9)
A62                    0.074                     64.563                4.778 (5)
A63                    0.077                     62.914                4.844 (4)
It is noteworthy that the optimal overall preference value of all strategies (i.e. sum of Column 4 of Table 4) is 66.806, which is greater than the average of the best total preference values of all strategies for 9 individual carriers obtained from Model (5-7)
(being 63.451). This suggests that the overall preference value will increase if we consider the best interests of all carriers as a whole, rather than the best interests of individual carriers. This also justifies the use of the optimal weighting approach for the evaluation problem. In Models (5-7) and (8-10), the relative optimal weights of 18 strategies to be generated for maximizing the best interests of the air cargo industry are constrained by the opinions of the government and academic experts, as given by constraints (6) and (9). In addition to their practical significance in considering all stakeholders’ views with focus on the best interests of the air cargo industry, these constraints are necessary for achieving an effective result. If these constraints are not included in the models, the resultant optimal weights of 18 strategies are given in the second column of Table 5. The third and fourth columns of Table 5 show the optimal weights of 18 strategies for the government and academic groups respectively, without considering other stakeholder groups’ views. Clearly, the result of the optimal weighting models by considering only the best interests of one stakeholder group is not desirable. It is interesting to note that the three stakeholder groups hold different views of how these strategies should be prioritized in order to maximize their own best interests. Table 5. Optimal relative weights for three stakeholder groups without constraints
Development strategy   Industry oriented   Government oriented   Academic oriented
A11                    0.000               0.000                 0.000
A12                    0.211               0.000                 0.000
A13                    0.000               0.318                 0.792
A14                    0.000               0.000                 0.045
A15                    0.000               0.000                 0.000
A21                    0.000               0.000                 0.000
A22                    0.000               0.000                 0.000
A31                    0.000               0.000                 0.000
A32                    0.000               0.000                 0.072
A41                    0.000               0.000                 0.000
A42                    0.000               0.000                 0.000
A43                    0.000               0.000                 0.000
A51                    0.789               0.682                 0.000
A52                    0.000               0.000                 0.000
A53                    0.000               0.000                 0.091
A61                    0.000               0.000                 0.000
A62                    0.000               0.000                 0.000
A63                    0.000               0.000                 0.000
5 Conclusion

Evaluating decision alternatives such as courses of action, plans, strategies and policy issues often involves assessing relative weights of decision alternatives via surveys using fuzzy comparative judgment. In this paper we have presented a survey based
optimal weighting approach which can produce optimal weights that maximize the overall preference value of alternatives as a whole. In particular, we have proposed a hierarchical pairwise comparison process for assessing relative weights of a large number of alternatives with comparative judgment. The empirical study conducted has demonstrated the effectiveness of the approach. The approach has general application in optimizing relative weights for assessing policy alternatives based on subjective judgements of different stakeholder groups.
References

1. Belton, V., Stewart, T.J.: Multiple Criteria Decision Analysis: An Integrated Approach. Kluwer, Boston (2002)
2. Blumenthal, A.: The Process of Cognition. Prentice-Hall, Englewood Cliffs, New Jersey (1977)
3. Buckley, J.J.: Fuzzy Hierarchical Analysis. Fuzzy Sets and Systems 17 (1985) 233-247
4. Buckley, J.J., Feuring, T., Hayashi, Y.: Fuzzy Hierarchical Analysis Revisited. European Journal of Operational Research 129 (2001) 48-64
5. Chang, Y.-H., Yeh, C.-H.: A Survey Analysis of Service Quality for Domestic Airlines. European Journal of Operational Research 139 (2002) 166-177
6. Chang, Y.-H., Yeh, C.-H.: A New Airline Safety Index. Transportation Research Part B: Methodological 38 (2004) 369-383
7. Dyer, J.S., Fishburn, P.C., Steuer, R.E., Wallenius, J., Zionts, S.: Multiple Criteria Decision Making, Multiattribute Utility Theory: The Next Ten Years. Management Science 38 (1992) 645-653
8. Hwang, C.L., Yoon, K.: Multiple Attribute Decision Making - Methods and Applications. Springer-Verlag, New York (1981)
9. Kaufmann, A., Gupta, M.M.: Introduction to Fuzzy Arithmetic, Theory and Applications. International Thomson Computer Press, Boston (1991)
10. Klir, G.R., Yuan, B.: Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice-Hall, Upper Saddle River, NJ (1995)
11. Leung, L.C., Cao, D.: On Consistency and Ranking of Alternatives in Fuzzy AHP. European Journal of Operational Research 124 (2000) 102-113
12. Miller, G.A.: The Magic Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. Psychological Review 63 (1956) 81-97
13. Saaty, T.L.: The Analytic Hierarchy Process. McGraw-Hill, New York (1980)
14. Saaty, T.L.: Rank from Comparisons and from Ratings in the Analytic Hierarchy/Network Processes. European Journal of Operational Research 168 (2006) 557-570
15. Yeh, C.-H.: The Selection of Multiattribute Decision Making Methods for Scholarship Student Selection. International Journal of Selection and Assessment 11 (2003) 289-296
16. Yeh, C.-H., Deng, H., Chang, Y.-H.: Fuzzy Multicriteria Analysis for Performance Evaluation of Bus Companies. European Journal of Operational Research 26 (2000) 1-15
17. Yeh, C.-H., Kuo, Y.-L.: Evaluating Passenger Services of Asia-Pacific International Airports. Transportation Research Part E 39 (2003) 35-48
18. Yeh, C.-H., Deng, H.: A Practical Approach to Fuzzy Utilities Comparison in Fuzzy Multi-Criteria Analysis. International Journal of Approximate Reasoning 35 (2004) 179-194
19. Zadeh, L.A.: Fuzzy Sets. Information and Control 8 (1965) 338-353
Model and Solution for the Multilevel Production-Inventory System Before Ironmaking in Shanghai Baoshan Iron and Steel Complex

Guoli Liu and Lixin Tang*

The Logistics Institute, Northeastern University, 110004 Shenyang, China
Tel.: +86-24-83680169; Fax: +86-24-83680169
[email protected], [email protected]
Abstract. This research deals with the production-inventory problem originating from the ironmaking production system in Shanghai Baoshan Iron and Steel Complex (Baosteel). To solve this multilevel, multi-item, multi-period, capacitated lot-sizing problem, a deterministic mixed integer programming (MIP) model based on the minimization of production and inventory costs is formulated to determine the production and inventory quantities of all materials in each time period under material-balance and capacity constraints. Due to the large number of variables and constraints in this model, a decomposition Lagrangian relaxation (LR) approach is developed. Illustrative numerical examples based upon actual production data from Baosteel are given to demonstrate the effectiveness of the proposed LR approach.

Keywords: Lot-sizing, combinatorial optimization, Lagrangian relaxation.
1 Introduction

Effective multilevel production-inventory planning has become the focus of research efforts. A wide variety of models and various solution procedures have been proposed. Diaby et al. [1] developed a Lagrangean relaxation-based heuristic procedure to generate near-optimal solutions to very-large-scale capacitated lot-sizing problems (CLSP) with setup times and limited overtime. Afentakis and Gavish [2] developed algorithms for optimal lot sizing of products with a complex product structure. They converted the classical formulation of the general structure problem into a simple but expanded assembly structure with additional constraints, and solved the transformed problem by a branch-and-bound based procedure. Afentakis et al. [3] presented a new formulation of the lot-sizing problem in multistage assembly systems, which led to an effective optimization algorithm for the problem. Billington et al. [4] dealt with the capacitated, multi-item, multi-stage lot-sizing problem for serial systems by examining the performance of heuristics found effective for the capacitated multiple-product, single-stage problem. Billington et al. [5] introduced a line of research on capacity-constrained multi-stage production scheduling problems. A review of the literature and an analysis of the types of problems that existed were presented. In their
* Corresponding author.
subsequent paper (Billington et al. [6]) they presented a heuristic method, based on Lagrangian relaxation, for multilevel lot-sizing when there was a single bottleneck facility. This paper investigates the problem of making the production and inventory decisions for ironmaking materials manufactured through several stages in Baosteel so as to minimize the sum of setup, production and inventory holding costs while the given demand in each time period is fulfilled without backlogging. After an investigation on the multilevel production-inventory system before ironmaking in Baosteel has been made, the authors of this paper conclude that the process of making hot metals for the steel plant should be viewed as a continuous line process of consecutive batch operations that are illustrated in Figure 1. The problem considered here is virtually a deterministic, dynamic-demand, multilevel, multi-item, multi-period, multi-pass, capacitated lot-sizing problem in a special system structure depicted as a cyclic graph, which could be detected from the features summarized in Table 1. The remainder of this paper is organized as follows. In section 2, a novel mixed integer programming model is formulated to solve the production-inventory problem originating from Baosteel. In section 3, the Lagrangian relaxation approach is presented and the computational results are reported. Finally, Section 4 presents a discussion of future work and concludes the paper.
Fig. 1. The flowsheet of the ironmaking production system in Baosteel

Table 1. Features of the production-inventory problem in Baosteel
• Each material may have several predecessors and several successors.
• The capacity constraints may include the capacity limitations of multiple resources.
• Independent demands could only be given to the final products.
• Each material could correspond to more than one technology routing.
• The structure of the production system is allowed to be cyclic.
• Both assembly and disassembly structures exist in the production system.
• No backlogging is allowed.
2 Preliminaries and Mathematical Formulation of the Problem

For the production-inventory problem considered in this paper, the following assumptions apply:
(1) The initial inventories of all items are zero.
(2) No backorders are allowed.
(3) One material or material group could be changed into multiple products on a one-to-many or many-to-many operation.
(4) One material or material group could be used to produce one product on a one-to-one or many-to-one operation.
(5) In each time period, every intermediate product could be produced on different operations.
(6) On one operation each intermediate product can only be produced from entirely different materials or material groups in any time period.
(7) Some by-products could be processed to reversely supply their predecessors.
(8) Each final product represents one kind of hot metal.
(9) The external demands of final products are given and change with time.
(10) No inventory problems on the final products are considered.
(11) Each operation used to directly produce final products is denoted as '0' and the operation of purchasing as '1'.
(12) Operation '1' is considered as a one-to-one operation.
(13) Every operation denoted as '0' should be a many-to-one or one-to-one operation.
(14) Each final product only has independent demands.
(15) Only one sequence of operations is appointed to produce every final product, so there is no alternative.

Parameters in the model are:
K = the number of stock yards;
T = the set of indices for time periods;
N = the set of indices for items in the production process, excluding final products;
N_f = the set of final products;
L = the set of indices for all operations, excluding reverse supply;
L_1 = the set of one-to-many and many-to-many operations in L;
L_2 = the set of one-to-one and many-to-one operations in L, i.e., L_2 = L − L_1;
L_R = the operation standing for reverse supply;
F_lt = the set of the core products of operation l in period t, l ∈ L_1, t ∈ T;
B_ilt = the set of the by-products corresponding to core product i of operation l in period t, l ∈ L_1, t ∈ T, i ∈ F_lt;
J_lt = the set of the products of operation l in period t, l ∈ L_2, t ∈ T;
D_it = the demand for final product i in period t, i ∈ N_f, t ∈ T;
S_i = the set of immediate successors of item i, i ∈ N;
h_i = the holding cost for one unit of item i in a period, i ∈ N;
pr_it = { l ∈ L | i ∈ J_lt or i ∈ F_lt or i ∈ B_lt }, i ∈ N, t ∈ T;
A_lt = the set of items needed to be processed on operation l in period t, l ∈ L ∪ {L_R}, t ∈ T;
reverse_i = the set of items which could be provided by item i on operation L_R, i ∈ N;
H_ijt = { l ∈ pr_jt | i ∈ A_lt } ∪ { L_R | j ∈ reverse_i }, i ∈ N, j ∈ S_i, t ∈ T;
back_i = the set of items which could provide item i on operation L_R, i ∈ N;
lead_ij = the lead time of item j provided by item i on operation L_R, i ∈ N, j ∈ reverse_i;
ρ_ij = the total quantity of item i used to produce one unit of item j, i ∈ N, j ∈ reverse_i;
R_ilt = the setup cost for the production of item i on operation l in period t, i ∈ N, l ∈ L, t ∈ T;
ms_ilt = the unit production cost for item i on operation l in period t, i ∈ N, l ∈ L, t ∈ T;
r_ijlt = the total quantity of item i used to produce one unit of item j on operation l in period t, l ∈ L_1, t ∈ T, i ∈ A_lt, j ∈ F_lt ∩ S_i;
r_ijlt = 0, l ∈ L_1, t ∈ T, i ∈ A_lt, j ∈ B_lt ∩ S_i;
r_ij0t = 0, l ∈ L_1, t ∈ T, i ∈ A_lt, j ∈ B_lt ∩ S_i;
r_ijlt = the total quantity of item i used to produce one unit of item j on operation l in period t, l ∈ L_2, t ∈ T, i ∈ A_lt, j ∈ S_i;
r_ijlt = the total quantity of item i simultaneously obtained when one unit of item j is produced on operation l in period t, l ∈ L_1, t ∈ T, i ∈ F_lt, j ∈ B_lt;
p_ilt = the capacity absorption coefficient of item i in period t for operation l, l ∈ L_2, t ∈ T, i ∈ J_lt;
p_ilt = the total quantity of the immediate predecessor needed to produce one unit of item i on operation l in period t, l ∈ L_1, t ∈ T, i ∈ F_lt;
P_lt = the capacity limit for operation l in period t, l ∈ L \ {1}, t ∈ T;
P_1t = the purchasing budget in period t, t ∈ T;
V_k = the largest inventory capacity of stock yard k, 1 ≤ k ≤ K;
M = a very large positive number.

Decision variables in the model are:
x_it = the production quantity for item i in period t, i ∈ N, t ∈ T;
I_it = the inventory position for item i at the end of period t, i ∈ N, t ∈ T;
z_ilt = the production quantity for item i on operation l in period t, i ∈ N, l ∈ L, t ∈ T;
y_ilt = the binary variable which takes the value 1 if a setup is made for item i on operation l in period t and zero otherwise, i ∈ N, l ∈ L, t ∈ T;
w_ijt = the total quantity of item j which is provided by item i on operation L_R and can be used in period t, i ∈ N, j ∈ reverse_i, t ∈ T.

Using the above definitions, the production-inventory problem can now be stated as:

Minimize
C ≡ \sum_{t∈T} \sum_{l∈L_1∩pr_it} \sum_{i∈F_lt} ( R_ilt y_ilt + ms_ilt z_ilt ) + \sum_{t∈T} \sum_{l∈L_2∩pr_it} \sum_{i∈J_lt} ( R_ilt y_ilt + ms_ilt z_ilt ) + \sum_{i∈N} \sum_{t∈T} (1/2) h_i ( I_{i,t−1} + x_it + I_it )    (1)
Subject to

I_{i,t−1} + x_it − I_it − \sum_{j∈S_i} \sum_{l∈H_ijt \ {L_R, 0}} r_ijlt z_jlt = \sum_{j∈S_i∩N_f} r_ij0t D_jt + \sum_{j∈reverse_i} ρ_ij w_{i,j,t+lead_ij} ,  ∀i ∈ N, t ∈ T    (2)

z_klt = r_iklt z_ilt ,  ∀k ∈ B_ilt, i ∈ F_lt, l ∈ L_1, t ∈ T    (3)

z_ilt ≤ M y_ilt ,  ∀i ∈ N, l ∈ pr_it, t ∈ T    (4)

\sum_{i∈F_lt} p_ilt z_ilt ≤ P_lt ,  ∀l ∈ L_1, t ∈ T    (5)

\sum_{i∈J_lt} p_ilt z_ilt ≤ P_lt ,  ∀l ∈ L_2, t ∈ T    (6)

\sum_{l∈pr_it} z_ilt + \sum_{j∈back_i} w_{j,i,t} = x_it ,  ∀i ∈ { i | pr_it ≠ ∅ or back_i ≠ ∅, i ∈ N }, t ∈ T    (7)

(1/2) \sum_{i∈U_k} ( I_{i,t−1} + x_it + I_it ) ≤ V_k ,  k = 1, 2, …, K, ∀t ∈ T    (8)

I_i0 = 0 ,  ∀i ∈ N    (9)

I_it, x_it, z_ilt ≥ 0 ,  ∀i ∈ N, l ∈ pr_it, t ∈ T    (10)

y_ilt ∈ {0, 1} ,  ∀i ∈ N, l ∈ pr_it, t ∈ T    (11)
The objective function in (1) represents the total setup, production/purchasing and inventory holding cost for all items over all time periods. Constraints (2) are material balance equations which state that dependent and independent demands are fulfilled from inventory, production and reverse logistics. Constraints (3) are defined to ensure that each core product and its by-products are produced in proportion. Constraints (4) together with Constraints (11) indicate that a setup cost will be incurred when a lot size is produced. Constraints (5) and (6) enforce that production must be within the limitations set by available capacity. Constraints (7) imply that each material is supplied by multiple production routings including reverse logistics. Constraints (8) represent inventory capacity limits. Constraints (9) establish zero initial inventories. Constraints (10) define the value ranges of the variables.
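For readers who want to experiment with the structure of the formulation, the following is a greatly simplified, single-level sketch in PuLP: it keeps only the material-balance, setup-linking and capacity skeleton of (1)-(11), drops by-products, reverse flows and multiple operations, and uses illustrative placeholder data rather than Baosteel data. All names and values are assumptions for demonstration only.

import pulp

items, periods = range(2), range(3)
demand = {(0, 0): 40, (0, 1): 60, (0, 2): 30,
          (1, 0): 20, (1, 1): 50, (1, 2): 70}
setup_cost, unit_cost, hold_cost, cap, bigM = 100, 2.0, 0.5, 150, 10 ** 4

prob = pulp.LpProblem("lot_sizing", pulp.LpMinimize)
z = pulp.LpVariable.dicts("z", (items, periods), lowBound=0)     # production
I = pulp.LpVariable.dicts("I", (items, periods), lowBound=0)     # inventory
y = pulp.LpVariable.dicts("y", (items, periods), cat="Binary")   # setup

# Objective in the spirit of (1): setup + production + holding cost.
prob += pulp.lpSum(setup_cost * y[i][t] + unit_cost * z[i][t]
                   + hold_cost * I[i][t] for i in items for t in periods)
for i in items:
    for t in periods:
        prev = I[i][t - 1] if t > 0 else 0            # zero initial stock, cf. (9)
        prob += prev + z[i][t] - I[i][t] == demand[i, t]   # balance, cf. (2)
        prob += z[i][t] <= bigM * y[i][t]                  # linking, cf. (4)
for t in periods:
    prob += pulp.lpSum(z[i][t] for i in items) <= cap      # capacity, cf. (5)-(6)

prob.solve()
print(pulp.LpStatus[prob.status], pulp.value(prob.objective))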
3 Solution Methodology

A solution strategy composed of Lagrangian relaxation, linear programming and heuristics is developed. The details of the strategy are presented below.
3.1 Lagrangian Relaxation Algorithm
The decomposition approach for the production-inventory problem considered in this paper hinges on the observation that when we disregard constraints (4), the Lagrangian relaxation problem can be decomposed into easily solvable sub-problems for each kind of variable. With these facts, constraints (4) are dualized in the objective function with non-negative multipliers {u_ilt}:

(LR)  Minimize
C_LR(u_ilt) ≡ \sum_{t∈T} \sum_{l∈L_1∩pr_it} \sum_{i∈F_lt} ( R_ilt y_ilt + ms_ilt z_ilt ) + \sum_{t∈T} \sum_{l∈L_2∩pr_it} \sum_{i∈J_lt} ( R_ilt y_ilt + ms_ilt z_ilt ) + \sum_{i∈N} \sum_{t∈T} (1/2) h_i ( I_{i,t−1} + x_it + I_it ) + \sum_{i∈N} \sum_{t∈T} \sum_{l∈pr_it} u_ilt ( z_ilt − M y_ilt )    (12)

subject to constraints (2), (3), (5)-(11) and

u_ilt ≥ 0 ,  ∀i ∈ N, l ∈ pr_it, t ∈ T    (13)
The relaxed problem (LR) can be decomposed into two smaller sub-problems, each involving one type of variables. The sub-problems are given as follows.

(LR1)  Minimize
C_LR1(u_ilt) ≡ \sum_{t∈T} \sum_{l∈L_1∩pr_it} \sum_{i∈F_lt} R_ilt y_ilt + \sum_{t∈T} \sum_{l∈L_2∩pr_it} \sum_{i∈J_lt} R_ilt y_ilt − M \sum_{i∈N} \sum_{t∈T} \sum_{l∈pr_it} u_ilt y_ilt    (14)
subject to constraints (11) and (13).

(LR2)  Minimize
C_LR2(u_ilt) ≡ \sum_{t∈T} \sum_{l∈L_1∩pr_it} \sum_{i∈F_lt} ms_ilt z_ilt + \sum_{t∈T} \sum_{l∈L_2∩pr_it} \sum_{i∈J_lt} ms_ilt z_ilt + \sum_{i∈N} \sum_{t∈T} \sum_{l∈pr_it} u_ilt z_ilt + \sum_{i∈N} \sum_{t∈T} (1/2) h_i ( I_{i,t−1} + x_it + I_it )    (15)
subject to constraints (2), (3), (5)-(10) and (13).

Solution of the following Lagrangian dual problem gives the maximum lower bound.

(LD)  Maximize
C_D(u_ilt) ≡ min C_LR(u_ilt)    (16)

subject to constraints (2), (3), (5)-(11) and (13).
3.2 Sub-problem Solution

(LR1)  Minimize
C_LR1(u_ilt) ≡ \sum_{t∈T} \sum_{l∈L_1∩pr_it} \sum_{i∈F_lt} R_ilt y_ilt + \sum_{t∈T} \sum_{l∈L_2∩pr_it} \sum_{i∈J_lt} R_ilt y_ilt − M \sum_{i∈N} \sum_{t∈T} \sum_{l∈pr_it} u_ilt y_ilt
= \sum_{t∈T} \sum_{l∈L_1∩pr_it} \sum_{i∈F_lt} ( R_ilt − M u_ilt ) y_ilt + \sum_{t∈T} \sum_{l∈L_2∩pr_it} \sum_{i∈J_lt} ( R_ilt − M u_ilt ) y_ilt − M \sum_{t∈T} \sum_{l∈L_1∩pr_it} \sum_{i∈N \ F_lt} u_ilt y_ilt − M \sum_{t∈T} \sum_{l∈L_2∩pr_it} \sum_{i∈N \ J_lt} u_ilt y_ilt    (17)
subject to constraints (11) and (13).

Recall that y_ilt is a zero-one variable. To minimize the objective function described by (17), y_ilt equals zero when its coefficient is positive and one otherwise. Since LR2 is a typical linear programming model, it can be directly solved by the standard linear programming software package OSL.
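A minimal sketch of this inspection rule follows; the dictionary keys (i, l, t) and the function name are illustrative assumptions, not part of the original implementation.

def solve_lr1(setup_cost, u, M):
    """Closed-form solution of sub-problem LR1 per Eq. (17): the
    coefficient of each binary y[i, l, t] is R[i, l, t] - M * u[i, l, t]
    (or just -M * u[i, l, t] where the item is not a product of that
    operation), so y is set to 0 when the coefficient is positive and to
    1 otherwise.  setup_cost and u are dicts keyed by (i, l, t)."""
    y = {}
    for key, u_val in u.items():
        coeff = setup_cost.get(key, 0.0) - M * u_val
        y[key] = 0 if coeff > 0 else 1
    return y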
3.3 Construction of a Feasible Solution to the Original Problem

Because the discrete decision variables and the continuous decision variables are calculated separately in different sub-problems, the consistency among them defined by constraints (4) is generally violated. To construct a feasible solution based on the optimal solution of LR2, a heuristic method is adopted to restore feasibility. The algorithmic steps of the heuristic are outlined as follows.
Step 1. Take the optimal solution of LR2 as the initial solution.
Step 2. Search for an item i that meets the conditions i ∈ I and visit_mark[i] = 0. If there exists no such item, go to Step 5.
Step 3. Search for an operation that satisfies the condition z_ilt > 0. Set y_ilt = 1.
Step 4. If there exists no such operation, set visit_mark[i] = 1 and go to Step 2; otherwise go to Step 3.
Step 5. Stop. A feasible solution { I_it, x_it, z_ilt, y_ilt } is obtained.
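The repair pass of Steps 1-5 can be sketched compactly, assuming the LR2 solution is held in dictionaries keyed by (i, l, t); all names are hypothetical.

def restore_feasibility(z, y):
    """Steps 1-5 of the repair heuristic: starting from the LR2 optimum
    (Step 1), every triple with positive production z[i, l, t] > 0 gets
    its setup variable forced to 1 (Steps 2-4), which restores the
    linking constraints (4).  After the pass (Step 5), (I, x, z, y) is
    feasible for the original problem."""
    for key, quantity in z.items():
        if quantity > 0:
            y[key] = 1
    return y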
3.4 Subgradient Algorithm for Dual Maximization

In order to solve the Lagrangian dual problem with respect to (16), a subgradient method is constructed in what follows. The vector of multipliers, u_ilt, is updated by

u_ilt^{m+1} = max{ 0, u_ilt^m + t^m θ^m(u_ilt^m) }    (18)
where t^m is the step size at the m-th iteration and θ^m(u_ilt^m) is the subgradient of C_LR(u_ilt). The subgradient component relating the i-th material, the l-th operation and the t-th time period is θ_ilt^m(u_ilt^m) = z_ilt − M y_ilt. The step size t^m is given by
t^m = β^m ( C^U − C^m ) / ||θ^m||^2 ,  0 < β^m < 2    (19)
where C^U is the best objective value found so far and C^m is the value of C_LR(u_ilt) at the m-th iteration. The parameter β^m is initially set to a value greater than 0 and is multiplied by a factor after each iteration. The algorithm terminates when a given iteration number has been executed or a fixed duality gap has been reached.
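One multiplier update per Eqs. (18)-(19) can be written as follows; the zero-norm guard is an added safeguard not stated in the paper, and all names are illustrative.

def subgradient_step(u, z, y, M, C_upper, C_m, beta):
    """Eqs. (18)-(19): theta = z - M*y is the subgradient,
    t_m = beta * (C_U - C_m) / ||theta||^2 is the step size, and the
    multipliers are projected onto the nonnegative orthant.  u, z, y are
    dicts keyed by (i, l, t)."""
    theta = {k: z[k] - M * y[k] for k in u}
    norm_sq = sum(v * v for v in theta.values()) or 1.0   # guard zero norm
    t_m = beta * (C_upper - C_m) / norm_sq
    return {k: max(0.0, u[k] + t_m * theta[k]) for k in u}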
3.5 Computational Results

The algorithms described in previous sections were programmed in Visual C++. In this experiment, the ironmaking production system in Baosteel described in Figure 1 was considered. To generate representative problem instances, the problem parameters in this experiment are based on actual production data from the ironmaking production system in Baosteel. Since the maximum number of materials used in practice is always less than 75, the performance of the algorithms was evaluated on a set of 120 test problems ranging in size from 15 to 75 materials and 6 to 18 time periods. Because Lagrangian relaxation cannot guarantee optimal solutions, the relative duality gap (C_UB − C_LB)/C_LB was used as a measure of solution optimality, where C_UB is the upper bound to the original problem and C_LB is the lower bound. Optimality performance and running times of the Lagrangian relaxation method against different problem sizes are presented in Table 2. For all the algorithms tested, the same iteration number, 100, is imposed as the stopping criterion.

Table 2. Average duality gaps and average running times
Problem No.   Problem structure (Materials × Periods)   Average duality gap (%)   Running time (s)
1             15 × 6                                     7.747066                  1.552000
2             35 × 6                                     8.826898                  6.668000
3             55 × 6                                     8.910097                  40.339999
4             75 × 6                                     9.202016                  28.589001
5             15 × 12                                    8.143565                  4.872000
6             35 × 12                                    7.528647                  22.871001
7             55 × 12                                    9.590387                  53.315997
8             75 × 12                                    9.129756                  93.825000
9             15 × 18                                    9.229039                  9.835000
10            35 × 18                                    7.022790                  49.191000
11            55 × 18                                    7.515264                  116.329004
12            75 × 18                                    7.639370                  217.553003
Average                                                  8.373741                  53.74508
From the computational results, the following observations can be made.
(1) The average duality gaps of all algorithms tested are below 10% in all cases.
(2) The computational time increases as the number of materials increases.
(3) The computational time increases as the number of periods increases.
(4) The duality gaps are relatively stable when the number of materials or periods increases.
4 Conclusions

In this paper, we discussed the production-inventory problem derived from the ironmaking production system in Baosteel and formulated it as a separable mixed integer programming model in order to determine the production and inventory quantity of each material in each time period under complicated material-balance constraints and capacity constraints. The objective of the model was to minimize the total setup, production/purchasing and inventory holding cost over the planning horizon. A decomposition solution methodology composed of Lagrangian relaxation, linear programming and heuristics was applied to solve the problem. Computational results for test problems generated based on the production data of Baosteel indicated that the proposed Lagrangian relaxation algorithm can always find good solutions within a reasonable time. Our further research may focus on extending the MIP model to other industries.
Acknowledgement

This research is partly supported by the National Natural Science Foundation for Distinguished Young Scholars of China (Grant No. 70425003), the National Natural Science Foundation of China (Grant Nos. 60274049 and 70171030), the Fok Ying Tung Education Foundation and the Excellent Young Faculty Program of the Ministry of Education, China. The authors would like to thank the Production Manufacturing Center in Baosteel for providing a lot of production information and data.
References

1. Diaby, M., Bahl, H.C., Karwan, M.H., Zionts, S.: A Lagrangean relaxation approach for very-large-scale capacitated lot-sizing. Management Science 38(9) (1992) 1329-1340
2. Afentakis, P., Gavish, B.: Optimal lot-sizing algorithms for complex product structures. Operations Research 34(2) (1986) 237-249
3. Afentakis, P., Gavish, B., Karmarkar, U.: Computationally efficient optimal solutions to the lot-sizing problem in multistage assembly systems. Management Science 30(2) (1984) 222-239
4. Billington, P., Blackburn, J., Maes, J., Millen, R., Wassenhove, L.N.V.: Multi-item lot-sizing in capacitated multi-stage serial systems. IIE Transactions 26(2) (1994) 12-24
5. Billington, P.J., McClain, J.O., Thomas, L.J.: Mathematical programming approaches to capacity-constrained MRP systems: Review, formulation and problem reduction. Management Science 29(10) (1983) 1126-1141
6. Billington, P.J., McClain, J.O., Thomas, L.J.: Heuristics for multilevel lot-sizing with a bottleneck. Management Science 32(8) (1986) 989-1006
A Coordination Algorithm for Deciding Order-Up-To Level of a Serial Supply Chain in an Uncertain Environment

Kung-Jeng Wang (1), Wen-Hai Chih (1), and Ken Hwang (2)

(1) Department of Business Administration, National Dong Hwa University, Hualien, 974, Taiwan, R.O.C., [email protected]
(2) Graduate Institute of Technology and Innovation Management, National Chengchi University, Taipei, Taiwan, R.O.C.
Abstract. This study proposes a method for coordinating the order-up-to level inventory decisions of isolated stations to cooperatively compensate for the loss in supply chain performance in a serial supply chain due to unreliable raw material supply. Doing so improves the customer refill rate and lowers the production cost within an uncertain environment in which the chain is highly decentralized, both material supply and customer demand are unreliable, and delays occur in information flow and material flow. The proposed coordination method is then compared to single-station and multiple-station order-up-to policies. Simulated experiments reveal that our proposed method outperforms the others and performs robustly within a variety of demand and supply situations.
1 Introduction

A supply chain is regarded as a large and highly distributed system in this study. Each member in the chain is geographically situated and adheres to distinct management authorities. Centralized control can deteriorate responsiveness and, thus, is inapplicable to such a chain system. A coordinated policy is required to combat fluctuating supply and demand as well as information and material delays within a distributed supply chain. This, in turn, can improve customer satisfaction and reduce supply chain operating costs. Commonly, members in a decentralized supply chain employ an installation stock policy to determine ordering quantity. However, such decision-making is myopic and has no optimization guarantee. This motivates many researchers to pursue more efficient ways to deal with the issue of order-up-to level decision-making in a decentralized manner.
A serial type of supply chain is the focus of this study. A serial supply chain is the essence of a generalized supply chain network. Research regarding managing a non-fully cooperative, distributed serial chain and serial supply chain inventory optimization with multi-criteria and asymmetric information remains very active [1,2]. This study proposes a novel order-up-to level (OUTL) inventory policy against demand and material supply uncertainty and incomplete information situations. Typically, a supply chain is quantitatively evaluated from the perspectives of customer responsiveness (refill rates) and cost [3]. Herein, a synthesis index of these two
measures is applied. Furthermore, the proposed policy is compared to two heuristics, the single-station and multiple-station policies [4]. A fuzzy membership function is employed to represent material supply and customer demand within an uncertain environment. Simulation experiments were conducted to examine the robustness of this policy within a variety of supply and demand changes. The rest of this paper is organized as follows. A literature survey is presented in Section 2. Section 3 defines the necessary notations. In Section 4, we present a coordinated order-up-to level inventory policy and its computational algorithm. We illustrate the algorithm using a serial chain example with five stations. Section 5 evaluates the performance of the proposed coordinated OUTL policy using a simulation model. The paper ends with a concluding remark.
2 Literature Review

Fundamentally, a supply chain consists of procurement, production, and distribution functionalities. Within each of these units, some degree of uncertainty exists. For instance, material shortage, defects, and damage during transportation can disturb procurement; stock shortage, lead time and lot sizing delays can interrupt production; and demand fluctuations can harm distribution tasks. Fuzzy membership functions have been used to represent material supply and customer demand within an uncertain environment [4,5]. To manage inventory control within a serial supply chain, [6] proposed two order-up-to policies, single-station and multiple-station approaches, within a distributed information environment. To reduce global supply chain cost and to increase customer satisfaction, inventory control systems within a supply chain were explored via stochastic modeling with periodic review and order-up-to policy within distinct market structures [7]. Moreover, Cohen and Lee [8] developed a model which integrated production, material management and distribution modules. They addressed decision-making via optimization on lot size, order-up-to quantity, and review period. A probabilistic model was developed to assess service and slack levels of a supply chain with lumpy demands [9]. By presenting a simulation study, real options strategies in the inventory management of strategic commodity-type parts in a supply chain are discussed [10]. Information distortion in a supply chain and its causes was investigated by [11]. These studies mainly focus on how to attack the issues regarding demand/supply uncertainty and information incompleteness. Notably, each member of a supply chain possesses a certain degree of freedom with which to make decisions. One of the major concerns is thus how to cooperate among the autonomous members. Within a supply chain, inventory ordering policies can be primarily classified into two categories, continuous review policy and periodic review policy. Basically, the former incorporates either a reorder point and order-up-to level policy, or a reorder point and order-quantity policy. The latter assumes that a periodic review time exists, and thus considers order-up-to level policy and order-quantity policy, respectively. Herein, we will focus on a serial supply chain using periodic review policy with order-up-to level. Although progressive work has been done by researchers to resolve
the issue of material management decision-making in a decentralized manner, how to develop an appropriate order-up-to level policy in a distributed manner for a serial supply chain remains worth further exploration, such that the impact of demand/supply uncertainty and asymmetric information can be minimized.
3 The Serial Supply Chain Model

During each review period, a specific station examines demand requirements that stem from its downstream station. This review period results in information (in terms of local inventory level and refill rate) propagation delays within the supply chain. Besides, material transferring between stations requires a lead time that causes physical material delays. These delays disturb the demand-and-supply relation between two conjunctive stations, resulting in shortages, excess stock and backlogs. All the lead times are known and fixed to 1, excluding that of the first station (i.e., the retailer facing customers), which is zero. The OUTL of a station i is denoted as S^i. Each station is characterized by its OUTL, transportation lead time, and inventory review period. It is also assumed that the supply chain produces a single product and has no capacity limitations. At the end of a review period, each station will examine its inventory and place orders with its upstream station as required. As well, backlog is allowed. In the highly distributed supply chain, each station is autonomous. It is assumed that refill rate and cost information is only shared between two conjunctive stations. In such an environment, a station will sacrifice local benefit (refill rate and cost) to improve the refill rate for customers. The supply chain in our following illustration contains several stations. The first is nearest to the customers and the last station is a raw material receiving station. The overall supply chain OUTL is represented by a vector S. A fuzzy membership function is applied to represent material supply and customer demand within an uncertain environment. The supplier has various 'reliability' modes: reliable and unreliable. The reliability of supply is represented by a membership function. As well, customer demand per period is denoted as a membership function. Notations are defined as follows.
FR^i: the refill rate of station i, representing the proportion of quantity supplied to the downstream station (i-1). FR^1 is the customer service level of the supply chain. For each station, let
d_j: customer (or downstream station) demand in period j.
I_K: inventory at the end of the Kth review period, where

I_K = P_{K-1} + I_{K-1} - D_K  if I_{K-1} - D_K ≥ 0, and I_K = 0 elsewhere,  K = 1, 2, …, M,

with D_K = \sum_{j=1+(K-1)R}^{KR} d_j, and P_K the replenishment quantity in the Kth review period. I_0 and P_0 equal zero. R is the number of periods within each review period.
S_K^i: shortages at the end of review period K of station i, where S_K = -I_K if I_K < 0, and S_K = 0 if I_K ≥ 0.
D = \sum_{j=1}^{N} d_j: total demand within N periods, where N = MR.

Hence, FR^i is obtained through

FR^i = 1 - ( \sum_{K=1}^{M} S_K^i ) / D ,  0 ≤ FR^i ≤ 1 ,  i = 1, 2, …, 5.
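The recursion and the refill rate can be sketched for a single station as follows, assuming the natural net-inventory reading of the clipped recursion above (inventory is the positive part of the net position, shortage the negative part); all names are illustrative.

def refill_rate(demands, replenishments, R):
    """Inventory recursion and refill rate for one station.
    demands: per-period demand list of length M*R;
    replenishments: P_1, ..., P_{M-1} (P_0 = 0 by assumption);
    R: number of periods per review period."""
    M = len(demands) // R
    P = [0.0] + list(replenishments)
    I_prev, shortage = 0.0, 0.0
    for K in range(1, M + 1):
        D_K = sum(demands[(K - 1) * R: K * R])
        net = P[K - 1] + I_prev - D_K
        I_prev = max(0.0, net)          # I_K
        shortage += max(0.0, -net)      # S_K
    return 1.0 - shortage / sum(demands)   # FR = 1 - sum(S_K)/D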
The required replenishment quantity of a station from its upstream station equals the OUTL of the station minus the sum of the downstream station's demand per review period (D_K) and the inventory level (I_K). The cost of a station is comprised of holding cost, shortage cost, and an overhead cost owing to a change in OUTL, which requires that the warehouse, as well as the material handling facility, be re-sized. These costs in the Kth review period can be easily computed, and are denoted as C_h^i(K), C_s^i(K) and C_a^i(K), respectively. The total holding, shortage, and OUTL overhead costs are defined as \sum_{K=1}^{M} C_h^i(K), \sum_{K=1}^{M} C_s^i(K) and \sum_{K=1}^{M} C_a^i(K), respectively. The total cost of the ith station is therefore

TC^i = \sum_{K=1}^{M} C_h^i(K) + \sum_{K=1}^{M} C_s^i(K) + \sum_{K=1}^{M} C_a^i(K).

The average cost per product is defined as TC/D, where TC = \sum_i TC^i is the total cost over all stations.
Our research framework is as follows. Reliable material supply (100%) is simulated first. The resulting FR^1 is a benchmark service level (BSL). Then, refill rates and cost changes are examined when a material supplier's reliability decreases. To increase the refill rate up to its BSL, a coordinated OUTL policy is introduced. To construct a simulation system for the serial supply chain, a variable-driven simulation tool, PowerSim [13], is employed. The batch mean method [14] is applied for output analysis and simulation stability is validated through warm-up checking.
4 Coordinated Order-Up-To Level Policy

This study developed a coordinated order-up-to level inventory policy and its computational algorithm. In the serial chain, each station is autonomous and thus decides its respective OUTL. Each station has its complete cost information and can also obtain partial information from its neighboring stations. For instance, the upstream station's OUTL and the current refill rate of its downstream station can be acquired. When a station adjusts its OUTL, the cost of the station will be evaluated. Finally, each station aims to minimize its local cost.
The rationale for the design of the OUTL algorithm is as follows. A poor refill rate at a downstream station indicates the need to increase the OUTL of its corresponding upstream station. Similarly, a high OUTL at an upstream station implies a high OUTL at the downstream station. SC is employed to evaluate the 'current' system performance, via formula (1), in transient stages of the proposed OUTL algorithm described below. The definition of SC indicates that a higher value is preferred. Considering multiple measures in a supply chain has been widely investigated and reported [3]. A synthesized performance measure using refill rate and production costs is used to evaluate the OUTL policies. $FR^1$ and CI in formula (1) are normalized numbers, so that unit and scale are eliminated and different experimental settings can be fairly compared. $\Phi(\cdot)$ maps its argument into the interval $[0, TC_{goal}]$ by a piecewise-linear function.
$$SC = x \cdot FR^1 + (1-x) \cdot CI, \quad \text{where } x \in [0,1], \text{ and } \quad CI = 1 - \frac{\Phi(TC_{current} - TC_{goal})}{TC_{goal}}, \tag{1}$$
$$\Phi(TC_{current} - TC_{goal}) = \begin{cases} 0, & \text{if } TC_{current} - TC_{goal} \le 0 \\ TC_{goal}, & \text{if } TC_{current} \ge 2\,TC_{goal} \\ TC_{current} - TC_{goal}, & \text{otherwise} \end{cases}$$
When material supply is reliable (100% supply), $TC_{goal}$ is the corresponding average cost per product. $TC_{current}$ is the average cost per product under the current supply reliability. SR, which resembles SC, is computed for reliable supply. The procedure contains three stages: an initialization stage, a supply-reliability-changing stage, and an each-station-executing-OUTL stage. In the initialization stage, the BSL is measured for reliable supply, and an initial OUTL $\vec{S}_0$ and SR are obtained. Stage two computes SC for unreliable supply. Stage three triggers the processes that decide an appropriate amount for increasing the OUTL, such that the serial chain can recover its performance from unreliable material supply. The procedure follows a standard hill-climbing and neighborhood search algorithm [15]. In Stage three, every station in the serial chain asynchronously executes the coordinated OUTL policy, which follows seven steps to decide individual OUTLs.

Step I. If the inventory level of an upstream station has not increased, then halt; otherwise, go to Step II.

Step II (denoted as the D algorithm). Each station i negotiates with its neighboring upstream and downstream stations asynchronously. The OUTL is adjusted with formula (2), where OUTL adjustment continues until $FR_P^i > FR_A^i$; that is, each station increases its OUTL by one unit at a time until the refill rate under the current (unreliable) supply environment reaches its original refill rate under reliable supply. The OUTL setting (x) with maximal $K_p^i$ is selected.
$$\max_{x = S_0^i \text{ to } +\infty} \left\{ K_p^i(x) \;\middle|\; K_p^i(x) = \omega_F \times (FR_p^i(x) - FR_0^i) + \omega_C \times (C_0^i - C_p^i(x)), \text{ given } B \right\} \tag{2}$$
where $\omega_F$ is the weight of refill rate and $\omega_C$ the weight of cost, with $\omega_F + \omega_C = 1$. $C_0^i$: cost of station i when supply reliability reduces and a new OUTL is not applied. $C_P^i(x)$: cost of station i when supply reliability reduces and a new OUTL x is applied. $FR_0^i$: refill rate of station i when supply reliability reduces and a new OUTL is not applied. $FR_P^i(x)$: refill rate of station i when supply reliability is reduced and a new OUTL x is applied. $FR_A^i$: refill rate of station i when supply reliability is high. B is a conditional event: when $FR_P^i \le FR_A^i$, increase the OUTL by one unit; otherwise, stop the heuristic at the station. The purpose of Step II is to coordinate the optimal OUTL of a station, such that the refill rate to its succeeding station is maximized while the increase in its own cost is kept to the lowest level.

Step III. Compute SC and $\varepsilon_0$, the ratio of SC to SR before the first iteration.

Step IV. Adjust $\omega_F$ to locate the local optimal solution between two adjacent stations. An increase in $\omega_F$ decreases $\omega_C$ of a station, indicating that the refill rate of its downstream station can be sacrificed. Similarly, an increase in $\omega_C$ decreases $\omega_F$, indicating that its downstream station can improve its refill rate.

Step V. Re-run the D algorithm. Compute $\varepsilon_m$ in the mth iteration to decide the search direction.

Step VI. Do a neighborhood random search of $\vec{S}$ by adjusting $\omega_F$ by • units (predefined to specify the incremental precision of $\omega_F$). If $\varepsilon_m < \varepsilon_0$, switch the search direction of $\vec{S}$ to the opposite direction (an improvement trial of SC through changing $\vec{S}$ has failed); otherwise, keep searching in the same direction.

Step VII. Execute the D algorithm and compute $\varepsilon_n$ until $\varepsilon_n < \varepsilon_{n-1}$ (an improvement trial of SC fails).
When $\varepsilon_n < \varepsilon_{n-1}$, choose the OUTL $\vec{S}$ of the (n-1)th iteration and stop the procedure. To clarify the proposed procedure, a simulation model built with the PowerSim simulation package, mimicking a five-station serial chain, is designed to illustrate the proposed policy.
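To make Steps III-VII concrete, the following Python sketch caricatures the weight-adjustment loop around the D algorithm. It is a hypothetical rendering under our own assumptions (a scalar weight and a single direction switch); `run_d_algorithm` is an assumed callback that applies formula (2) at every station and returns the resulting SC for the chain.

```python
# Hedged sketch of the Steps III-VII hill-climb on the refill-rate weight w_F.
def coordinate_outl(run_d_algorithm, SR, w_f=0.5, step=0.05):
    eps_prev = run_d_algorithm(w_f) / SR   # Step III: epsilon_0 = SC / SR
    direction = 1                          # initial search direction; w_C = 1 - w_F
    while 0.0 <= w_f + direction * step <= 1.0:
        trial = w_f + direction * step     # Step VI: neighborhood move on w_F
        eps = run_d_algorithm(trial) / SR  # Step V: re-run the D algorithm
        if eps < eps_prev:                 # improvement trial failed
            if direction == 1:
                direction = -1             # Step VI: switch the search direction
                continue
            break                          # Step VII: keep the (n-1)th setting
        w_f, eps_prev = trial, eps
    return w_f
```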
Table 1. Simulation setting

Factor | Value/Definition
Supply reliability ($\mu_r$) | Fuzzy membership function: Reliable {100%/1}; Unreliable {80%/1, 90%/0.5, 100%/0.25}
Initial OUTL ($\vec{S}$) | (36, 36, 36, 36, 36)
Lead time ($L_i$) | 1
Review period (R) | 4
Unit holding cost ($C_h^i$) | $C_h^1 = 1.5$, $C_h^2 = 1$, $C_h^3 = 0.5$, $C_h^4 = 0.3$, $C_h^5 = 0.1$
Unit shortage cost ($C_s^i$) | $C_s^1 = 6.5$, $C_s^2 = 4.3$, $C_s^3 = 2.15$, $C_s^4 = 1.29$, $C_s^5 = 0.43$
OUTL overhead cost ($C_a^i$) | $C_a^1 = 1.1$, $C_a^2 = 0.9$, $C_a^3 = 0.7$, $C_a^4 = 0.6$, $C_a^5 = 0.4$
Demand ($\mu_{DR}$) | {7/0.25, 8/0.5, 9/0.75, 10/1, 11/0.75, 12/0.5, 13/0.25}
Weight (x) | 0.5
Fig. 1. Adjustment process
Setting the number of stations to five is complex enough for observation yet manageable for explanation. Table 1 displays the required parameter settings. An initial OUTL is established via a simulated defuzzifying process. When supply reliability drops to a worse level, the supply chain is affected: the SC value decreases from 0.760 to 0.567. Figure 1 displays the adjustment process. Table 2 compares the supply chain performances of the proposed coordinated OUTL policy with those under reliable and unreliable supply. The SC value increases from 0.567 to 0.701.
Table 2. A comparison of the supply chain performances

Material supply reliability mode | Reliable | Unreliable (unadjusted $\vec{S}$) | Unreliable (adjusted $\vec{S}$)
SC | 0.760 (=SR) | 0.567 | 0.701
$FR^1$ | 0.52 | 0.35 | 0.82
$\overline{TC}$ | 7.513 | 9.131 | 10.659
5 Mediation Procedure

In the above procedure, each station decides its own OUTL individually. If a mediator existed within the supply chain to coordinate OUTLs, an improved solution could be attained. Therefore, to further improve supply chain performance, a mediation procedure was constructed. In this mediation heuristic, when the OUTL is adjusted, $FR^1$ and $\overline{TC}$ are both considered. $FR^1$ and OUTL correlate positively; that is, in most cases, if the OUTL is increased then $FR^1$ improves. Alternately, $\overline{TC}$ and OUTL exhibit a convex cost trade-off relation, owing to both excesses and shortages of materials. The mediation procedure is described as follows:

1. When (i) $\overline{TC}$ exceeds twice $TC_{goal}$ (a pre-defined threshold value) of reliable supply (7.513 in this case), or (ii) $FR^1$ exceeds the BSL, the mediator requests that the station with the highest OUTL reduce its OUTL by one unit. When two or more stations have the equal highest OUTL, the one nearest the customer is reduced first.

2. The new $\vec{S}$ is applied to the system adjustment procedure (Section 4) and $SC_{n+1}$ is computed, until $SC_{n+1} < SC_n$; then select the OUTL of the (n-1)th iteration.

Applying the mediation heuristic under unreliable supply, its performance is revealed. Prior to executing the mediation procedure of the unreliable supply case, $SC_0 = 0.701$. The resulting local optimum is established at $\vec{S} = (42, 38, 38, 38, 38)$ and SC is 0.716.
6 Performance Evaluation of the Coordinated OUTL Policy

The coordinated OUTL policy is compared to two OUTL heuristics: the single-station and multiple-station policies (a detailed description of the two heuristics can be found in [4]). The single-station policy adjusts the OUTL of only one arbitrary station. The multiple-station policy adjusts the OUTLs of the corresponding stations by using an evaluation index, $MF = \frac{FR^1 \times \Delta FR^1}{TC}$.
Note that the OUTL overhead cost and the synthesized performance index (formula (1)) are applied in all three policies so that they are fairly compared. The simulation experiment settings are listed in Table 1. Figure 2 indicates that the coordinated OUTL policy with the mediation procedure outperforms the other policies. The robustness of the coordinated OUTL policy was examined with x varying from 0.1 to 0.9 to represent a variety of viewpoints of performance focus (on refill rate or cost). In all cases, the coordinated policy outperforms the others. Demand patterns represented as fuzzy membership functions are employed to test the three OUTL policies. Table 3 shows that the coordinated policy outperforms the others under all demand situations.
Fig. 2. Comparison of coordinated, single-station, and multiple-station policies

Table 3. The performances for different demand patterns under unreliable supply. Reliable (mean = 10, fuzzy membership function = {10/1}); Unreliable (mean = 10, fuzzy membership function = {7/0.25, 8/0.5, 9/0.75, 10/1, 11/0.75, 12/0.5, 13/0.25})

Demand pattern | Single-station policy applied to the 1st station | Single-station policy applied to the 5th station | Multiple-station policy | Coordinated policy with mediation procedure
Reliable | 0.620 | 0.610 | 0.625 | 0.716
Unreliable | 0.597 | 0.584 | 0.607 | 0.706
The experimental outcomes of this preliminary performance study of the proposed OUTL policy indicate that (i) the coordinated OUTL policy outperforms the single-station and multiple-station policies across a variety of performance-focus viewpoints, whether on refill rate or cost, and under different demand patterns; and (ii) the mediation procedure is able to further improve supply chain performance on the basis of the coordinated OUTL policy.
7 Concluding Remarks

This study has proposed a method for coordinating the inventory decisions of isolated stations in a supply chain to cooperatively compensate for losses in total cost and refill rate due to unreliable raw material supply. The proposed method consists of a coordinated OUTL policy and a mediation procedure. Compared with the single-station and multiple-station policies using simulation in a comprehensive experimental study, the proposed method outperforms the others from a variety of performance viewpoints and under different demand patterns when material supply is unreliable.
References

1. Parker, R.P., Kapuscinski, R.: Managing a non-cooperative supply chain with limited capacity. Yale School of Management. http://welch.som.yale.edu/researchpapers/ (2001)
2. Thirumalai, R.: Multi criteria - multi decision maker inventory models for serial supply chains. PhD dissertation, Industrial Engineering, Penn State University (2001)
3. Beamon, B.M.: Supply chain design and analysis: models and methods. International Journal of Production Economics 55 (1998) 281-294
4. Petrovic, D., Roy, R., Petrovic, R.: Modeling and simulation of a supply chain in an uncertain environment. European Journal of Operational Research 109 (1998) 299-309
5. Petrovic, D., Roy, R., Petrovic, R.: Supply chain modeling using fuzzy sets. International Journal of Production Economics 59 (1999) 443-453
6. Lee, H.L., Billington, C., Carter, B.: Hewlett-Packard gains control of inventory and service through design for localization. Interfaces 23(4) (1993) 1-11
7. Lee, H.L., Billington, C.: Material management in decentralized supply chains. Operations Research 41(5) (1993) 835-847
8. Cohen, M.A., Lee, H.L.: Strategic analysis of integrated production-distribution systems: models and methods. Operations Research 36(2) (1988) 216-228
9. Verganti, R.: Order overplanning with uncertain lumpy demand: a simplified theory. International Journal of Production Research 35(12) (1997) 3229-3248
10. Marquez, A.C., Blanchar, C.: The procurement of strategic parts. Analysis of a portfolio of contracts with suppliers using a system dynamics simulation model. International Journal of Production Economics 88 (2004) 29-49
11. Lee, H.L., Padmanabhan, V., Whang, S.: Information distortion in a supply chain: the bullwhip effect. Management Science 43(4) (1997) 546-558
12. Zimmermann, H.J.: Fuzzy Set Theory and Its Applications. Kluwer Academic Publishers (1991)
13. PowerSim Corporation: PowerSim Reference Manual (2005)
14. Law, A.M., Kelton, W.D.: Simulation Modeling and Analysis. McGraw-Hill, New York (1991)
15. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall International (2003)
Optimization of Performance of Genetic Algorithm for 0-1 Knapsack Problems Using Taguchi Method

A.S. Anagun and T. Sarac

Eskisehir Osmangazi University, Industrial Engineering Department, Bademlik 26030, Eskisehir, Turkey
{sanagun, tsarac}@ogu.edu.tr
Abstract. In this paper, a genetic algorithm (GA) is developed for solving 0-1 knapsack problems (KPs) and the performance of the GA is optimized using the Taguchi method (TM). In addition to population size, crossover rate, and mutation rate, three types of crossover operators and three types of reproduction operators are taken into account for solving different 0-1 KPs, each configured differently in terms of the size of the problem and the correlation between weights and profits of items. Three sizes and three types of instances are generated for the 0-1 KPs, and the optimal values of the genetic operators for the different types of instances are investigated using TM. We discuss not only how to determine the significantly effective parameters of the GA developed for 0-1 KPs using TM, but also trace how the optimum values of the parameters vary with the structure of the problem.
1 Introduction

The KP is a well-known combinatorial optimization problem. The classical KP seeks to select, from a finite set of items, the subset that maximizes a linear function of the items chosen, subject to a single inequality constraint. KPs have been widely studied, attracting both theorists and practitioners. The 0-1 KP is the most important KP and one of the most intensively studied discrete programming problems. The reason for such interest derives from three facts: (a) it can be viewed as the simplest integer linear programming problem; (b) it appears as a subproblem in many more complex problems; (c) it represents a great many practical situations [1]. Many optimization problems are combinatorial in nature, like the 0-1 KP, and quite hard to solve by conventional optimization techniques. Recently, GAs have received considerable attention regarding their potential as an optimization technique for combinatorial optimization problems [2-5]. In this paper, we present an approach for determining the effective operators of GAs (called design factors) for solving the KP and for selecting the optimal values of the design factors. The KP is defined as follows:
$$\max_x \left\{ \sum_{j=1}^{n} p_j x_j \;\middle|\; \sum_{j=1}^{n} w_j x_j \le c,\ x \in \{0,1\}^n \right\} \tag{1}$$
where there are n items to pack into a knapsack of capacity c. Each item j has a profit $p_j$ and weight $w_j$, and the objective is to maximize the profit sum of the included items without the weight sum exceeding c. Since the efficiency of GAs depends greatly on the design factors, such as the population size, the probability of crossover, and the probability of mutation [6], selecting the proper values of the factor set is crucial. Various reproduction and crossover types can be used with regard to optimizing the performance of the GA. In addition, operator types may affect the efficiency of the GA. In this study, to investigate the effect of the factors briefly discussed, we generated nine instances composed of three different sizes (50, 200 and 1000) and three different types of correlation (uncorrelated, weakly correlated and strongly correlated). To optimize the efficiency of GAs in solving a 0-1 KP, we examined closely which genetic operators must be selected for each instance by using TM. The paper is organized in the following manner. After giving brief information about the 0-1 KP, the GA for 0-1 KPs and the parameters that affect its performance are introduced. Fundamental information about TM is described next. Then the steps of the experimental design are explained and applied to the problems under consideration. The results are organized with regard to the problems. Finally, the paper concludes with a summary of this study.
2 Genetic Algorithm for 0-1 Knapsack Problems

GAs are powerful and broadly applicable stochastic search and optimization techniques based on principles from evolution theory [7]. GAs differ from normal optimization and search procedures in that they: (a) work with a coding of the parameter set, not the parameters themselves; (b) search from a population of points, not a single point; (c) use payoff (objective function) information, not derivatives or other auxiliary knowledge; (d) use probabilistic transition rules, not deterministic rules [8]. KPs, which are combinatorial optimization problems, belong to the class of NP-hard problems. An efficient search heuristic is useful for tackling such a problem. In this study, we developed a GA (KP-GA) working with different crossover operators (1: single-point, 2: double-point, 3: uniform) and reproduction types (1: roulette wheel, 2: stochastic sampling, 3: tournament) to solve the 0-1 KP. The KP-GA is coded in VBA. The proposed GA for solving 0-1 KPs is discussed below.

2.1 Coding
We use an n-bit binary string, which is a natural representation of solutions to 0-1 KPs, where one means inclusion and zero exclusion of one of the n items from the knapsack. For example, a solution to a 7-item problem can be represented by the bit string [0101001], meaning that items 2, 4 and 7 are selected to fill the knapsack. This representation may yield an infeasible solution.

2.2 Fitness
The fitness value of a string equals the total profit of the selected items. To eliminate infeasible strings we use a penalty method. If a string is not feasible, its fitness value is calculated using the procedure penalty given below:
procedure penalty:
  if sum(i = 1..n) w_i > c then
  begin
    p := 1 - (sum(i = 1..n) w_i) / (5 * c);
    if p <= 0 then p := 0.00001;
    fitness value of string := fitness value of string * p;
  end;
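Read as executable code, the penalty procedure amounts to the following Python sketch; the 5c denominator follows the pseudocode as printed, and the function name is ours.

```python
# Hedged, runnable reading of the penalty-based fitness above.
def fitness(bits, profits, weights, c):
    value = sum(p for p, b in zip(profits, bits) if b)
    total_w = sum(w for w, b in zip(weights, bits) if b)
    if total_w > c:                  # infeasible string: apply the penalty
        p = 1 - total_w / (5 * c)
        if p <= 0:
            p = 0.00001
        value *= p
    return value
```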
2.3 Genetic Operators
A classical GA is composed of three operators: reproduction, crossover and mutation. The operators considered for solving 0-1 KPs with the GA are discussed as follows.

The reproduction operator allows individual strings to be copied for possible inclusion in the next generation. The chance that a string will be copied is based on the string's fitness value, calculated from a fitness function. We use three types of reproduction operators: roulette wheel, remainder stochastic sampling, and tournament.

Crossover enables the algorithm to extract the best genes from different individuals and recombine them into potentially superior children. KP-GA has three different crossover operators: single point, double point and uniform.

Reproduction and crossover alone can obviously generate a staggering number of differing strings. However, depending on the initial population chosen, there may not be enough variety of strings to ensure the GA searches the entire problem space, or the GA may find itself converging on strings that are not quite close to the optimum it seeks, due to a bad initial population. Some of these problems may be prevented by introducing a mutation operator into the GA. In KP-GA, a random number is generated for every gene. If the random number is smaller than the mutation rate, the value of the gene is flipped: if it is 0, its new value is 1, and if it is 1, it becomes 0. The value of the gene is kept if the random number is larger than the mutation rate.

When creating a new generation, there is always a risk of losing the fittest individuals. Using elitism, the fittest individuals are copied to the next generation; the others undergo crossover and mutation. Elitism improves the efficiency of a GA considerably, as it prevents losing the best results. The termination condition checks whether the algorithm has run for a fixed number of generations; 1000 generations is selected as the termination condition for all of the experiments conducted.
3 Taguchi Method

In principle, Taguchi design of experiments is used to obtain information such as the main effects and interaction effects of design parameters from a minimum number of experiments. The behavior of a product or process is characterized in terms of design (controllable) and noise (uncontrollable) factors. Thus, TM may be applied to determine the best combination of design factors and to reduce the variation caused by the noise factors [9].
The TM uses a special design of orthogonal arrays (OAs) to study the entire parameter space with only a small number of experiments. Experimental design using OAs, as recommended by Taguchi, not only minimizes the number of treatments for each trial, but also keeps the pair-wise balancing property [10]. An OA is basically a matrix of rows and columns in which columns are assigned to factors or their interactions and rows represent the levels of the various factors for a particular experimental trial [11]. The treatment combinations are chosen to provide sufficient information to determine the factors' effects using the analysis of means. The OA imposes an order on the way the experiment is carried out. Orthogonal refers to the balance of the various combinations of factors, so that no factor is given more or less weight in the experiment than the other factors. Orthogonal also refers to the fact that the effect of each factor can be mathematically assessed independently of the effects of the other factors [12].

The first step in designing an experiment with a known number of factors in Taguchi's method is to select the most suitable OA, whose design covers all possible experimental conditions and factor combinations. The selection of which OA to use depends predominantly on these items, in order of priority [13]: (1) the number of factors and interactions of interest, (2) the number of levels for the factors of interest, and (3) the desired experimental resolution or cost limitations. In order to select an appropriate OA for the experiments, the total degrees of freedom need to be computed. The degrees of freedom are defined as the number of comparisons between design factors that need to be made to determine which level is better and specifically how much better it is [14]. While the degrees of freedom associated with a design factor are one less than the number of levels, the degrees of freedom associated with an interaction between two design factors are given by the product of the degrees of freedom of the two design factors. Basically, the degrees of freedom of the OA should be greater than or at least equal to those of the design factors.

Once the levels of the design factors are settled, the analysis of means (ANOM) is conducted to find the effect of each factor on the performance criterion by calculating the mean of the entire data for each design factor. Hence, the optimum level of each design factor can be found by examining its corresponding response graph. The analysis of variance (ANOVA) is then performed to determine the significant factors for the selected criterion. Finally, a prediction model consisting of the significant factors is built, and confidence intervals for the estimated mean and for each of the significant factors are constructed.
4 The Experimental Design

In this paper, TM is applied to search for the optimum values of the parameters of GAs designed for 0-1 KPs, which are generated in terms of the number of items and the correlation between weights and profits of items. The aim is to determine whether the optimum values of the parameters affecting the performance of the GAs change as the number of items and the correlation between weights and profits of items increase and/or decrease. Five design factors are identified as potentially important for the performance of GAs: (1) population size, (2) crossover rate, (3) mutation rate, (4) crossover type, and (5)
reproduction type. For each design factor, based on related research, three possible levels were considered: population size (10, 30, 50), crossover rate (0.60, 0.75, 0.90), mutation rate (0.001, 0.005, 0.01), crossover type (1, 2, 3), and reproduction type (1, 2, 3). The L27 ($3^{13}$) OA was selected, since it is the most suitable plan for the conditions investigated, allowing 13 three-level design factors and/or interactions among them to be examined with 27 trials. In light of the preliminary tests, the selected factors and the interactions between design factors were assigned to the columns of the OA before conducting the tests. For further information on OAs and assigning factors to an OA, readers may refer to [9]. In order to observe the effects of noise sources, each experiment was repeated three times (3 x 27 = 81) under the same conditions. The order of the experiments was randomized to avoid noise sources that had not been taken into account initially and that could affect the results negatively. To investigate how KP-GAs behave for different 0-1 KPs, nine types of data instances were randomly generated with regard to the number of items and the correlation between weights and profits of items. The L27 OA was then applied to each of the KPs, and the optimum levels of the design factors affecting the performance of the KP-GA were determined separately.
5 Results

Based on the combinations of design factors assigned to the selected OA, twenty-seven different KP-GAs are formed for each of the 0-1 KPs. Three types of instances representing the correlation between weights and profits of items are generated using the following definitions and Eq. (2), proposed by [15]:

Uncorrelated instances: the weights $w_j$ and the profits $p_j$ are uniformly randomly distributed in [1, R], R = 1000.
Weakly correlated instances: the weights $w_j$ are distributed in [1, R] and the profits $p_j$ in $[w_j - R/10,\ w_j + R/10]$ such that $p_j \ge 1$.
Strongly correlated instances: the weights $w_j$ are distributed in [1, R] and the profits are set to $p_j = w_j + R/10$.
The capacity is chosen as
$$c = \frac{1}{10} \sum_{j=1}^{n} w_j \tag{2}$$
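The three instance types and Eq. (2) translate directly into a generator such as the sketch below (our own naming; the paper fixes R = 1000).

```python
# Illustrative generator for the three 0-1 KP instance types.
import random

def make_instance(n, kind, R=1000):
    w = [random.randint(1, R) for _ in range(n)]
    if kind == "uncorrelated":
        p = [random.randint(1, R) for _ in range(n)]
    elif kind == "weak":                       # p_j in [w_j - R/10, w_j + R/10], p_j >= 1
        p = [max(1, wj + random.randint(-R // 10, R // 10)) for wj in w]
    else:                                      # strongly correlated
        p = [wj + R // 10 for wj in w]
    c = sum(w) // 10                           # Eq. (2)
    return p, w, c
```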
The data obtained from the trials are analyzed as follows, using as an example the instance with 50 items and no correlation, coded [50_Unc].

5.1 Analysis of Mean (ANOM)
As mentioned in [16], Taguchi created the S/N (signal-to-noise) ratio to quantify the variation present. The term "signal" represents the desirable value (mean), and the term "noise" represents the undesirable value (standard deviation). There are several S/N ratios depending on the type of characteristic: lower the better (LB), nominal the best (NB), and higher the better (HB). Since the nature of the objective function for KPs is maximization, the higher-the-better (HB) criterion was selected as the performance statistic:
$$Z_{HB} = -10 \log \left[ \frac{1}{n} \sum_{i=1}^{n} \frac{1}{y_i^2} \right] \tag{3}$$
Mean S/N Ratio (dB)
where n is the number of repetition of simulation under the same condition of design factors, y the characteristics, and subscript i indicates the simulation number of design factors in the OA table. As shown in Eq. (3), the greater the S/N ratio, the smaller is the variance of performance measure around the desired value. The data for [50_Unc] and Eq. (3) are applied to obtain the S/N response graph which is depicted in Fig.1. From Fig.1, the optimum levels of the design factors of KP-GA designed for [50_Unc] , A3B2C3D3E1 can be found and its corresponding values are shown as; A3: population size of 50, B2: crossover rate of 0.75, C3: mutation rate of 0.01, D3: crossover type of 3, and E1: reproduction type of 1. 80 79.5 79 78.5 78 77.5 77 76.5 76 75.5 75 A1
A2
A3
B1
B2
B3
C1 C2 C3 Design Factor Level
D1
D2
D3
E1
E2
E3
Fig. 1. S/N response graph for [50_Unc]
5.2 Analysis of Variance (ANOVA)
ANOVA, one of the most commonly and widely used analytical tools, is a technique that subdivides the total variation into meaningful components or sources of variation [17]. Using ANOVA, one can spot the key components (or factors) that cause excessive variation in products or processes. ANOVA can be performed on the raw data or on the S/N data. ANOVA based on the raw data identifies the factors that affect the average response rather than those reducing the variation, whereas ANOVA based on the S/N data takes both aspects into account, and so it is used here. Since all columns in the OA are assigned factors and interactions (a so-called saturated design), the variation due to error is estimated by pooling the estimates of the factors and interactions having the least variance. This also helps in determining the Fisher test (F-test) values for finding the confidence level of the results. The ANOVA performed for the S/N data of [50_Unc] using the ANOVA-TM software is given in Table 1. Table 1 indicates that factor C has the largest effect on the performance criterion, with factors A and E having the next largest effects, respectively. The remaining design factors have no effect, judging by their F values. Table 1 also indicates that the interaction AxC has a moderate effect and the interaction CxE the least effect on the performance criterion. It is observed that the response graphs
for the interactions' effects provide results identical to those obtained from the response graphs for the design factors. However, the interactions, in addition to the main design factors, should be considered when calculating the estimated mean S/N for the problem [50_Unc].

Table 1. ANOVA for S/N data of [50_Unc] - after pooling

Source | Pool | Df | S | V | F | S' | rho%
A | [N] | 2 | 9.92062 | 4.96031 | 28.00694 | 9.56640 | 11.68
B | [N] | 2 | 0.53491 | 0.26745 | 1.51008 | 0.18169 | 0.22
CxD | [Y] | 4 | 0.56962 | 0.14240 | | |
CxE | [N] | 4 | 2.26365 | 0.56591 | 3.19525 | 1.55521 | 1.90
C | [N] | 2 | 47.46478 | 23.73239 | 133.99802 | 47.11056 | 57.50
AxC | [N] | 4 | 14.52187 | 3.63047 | 20.49839 | 13.81343 | 16.86
BxC | [Y] | 4 | 0.85650 | 0.21412 | | |
D | [Y] | 2 | 0.34502 | 0.17251 | | |
E | [N] | 2 | 5.45266 | 2.72633 | 15.39343 | 5.09844 | 6.22
(e) | | 10 | 1.77114 | 0.17711 | | 4.60490 | 5.62
Total | [-] | 26 | 81.92963 | 3.15114 | | |
The rho% column gives an idea of the degree of contribution of each factor to the performance criterion. To verify that the factors and levels selected for the experiment are reasonably correct, the rho% accounted for cumulatively by the different factors after pooling should be greater than 60% [18]. Since the total rho% of the significant design factors and interactions is approximately 94.4%, the experiment conducted to find the optimal values of the design parameters of the KP-GA applied to the [50_Unc] problem is appropriate, and confirmation tests may be run.

5.3 Confirmation Tests
In order to predict and/or verify the improvement of the performance characteristic, confirmation tests should be conducted using the optimal levels of the design factors based on the results obtained from ANOM and ANOVA. The estimated S/N ratio $\eta_{opt}$ using the optimal levels of the main design factors and interactions can be calculated as [19]:
$$\eta_{opt} = \hat{\eta} + \sum_{j=1}^{o} (\eta_j - \hat{\eta}) \tag{4}$$
where $\hat{\eta}$ is the total mean S/N ratio, $\eta_j$ the mean S/N ratio of the jth main design factor or interaction at its optimal level, and o the number of design factors that affect the performance characteristic. Using the optimal levels of the design factors that significantly affect the performance of the KP-GA for [50_Unc], an estimated S/N ratio of 78.94190 (dB) is computed with Eq. (4). The confidence interval of the estimated mean S/N ratio can be calculated from the following equation, to verify whether the optimal solution of the objective function for the [50_Unc] problem, or the targeted value, is reached [20]:
$$CI = \sqrt{\frac{F(\alpha, 1, v_e)\, V_e}{n_{eff}}}; \qquad n_{eff} = \frac{N}{1 + \sum(\cdot)} \tag{5}$$
where $F(\alpha, 1, v_e)$ is the F-ratio required for risk $\alpha$, $v_e$ the degrees of freedom of the error, $V_e$ the pooled error variance, $n_{eff}$ the effective sample size, N the total number of trials, and $\sum(\cdot)$ the total degrees of freedom associated with the items used in the $\eta_{opt}$ estimate. For a 95% confidence interval of the estimated S/N, $F(0.05, 1, 10) = 4.96$, $V_e = 0.17711$, and the effective sample size is $n_{eff} = 1.8$. Thus, the 95% confidence interval of the estimated optimum (dB) is computed as [78.24330; 79.64049].

The optimum levels of the design factors obtained for the [50_Unc] problem were applied to the other 0-1 KPs, generated with respect to the number of items and the correlation between weights and profits of items, to examine whether the levels determined for [50_Unc] by means of TM are appropriate for the remaining generated 0-1 KPs. It was observed that the optimum levels of the design factors are basically problem dependent, meaning that the appropriate levels should be determined for each of the 0-1 KPs separately. Hence, each of the generated problems was analyzed following the steps given in Section 5. The results obtained for the remaining 0-1 KPs solved by the KP-GA, in addition to those of problem [50_Unc], are shown in Table 2. As shown in Table 2, the estimated S/N means for the generated 0-1 KPs differ somewhat from the optimal solutions obtained using LINGO, software for optimization modeling, although the constructed confidence intervals include the optimal solution for each of the problems concerned. For a validity check, all of the experiments were repeated with the best combinations, considering different numbers of generations, to verify whether the optimal solutions for the problems could be obtained. These experiments showed that the best combinations determined by TM are able to produce solutions as close as those obtained using LINGO, which makes TM a powerful and sensitive approach for solving 0-1 KPs even as the problems vary in the number of items and the correlation between weights and profits of items.
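As a quick arithmetic check of Eq. (5) with the values quoted above, the following sketch (using SciPy's F quantile, an assumption on tooling) reproduces the reported interval.

```python
# Numeric check of the Eq. (5) confidence interval with the paper's values.
from math import sqrt
from scipy.stats import f

F = f.ppf(1 - 0.05, 1, 10)          # F(0.05, 1, 10) ~= 4.96
Ve, n_eff, eta_opt = 0.17711, 1.8, 78.94190
ci = sqrt(F * Ve / n_eff)           # ~0.70 dB
print(eta_opt - ci, eta_opt + ci)   # ~[78.24, 79.64], matching the text
```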
Table 2. Results for analyses of generated 0-1 KPs

Number of items | Correlation | Treatment Combination | Estimated S/N Mean | Confidence Interval (95%) | Converted S/N*
50 | No | A3B2C3D3E1 | 78.94190 | [78.24330; 79.64049] | 78.9271
50 | Weak | A3B3C3D1E1 | 68.97514 | [68.70190; 69.24840] | 68.9307
50 | Strong | A3B3C3D3E1 | 72.02041 | [71.73480; 72.30610] | 71.9209
200 | No | A2B1C2D1E3 | 90.53733 | [90.38260; 90.69206] | 90.6665
200 | Weak | A3B3C3D1E2 | 81.39253 | [81.15990; 81.62520] | 81.6133
200 | Strong | A3B3C3D3E2 | 83.96448 | [83.65620; 84.27280] | 83.9770
1000 | No | A3B1C1D1E2 | 104.12979 | [103.2520; 105.0076] | 105.0733
1000 | Weak | A3B1C1D1E2 | 95.11788 | [94.93680; 95.29900] | 96.0850
1000 | Strong | A3B1C2D1E2 | 97.53570 | [97.36880; 97.70260] | 98.2418

(*) optimal solution of the 0-1 KPs obtained from LINGO.
6 Conclusion

The aim of this study is not only to determine the optimum parameters affecting the performance of a GA designed for a combinatorial optimization problem, but also to investigate whether the optimum parameters of a GA applied to 0-1 KPs, formed in terms of the number of items and the correlation between weights and profits of items, vary. The results obtained from the experiments are summarized below.

TM may be used to designate appropriate combinations of the GA parameters such that the optimal solutions can be reached: exactly the same solution for small and medium size problems, and solutions close to optimal for large size problems regardless of the correlation between weights and profits of items, compared with the optimal solutions obtained using LINGO. The optimum value of 50 was obtained for the population size regardless of the structures of the problems concerned. The mutation rate decreased as the size of the problem increased. The double-point crossover operator was not significant for the generated 0-1 KPs. On the other hand, the appropriate crossover operator was uniform for small size problems, and single-point for large size problems, regardless of the correlation. The stochastic sampling reproduction operator appears appropriate for medium and large size problems, while roulette wheel is suitable for small size problems. Since the best combinations of the parameters change as the correlation between weights and profits of items shifts, as well as with the number of items, the correlation should be taken into consideration when solving such problems. The optimum parameters of a GA designed for solving 0-1 KPs depend on the structure of the problem. Hence, the initial values of the parameters may be selected using the results of this study as a look-up table. To provide a general outline for those interested in solving such problems by means of a GA, the GA should be run for problems with numbers of items different from the ones used in this study.
References

1. Martello, S., Toth, P.: Knapsack Problems: Algorithms and Computer Implementations. John Wiley & Sons, England (1990)
2. Sakawa, M., Kato, K.: Genetic Algorithms with Double Strings for 0-1 Programming Problems. European Journal of Operational Research 144 (2003) 581-597
3. Bortfeldt, A., Gehring, H.: A Hybrid Genetic Algorithm for the Container Loading Problem. European Journal of Operational Research 131 (2001) 143-161
4. Taguchi, T., Yokota, T.: Optimal Design Problem of System Reliability with Interval Coefficient Using Improved Genetic Algorithms. Computers and Industrial Engineering 37 (1999) 145-149
5. Michelcic, S., Slivnik, T., Vilfan, B.: The Optimal Cut of Sheet Metal Belts into Pieces of Given Dimensions. Engineering Structures 19 (1997) 1043-1049
6. Iyer, S.K., Saxena, B.: Improved Genetic Algorithm for the Permutation Flowshop Scheduling Problem. Computers and Operations Research 31 (2004) 593-606
7. Gen, M., Cheng, R.: Genetic Algorithms and Engineering Design. John Wiley & Sons, New York (1997)
8. Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading (1989)
9. Taguchi, G.: Systems of Experimental Design. Unipub Kraus International Publishers, New York (1987)
10. Georgilakis, P., Hatziargyriou, N., Paparigas, D., Elefsiniotis, S.: Effective Use of Magnetic Materials in Transformer Manufacturing. Journal of Materials Processing Technology 108 (2001) 209-212
11. Antony, J., Roy, R.K.: Improving the Process Quality Using Statistical Design of Experiments: A Case Study. Quality Assurance 6 (1999) 87-95
12. Fowlkes, W.Y., Creveling, C.M.: Engineering Methods for Robust Product Design. Addison-Wesley, Reading (1995)
13. Ross, P.J.: Taguchi Techniques for Quality Engineering. 2nd edn. McGraw-Hill, New York (1996)
14. Nian, C.Y., Yang, W.H., Tarng, Y.S.: Optimization of Turning Operations with Multiple Performance Characteristics. Journal of Materials Processing Technology 95 (1999) 90-96
15. Martello, S., Pisinger, D., Toth, P.: New Trends in Exact Algorithms for the 0-1 Knapsack Problem. European Journal of Operational Research 123 (2000) 325-332
16. Wu, D.R., Tsai, Y.J., Yen, Y.T.: Robust Design of Quartz Crystal Microbalance Using Finite Element and Taguchi Method. Sensors and Actuators B92 (2003) 337-344
17. Roy, R.K.: A Primer on the Taguchi Method. VNR Publishers, New York (1999)
18. Ozalp, A., Anagun, A.S.: Analyzing Performance of Artificial Neural Networks by Taguchi Method: Forecasting Stock Market Prices. Journal of Statistical Research 2 (2003) 29-45
19. Lin, T.R.: Experimental Design and Performance Analysis of TiN-coated Carbide Tool in Face Milling Stainless Steel. Journal of Materials Processing Technology 127 (2002) 1-7
20. Syrcos, G.P.: Die Casting Process Optimization Using Taguchi Methods. Journal of Materials Processing Technology 135 (2003) 68-74
Truck Dock Assignment Problem with Time Windows and Capacity Constraint in Transshipment Network Through Crossdocks

Andrew Lim (1,2), Hong Ma (1), and Zhaowei Miao (1,2)

(1) Dept of Industrial Engineering and Logistics Management, Hong Kong Univ of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
(2) School of Computer Science & Engineering, South China University of Technology, Guang Dong, P.R. China
Abstract. In this paper, we consider the over-constrained truck dock assignment problem with time windows and capacity constraint in a transshipment network through crossdocks, where the number of trucks exceeds the number of docks available, the capacity of the crossdock is limited, and the objective is to minimize the total shipping distances. The problem is first formulated as an Integer Programming (IP) model, and then we propose a Tabu Search (TS) and a Genetic Algorithm (GA) that utilize the IP constraints. Computational results are provided, showing that the heuristics perform better than the CPLEX solver on both small-scale and large-scale test sets. Therefore, we conclude that the heuristic search approaches are efficient for the truck dock assignment problem.

Keywords: Heuristic search, crossdocks, dock assignment. Areas: Heuristics, industrial applications of AI.
1 Introduction

Dock assignment for trucks is one of the key activities at crossdocks. Trucks are assigned to docks for the duration of a time period during which the cargo and trucks are processed. Dock availability and times of arrivals/departures (given by an estimated time of arrival/departure, ETA/ETD, for each truck) can change during the course of the planning horizon due to operational contingencies (for example, delays or traffic control). A familiar scene at crossdocks these days is arriving trucks waiting to be processed, sometimes for a long time, before finally proceeding to their docks because the gates are occupied by other trucks. Docks therefore need to be scheduled well in order to increase utilization and achieve better performance of the transshipment network. Firstly, good dock assignment can help a crossdock increase utilization by reducing dock delays. Secondly, good dock assignment can minimize the distances (times) over which cargo must be transferred from dock to dock. Because of the large volume of freight and the dynamic nature of the problem, scheduling has become more difficult. This has made it more important for crossdock operators to use docks in the best possible way.
We consider truck dock assignment in a transshipment network through crossdocks, whereas in classical models transshipment is studied in the context of network flow [1]. One setting where transshipment becomes an important factor is crossdocking, which has become synonymous with rapid consolidation and processing. Tsui and Chang used a bilinear program for assigning trailers to doors, where the objective was to minimize weighted distances between incoming and outgoing trailers [8]. Recently, a study by Bartholdi and Gue examined minimizing labor costs in freight terminals by properly assigning incoming and outgoing trailers to doors [2]. Although previous cross-docking studies have considered intra-terminal factors such as types of congestion that impact costs, they do not address actual dock assignments for arriving vehicles when considering the time windows of trucks and the capacity of crossdocks. Our problem is also similar in some ways to the gate assignment problem in airports, for which some analytical work exists. For example, the basic gate assignment problem is a quadratic assignment problem and has been shown to be NP-hard [7]. Since the gate assignment problem is NP-hard, various heuristic approaches have been used by researchers, and work has focused on the over-constrained airport gate assignment problem, where there is an excess of flights over gates [3, 6]. The objective there was to minimize the number of flights without any gate assigned (i.e. those left on the ramp) and the total walking distance. In this paper, we consider the over-constrained truck dock assignment problem with time windows and capacity constraint in a transshipment network through crossdocks, where the number of trucks exceeds the number of docks available, the capacity of the crossdock is limited, and the objective is to minimize the total shipping distances. The problem is formulated as an IP problem. Since the airport gate assignment problem is a special case of our problem, our problem is also NP-hard. We use both a Tabu Search and a GA algorithm to solve the problem. Computational results are provided, showing that our heuristics work well on all the test sets. This work is organized as follows: in the next section, we give an IP model. Tabu search and GA algorithms are developed in Sections 3 and 4. We provide computational results in Section 5. In Section 6, we summarize our findings.
2 Problem Description and Formulation

In this section, we provide an IP model for the over-constrained truck dock assignment problem with time windows and capacity constraint, which attempts to assign trucks, within their time windows, to docks so as to minimize the shipping distances between docks. The capacity constraint requires that the total number of pallets inside the crossdock cannot exceed the capacity of the crossdock. Also note that in the real world, cargo containers are huge, and one pallet usually carries exactly one cargo container at a time; therefore the terms 'pallet', 'cargo', and 'cargo container' refer to the same transportation unit in the transshipment network
throughout the paper. The following notation is used:

N: set of trucks arriving at and/or departing from the crossdock;
M: set of docks available in the crossdock;
n: total number of trucks, that is |N|, where |N| denotes the cardinality of N;
m: total number of docks, that is |M|;
$a_i$: arrival time of truck i ($1 \le i \le n$);
$d_i$: departure time of truck i ($1 \le i \le n$);
$w_{k,l}$: shipping distance for pallets from dock k to dock l ($1 \le k, l \le m$);
$f_{i,j}$: number of pallets transferred from truck i to truck j ($1 \le i, j \le n$);
C: capacity of the crossdock, i.e. the maximum number of pallets the crossdock can hold at a time.

In addition, we use another dock, dock m+1, which is rented from others for temporary usage when there is not enough capacity left inside the crossdock, or when all the docks are occupied. A newly arriving truck then goes to this dock to load and unload cargoes, but this incurs a much higher cost (the distance is longer). This dock can be used by many trucks simultaneously and has infinite capacity. We regard it as a reasonable remedy to make the problem always feasible. Let $w_{k,m+1}$ ($w_{m+1,k}$) be the shipping distance from dock k (m+1) to dock m+1 (k) ($1 \le k \le m$). Figure 1 illustrates an outline of the crossdock and the major elements of our problem. The following auxiliary variable is used:

$x_{i,j} \in \{0,1\}$: 1 iff truck i departs no later than truck j arrives; 0 otherwise.
Fig. 1. Truck Dock Assignment Problem with Operation Time Constraint
The decision variables are as follows: $y_{i,k} \in \{0,1\}$: 1 if truck i is assigned to dock k, 0 otherwise ($1 \le i \le n$, $1 \le k \le m+1$); $y_{i,k} = y_{j,k} = 1$ implies that $a_i > d_j$ or $a_j > d_i$ ($1 \le i, j \le n$). $z_{ijkl} \in \{0,1\}$: 1 iff truck i is assigned to dock k and truck j is assigned to dock l ($1 \le i, j \le n$, $1 \le k, l \le m+1$).

Before presenting the model, we make the following remarks so that the problem and data are reasonable:

1) $f_{i,j} \ge 0$ iff $d_j \ge a_i$ ($1 \le i, j \le n$), otherwise $f_{i,j} = 0$; that is, truck i will transfer some cargo to truck j iff truck j departs no earlier than truck i arrives;
2) $a_i < d_i$ ($1 \le i \le n$): for each truck, the arrival time is strictly smaller than its departure time;
3) $n > m$, which satisfies the over-constrained condition;
4) the capacity C is consumed as follows: when truck i comes, it consumes $\sum_{k=1}^{m}\sum_{l=1}^{m}\sum_{j=1}^{n} f_{i,j}\, y_{i,k}\, y_{j,l}$ units of capacity; on the other hand, when truck j leaves, $\sum_{k=1}^{m}\sum_{l=1}^{m}\sum_{i=1}^{n} f_{i,j}\, y_{i,k}\, y_{j,l}$ units of capacity are released;
5) sort all the $a_i$'s and $d_i$'s in increasing order, and let $t_r$ ($r = 1, 2, \dots, 2n$) correspond to these 2n numbers such that $t_1 \le t_2 \le \dots \le t_{2n}$. With this notation, we can easily formulate the set of capacity constraints.

Our objective is to minimize the total shipping distance of transferring cargo between docks inside the crossdock. The IP model is as follows:

$$\min \sum_{k=1}^{m+1}\sum_{l=1}^{m+1}\sum_{i=1}^{n}\sum_{j=1}^{n} f_{i,j}\, w_{k,l}\, z_{ijkl}$$

s.t.

$$\sum_{k=1}^{m+1} y_{i,k} = 1 \quad (1 \le i \le n) \tag{1}$$
$$z_{ijkl} \le y_{i,k} \quad (1 \le i, j \le n,\ 1 \le k, l \le m+1) \tag{2}$$
$$z_{ijkl} \le y_{j,l} \quad (1 \le i, j \le n,\ 1 \le k, l \le m+1) \tag{3}$$
$$y_{i,k} + y_{j,l} - 1 \le z_{ijkl} \quad (1 \le i, j \le n,\ 1 \le k, l \le m+1) \tag{4}$$
$$x_{i,j} + x_{j,i} \ge z_{ijkk} \quad (1 \le i, j \le n,\ i \ne j,\ 1 \le k \le m) \tag{5}$$
$$\sum_{k=1}^{m}\sum_{l=1}^{m}\sum_{i \in \{i:\, a_i \le t_r\}}\sum_{j=1}^{n} f_{i,j}\, z_{ijkl} - \sum_{k=1}^{m}\sum_{l=1}^{m}\sum_{i=1}^{n}\sum_{j \in \{j:\, d_j \le t_r\}} f_{i,j}\, z_{ijkl} \le C \quad (1 \le r \le 2n) \tag{6}$$
$$y_{i,k} \in \{0,1\},\ y_{j,l} \in \{0,1\},\ z_{ijkl} \in \{0,1\} \quad (1 \le i, j \le n,\ 1 \le k, l \le m+1) \tag{7}$$
Constraint (1) ensures that each truck is assigned to exactly one dock. Constraints (2)-(4) jointly define the variable z, representing the logical relationship among $y_{i,k}$, $y_{j,l}$ and $z_{ijkl}$. Constraint (5) specifies that one dock cannot be occupied by two different trucks simultaneously. Finally, constraint (6) is the capacity constraint: at each time point $t_r$, the total number of pallets inside the crossdock cannot exceed the capacity C.
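A minimal sketch of this IP in Python/PuLP is given below; it is not the authors' CPLEX implementation. For readability only the objective and constraints (1)-(4) are shown, and `f` and `w` are assumed to be 1-indexed nested dictionaries of pallet counts and distances.

```python
# Hedged PuLP sketch of the dock-assignment IP (objective + constraints (1)-(4)).
import pulp

def build_model(n, m, f, w):
    trucks = range(1, n + 1)
    docks = range(1, m + 2)                   # dock m+1 is the rented dock
    prob = pulp.LpProblem("truck_dock", pulp.LpMinimize)
    y = pulp.LpVariable.dicts("y", (trucks, docks), cat="Binary")
    z = pulp.LpVariable.dicts("z", (trucks, trucks, docks, docks), cat="Binary")
    prob += pulp.lpSum(f[i][j] * w[k][l] * z[i][j][k][l]
                       for i in trucks for j in trucks
                       for k in docks for l in docks)
    for i in trucks:                          # constraint (1)
        prob += pulp.lpSum(y[i][k] for k in docks) == 1
    for i in trucks:                          # (2)-(4): linearize y_{i,k} * y_{j,l}
        for j in trucks:
            for k in docks:
                for l in docks:
                    prob += z[i][j][k][l] <= y[i][k]
                    prob += z[i][j][k][l] <= y[j][l]
                    prob += y[i][k] + y[j][l] - 1 <= z[i][j][k][l]
    return prob
```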
3 Tabu Search

Tabu search (TS) is a heuristic search procedure that proceeds iteratively from one solution to another by moves in a neighborhood space with the assistance of adaptive memory [4]. We next describe the TS memory and the framework used for the problem.

3.1 TS Memory
The solution is represented by a sequence A of length n (n is the number of trucks). The sequence A represents the dock assignment. For example, consider an instance with m docks and n trucks. The solution is a sequence $(s_1, s_2, \dots, s_n)$, meaning that truck 1 is assigned to dock (gate) $s_1$, truck 2 to dock $s_2$, ..., truck n to dock $s_n$ ($0 \le s_i \le m$, $1 \le i \le n$). If truck i is not assigned to any dock, which is possible when all docks are occupied, $s_i$ takes the value 0. If the solution is feasible, the dock assignment is uniquely determined by the sequence $(s_1, s_2, \dots, s_n)$. The representation is depicted in Fig. 2. In the TS memory we implement, only the assignment information is captured, so that only a neighborhood exchange move producing an assignment identical to a recorded one will be forbidden.
Fig. 2. The Solution Representation
3.2 Neighborhood Search

We use a modified neighborhood search approach from Ding [3], which consists of three moves:

The Insert Move: move a single truck to a dock gate other than the one to which it is currently assigned. It is depicted in Fig. 3.
The Interval Exchange Move: exchange two truck intervals in the current assignment. A truck interval is a group of consecutive trucks assigned to one dock gate. The move is depicted in Fig. 4.
The Rented Dock Exchange Move: exchange a truck assigned to the rented dock with a truck assigned to a general dock gate.
Fig. 3. The Insert Move
Fig. 4. The Interval Exchange Move
3.3 TS Framework

The TS algorithm can be described by the following steps:

1. Generate an initial solution $x_{init}$ randomly; set $x_{curr} \leftarrow x_{init}$.
2. Generate a set of neighborhood solutions $N(x_{curr})$ of $x_{curr}$ by the Insert Move and the Interval Exchange Move.
3. Select the solution $x' \in N(x_{curr})$ with the least cost that satisfies either condition (1) or condition (2), and that must satisfy condition (3): (1) it is not forbidden (i.e. the assignment is not identical to any assignment of the last tabu-tenure moves); (2) the cost of $x'$ is better than the current best cost (aspiration criterion); (3) the occupied capacity of $x'$ at the boundary time points of all time windows is less than the capacity C of the crossdock.
4. Set $x_{curr} \leftarrow x'$; update the TS memory.
5. If the termination conditions are satisfied, stop; otherwise jump to step 2.

When we generate the neighborhood solutions, we randomly choose among the three types of moves with equal probability. There are two termination conditions: either the best solution cannot be improved within a certain number of iterations, or the maximum number of iterations has been reached.
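The framework above can be sketched compactly as follows; `neighbors`, `cost` and `feasible` are hypothetical helpers implementing the three moves, the shipping-distance objective, and the capacity check at the time points $t_r$, and solutions are assumed to be tuples so they can be compared.

```python
# Hedged sketch of the TS loop (steps 1-5 above), not the authors' Java code.
def tabu_search(init, neighbors, cost, feasible, tenure=10, max_iter=10**6):
    curr = best = init
    tabu = []                                    # recently visited assignments
    for _ in range(max_iter):
        cands = [x for x in neighbors(curr) if feasible(x)]
        cands = [x for x in cands
                 if x not in tabu or cost(x) < cost(best)]   # aspiration
        if not cands:
            break
        curr = min(cands, key=cost)              # least-cost admissible neighbor
        tabu.append(curr)
        if len(tabu) > tenure:
            tabu.pop(0)                          # forget moves past the tenure
        if cost(curr) < cost(best):
            best = curr
    return best
```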
4 Genetic Algorithm

Genetic algorithms (GA) have become a well-known meta-heuristic approach for difficult combinatorial optimization problems [5], and we found them suitable for the dock assignment problem as a second approach. We first discuss some essential components of the GA, including solution representation and crossover operators, and then outline the framework of our GA.

4.1 Solution Representation
The chromosome is an important component of a GA and has great influence on the algorithm output. In a basic GA, a chromosome is usually encoded as a sequence and represents a solution. Similar to the TS solution representation (see Fig. 2), we represent a dock assignment solution by a chromosome sequence that defines an assignment of trucks to docks. During the population initialization process, if a randomly generated chromosome is infeasible, we simply drop the infeasible sequence and generate a new one.

4.2 Crossover and Mutation
In our problem, chromosomes are not permutation sequences as in the Travelling Salesman Problem. Hence, well-known crossover operators, such as Partially Mapped Crossover and Cycle Crossover, cannot be used. We implemented two crossover operators: One-Point Crossover and Two-Point Crossover. In One-Point Crossover, one random crossover point is selected. The first part of the first parent is attached to the second part of the second parent to make the first offspring; the second offspring is built from the first part of the second parent and the second part of the first parent. The following is an example of the One-Point Crossover operator (the crossover point is denoted by |):

Parent1: (3 2 4 2 4 | 1 3 1), Parent2: (2 1 2 4 3 | 2 2 4)
Offspring1: (3 2 4 2 4 | 2 2 4), Offspring2: (2 1 2 4 3 | 1 3 1)

In Two-Point Crossover, two random crossover points are selected for one crossover operation, and the chromosomal material between the two cut points is swapped to produce the offspring. This is illustrated in the following example:

Parent1: (3 2 | 4 2 4 | 1 3 1), Parent2: (2 1 | 2 4 3 | 2 2 4)
Offspring1: (3 2 | 2 4 3 | 1 3 1), Offspring2: (2 1 | 4 2 4 | 2 2 4)

In the proposed GA, we use both of the above-mentioned crossover operators with equal probability. We chose Swap Mutation as our mutation operator, which selects two positions at random and swaps the values at those positions. For example, the following mutation swaps the values at positions 3 and 6: (3 2 4 2 4 1 3 1) → (3 2 1 2 4 4 3 1). For simplicity, we do not apply any repair function to infeasible offspring. Therefore, if an offspring is infeasible because it violates constraints of the problem, we simply remove it from the GA population base.
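The two crossover operators and the swap mutation translate directly into the sketch below; chromosomes are plain lists of dock indices, and the names are ours.

```python
# Hedged sketch of the One-Point / Two-Point crossover and Swap Mutation.
import random

def one_point(p1, p2):
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def two_point(p1, p2):
    a, b = sorted(random.sample(range(1, len(p1)), 2))
    return (p1[:a] + p2[a:b] + p1[b:],
            p2[:a] + p1[a:b] + p2[b:])

def swap_mutation(chrom):
    i, j = random.sample(range(len(chrom)), 2)
    chrom[i], chrom[j] = chrom[j], chrom[i]

# one_point([3,2,4,2,4,1,3,1], [2,1,2,4,3,2,2,4]) can yield
# ([3,2,4,2,4,2,2,4], [2,1,2,4,3,1,3,1]), as in the paper's example.
```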
4.3 GA Framework
With these components of GA, we now outline GA as follows . In this algorithm, #pop, #crossover, #iter and p1 are parameters which are specified within experiments. Initialize Pop with size pop for iter ← 1 to iter do for of f ← 1 to crossover do Randomly select ParentA and ParentB Crossover ParentA and ParentB to produce OffspringA and OffspringB end for for each new-produced individual indv do mutate indv with probability p1 end for
        evaluate each individual
        select the best #pop individuals from all the individuals
        update current best solution
    end for
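A minimal Python sketch of this framework follows, reusing the crossover and mutation operators sketched in Section 4.2; the feasibility check and all names are illustrative.

    import random

    def genetic_algorithm(init_pop, evaluate, feasible, n_iter=10**4,
                          n_crossover=500, pop_size=300, p_mut=0.2):
        """Sketch of the GA framework above (parameter values follow
        Section 5.1)."""
        pop = list(init_pop)
        best = min(pop, key=evaluate)
        for _ in range(n_iter):
            offspring = []
            for _ in range(n_crossover):
                pa, pb = random.sample(pop, 2)
                op = random.choice([one_point_crossover, two_point_crossover])
                offspring.extend(op(pa, pb))
            offspring = [swap_mutation(o) if random.random() < p_mut else o
                         for o in offspring]
            # Infeasible offspring are simply removed from the population base.
            pop = sorted((ind for ind in pop + offspring if feasible(ind)),
                         key=evaluate)[:pop_size]
            best = min([best, pop[0]], key=evaluate)
        return best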
5 Experimental Results and Analysis
All the algorithms were coded in Java and tested on a P4 2.4GHz PC with 512MB RAM. For comparison, we applied ILOG CPLEX 8.0 to the IP formulation presented in Section 2. The test generation process, the parameter settings of the various methods, and detailed computational results are presented in the following subsections.
5.1 Test Data and Experiment Setup
We chose a representative crossdock layout with two parallel terminals, where docks are symmetrically located in the two terminals as shown in Fig. 5. We set the distance between two adjacent docks within one terminal (e.g., dock 1 and dock 3) to 1 unit and the distance between two parallel docks in different terminals (e.g., dock 1 and dock 2) to 3 units. To simplify the problem, we assumed that a forklift can only travel 'horizontally' or 'vertically'; i.e., if a forklift transfers one pallet from dock 3 to dock 2, its travel distance is 1+3=4 units. This is similar to the so-called Manhattan metric. In addition, we set the distance between the rented dock and any dock inside the crossdock to 10 units in order to make it undesirable to use.
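A small sketch of this distance rule follows, assuming the dock numbering of Fig. 5 places odd-numbered docks on one terminal and even-numbered docks on the other; the rented-dock handling and the function name are illustrative.

    def dock_distance(d1, d2, rented_dock=None):
        """Manhattan-style dock distance under the assumed numbering:
        adjacent docks 1 unit apart within a terminal, parallel terminals
        3 units apart."""
        if rented_dock in (d1, d2):
            return 10                                     # undesirable rented dock
        horizontal = abs((d1 - 1) // 2 - (d2 - 1) // 2)   # walk along a terminal
        vertical = 3 if (d1 % 2) != (d2 % 2) else 0       # cross between terminals
        return horizontal + vertical

    assert dock_distance(3, 2) == 1 + 3                   # the example in the text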
Fig. 5. Crossdock topology
The start points of the truck time windows a_i (1 <= i <= n) are uniformly generated in the interval [1, n*70/m]. The end points b_i are generated as b_i = a_i + U[45, 74]. The number of transferred pallets f_{i,j} is randomly generated in the interval [6, 60] if d_j >= a_i (and is 0 otherwise). In TS, the maximum number of iterations is 10^6 and each time 100 neighbors are generated, with a tabu tenure of 10. The algorithm terminates if the best solution is not improved within 10^4 iterations. In GA, we
specify #iter = 10^4, #pop = 300, and #crossover = 500. The mutation probability p1 is taken to be 0.2. The maximum number of iterations is 10^5, and the termination condition is that the best solution does not improve within 500 iterations.
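The instance generator can be sketched as follows; interpreting d_j as the end of truck j's time window is our assumption, made only to mirror the printed condition d_j >= a_i.

    import random

    def generate_instance(n, m, seed=0):
        """Sketch of the test-data generation described above."""
        rng = random.Random(seed)
        a = [rng.randint(1, n * 70 // m) for _ in range(n)]       # window starts
        b = [a[i] + rng.randint(45, 74) for i in range(n)]        # window ends
        d = b                                # assumption: d_j = truck j's window end
        f = [[rng.randint(6, 60) if d[j] >= a[i] else 0
              for j in range(n)] for i in range(n)]               # pallets i -> j
        return a, b, f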
5.2 The Results
We designed two categories of test sets. The detailed results are presented in Tables 1 and 2, respectively. The first column of each table denotes the instance size, where n × m means that there are n trucks and m docks. The remaining columns provide the results of the various methods proposed in this paper; each method is reported with its objective value and its computational time in seconds.

1. Small-scale Test Sets. Small-scale test sets are generated with sizes (n × m) ranging from 12×4 to 18×7. We see that GA performs extremely well for this group of test sets, as it obtains the best objective values for all test sets. The solution quality of TS is slightly worse than that of GA, while the runtime of TS is considerably shorter. Finally, CPLEX obtains the optimal solutions only for the three smallest test sets within the time limit of 2 hours. For the other 7 test sets, on which CPLEX exceeds the time limit, its solution quality is generally worse than that of the two meta-heuristic approaches.

2. Large-scale Test Sets. Large-scale test sets ranging from 20×6 to 35×9 are used here. We see from Table 2 that TS surpasses GA in both solution quality and runtime, beating GA in 8 of the 10 test sets; in general, both meta-heuristics provide quite good solutions. We also note that the solution time of GA grows very quickly as the problem size expands, which indicates that GA does not scale well for this problem. As a whole, we can conclude from the experiments that both proposed meta-heuristics are effective approaches for the dock assignment problem. If a minimal runtime for solving medium-to-large-scale instances is required, TS is preferable to GA.

Table 1. Results: Small-scale Test Sets

    Size     CPLEX    Time(s)   TS       Time(s)   GA       Time(s)
    12 × 4   7523     1035      7523     0.55      7523     1.89
    12 × 5   8914     5460      8917     0.58      8914     1.75
    14 × 4   12117    3166      12200    0.56      12117    2.53
    14 × 5   3680     ≥ 7200    3578     0.58      3464     2.61
    16 × 6   22984    ≥ 7200    21017    0.66      20953    2.91
    16 × 7   13049    ≥ 7200    12528    0.64      12489    4.13
    18 × 6   14712    ≥ 7200    13760    0.72      13635    4.81
    18 × 7   14276    ≥ 7200    13818    0.78      12986    5.58
Table 2. Results: Large-scale Test Sets

    Size     CPLEX    Time(s)   TS       Time(s)   GA       Time(s)
    20 × 6   62537    ≥ 7200    40777    1.06      42405    5.36
    20 × 7   64682    ≥ 7200    49219    1.02      60610    1.95
    25 × 6   101683   ≥ 7200    68610    1.31      68892    9.95
    25 × 7   112648   ≥ 7200    76418    1.28      78112    7.20
    30 × 8   146595   ≥ 7200    99442    1.84      95918    17.83
    30 × 9   134762   ≥ 7200    97981    1.81      98355    15.80
    35 × 8   148654   ≥ 7200    105312   2.28      105768   51.11
    35 × 9   145725   ≥ 7200    116147   2.34      111341   59.00
6 Conclusions and Future Work
In this paper, we consider the over-constrained truck dock assignment problem with time windows and a capacity constraint in a transshipment network through crossdocks, where the number of trucks exceeds the number of available docks, the capacity of the crossdock is limited, and the objective is to minimize the total shipping distance. The problem is formulated as an IP model. We then propose a Tabu Search (TS) and a Genetic Algorithm (GA) that utilize the IP constraints. Experiments were conducted using a range of test data sets that reflect realistic scenarios. The heuristic search algorithms were compared with the CPLEX solver, and they obtain better results within shorter running times. Although this work addresses a crossdock optimization problem, the approach can also be applied to the air gate assignment problem at airports, scheduling flights to achieve better performance. Another direction of research is to study a more realistic model that takes the operational time of each pallet, instead of the distance, into consideration; the arrival/departure time windows of the trucks can then be properly related to the inner-crossdock operations.
References
[1] J.E. Aronson. A survey on dynamic network flows. Annals of Operations Research, 20:1–66, 1989.
[2] J. Bartholdi and K. Gue. Reducing labor costs in an LTL cross-docking terminal. Operations Research, 48:823–832, 2000.
[3] H. Ding, A. Lim, B. Rodrigues, and Y. Zhu. The over-constrained airport gate assignment problem. Computers and Operations Research, 32:1867–1880, 2005.
[4] F. Glover and M. Laguna. Tabu Search. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1998.
[5] John H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI, 1975.
[6] A. Lim, B. Rodrigues, and Y. Zhu. Airport gate scheduling with time windows. Artificial Intelligence Review, 24:5–31, 2005.
[7] T. Obata. The quadratic assignment problem: Evaluation of exact and heuristic algorithms. Technical report, 2000.
[8] L. Tsui and C. Chang. Optimal solution to dock door assignment problem. Computers and Industrial Engineering, 23:283–286, 1992.
An Entropy Based Group Setup Strategy for PCB Assembly

In-Jae Jeong
Department of Industrial Engineering, Hanyang University, Seoul, 133-791, South Korea
[email protected]
Abstract. The group setup strategy exploits PCB similarity in forming families of boards to minimize the makespan, which is composed of two attributes, the setup time and the placement time. The component similarity of boards within families reduces the setup time between families, while the geometric similarity reduces the placement time of boards within families. Existing group setup strategies consider the component similarity and the geometric similarity by giving them equal weights or by considering each similarity sequentially. In this paper, we propose an improved group setup strategy which combines component similarity and geometric similarity simultaneously. The entropy method is used to determine the weight of each similarity by capturing the importance of each similarity in different production environments. Test results show that the entropy based group setup strategy outperforms existing group setup strategies.

Keywords: Printed circuit board assembly, group setup, entropy method, similarity coefficient.
1 Introduction
This paper considers a group setup problem on a single SMT machine producing multiple types of boards. The head starts from a given home position and moves to the feeder carriage on the machine to pick up a component. After picking up the component, the head moves to the placement location on the PCB for this component. Then the component is placed on the board and the head travels back to the feeder carriage to pick up the next component. The pick-and-place process continues until all components required for the board have been placed. Let K be the total number of families and N_f the number of boards in family f. Then the total number of boards is N = \sum_{f=1}^{K} N_f. We assume that the head velocity v (mm/sec) and the feeder installation/removal time σ are constant for all types of boards. Also, let m_f be the number of feeder changes required from family f−1 to family f, and d_i the length of the tour followed by the head to assemble board i; b_i is the batch size of board i. Leon and Peters (1996) proposed the following conceptual formulation of the group setup problem:
Minimize: Makespan = \sum_{f=1}^{K} \left( \sigma m_f + \sum_{i=1}^{N_f} \frac{b_i}{v} d_i \right)
Subject to:
    Feeder capacity constraints
    Component-feeder constraints
    Component placement constraints

The objective is to minimize the makespan for producing multiple types of boards. The first term of the makespan is the setup time to remove the previous setup and install components on feeders for the current family. The second term is the time to place all components on all boards in a batch for the current family. If all boards are grouped into a single family, the setup will occur only once, minimizing setup time; however, the single-family solution will increase the total placement time, since the common setup is not tailored to individual boards. On the other hand, if every board forms a family of its own, the placement time reduction will be outweighed by the setup time. Hence, boards must be grouped such that within a family, boards share as many common component types as possible (i.e., component similarity) in order to reduce setup time between families. Also, the placement locations of boards within a family must be similar to each other (i.e., geometric similarity) in order to reduce placement time. Therefore the development of a good similarity coefficient is an important issue in a group setup strategy. The decision variables are the number of families K, the set of boards in family f (of size N_f), and the placement sequence of locations on board i together with the component-feeder assignment for family f, which determine d_i. The first constraints represent the feeder capacity constraints: the total number of different component types in any family cannot exceed the feeder capacity, since only one component type can reside in one feeder slot. The second constraints, the component-feeder constraints, mean that each component needed for the boards in a family must be assigned to a feeder. The third constraints, the component placement constraints, are equivalent to traveling salesman problem (TSP) constraints: the placement head must visit all the placement locations on a board. The distance between two placement locations is the time for the head to move from the first placement location to the feeder slot containing the component for the second placement, and then to the second placement location. Existing group setup strategies (1) consider component similarity only (Leon and Peters 1998), or (2) form families of boards based on geometric similarity and select the groups of boards based on component similarity in a sequential manner (Leon and Jeong 2005), or (3) consider an overall board similarity coefficient which combines component similarity and a geometric measure by assigning equal weights (Quintana and Leon 1999). Leon and Jeong (2005) reported that the group setup strategy of case (2) performs better than the other cases. The motivation of this paper is the belief that determining appropriate weights for case (3) and combining both similarities simultaneously can achieve a further reduction of the makespan. Combining different criteria into a synthesized criterion falls into a well-known research area, Multiple Criteria Decision Making (MCDM). In this paper, we use the entropy method for calibrating the weights assigned to the component similarity and the geometric similarity. The entropy
concept suggests that if the component similarity or the geometric similarity of all boards is the same, that similarity can be eliminated from further consideration in forming the families of boards. In other words, the weight assigned to a similarity is small if all boards have similar values of the corresponding similarity coefficient.
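As a concrete reading of the conceptual objective above, the following minimal sketch evaluates the makespan of a given family sequence; the callables feeder_changes and tour_length stand in for the Phase 2 decisions described in Section 2, and all names are illustrative.

    def makespan(families, sigma, v, feeder_changes, tour_length, batch):
        """Setup time plus placement time:
        sum over f of (sigma * m_f + sum over i of (b_i / v) * d_i)."""
        total, prev = 0.0, None
        for fam in families:
            total += sigma * feeder_changes(prev, fam)                # m_f swaps
            total += sum(batch[i] * tour_length(i) / v for i in fam)  # (b_i/v) d_i
            prev = fam
        return total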
2 Backgrounds
There are a number of different PCB setup strategies to reduce makespan in the literature (i.e., unique setup, minimum setup, group setup, partial setup). In this section, we focus on partial setup and group setup, because partial setup performs better than the other setup strategies (Leon and Peters, 1996) and the implementation of group setup is relatively easier than that of partial setup in the real world. The procedure for the partial setup strategy (Leon and Peters, 1996) is summarized in the following steps.

Partial setup procedure
Step 1: Determine an arbitrary board sequence
Step 2: Repeat for a given number of times
Step 3: For each board:
Step 4: Find a feasible component-feeder assignment
Step 5: Repeat for a given number of times
Step 6: Find a placement sequence given the component-feeder assignment determined at Step 4
Step 7: Find a component-feeder assignment given the placement sequence determined at Step 6
Step 8: Determine the matrix of sequence-dependent changeover times. The sequence-dependent setup time is the time it takes to remove and install the necessary feeders when changing from one board to another.
Step 9: Determine the board sequence that minimizes the total changeover time given the sequence-dependent changeover times.

As shown in Step 8, portions of the previous setup may remain intact when changing over between boards in partial setup. Therefore only a portion of the components is removed and installed between boards, which can be a complicated operation. In group setup, however, once a whole family has been assembled, all of the components are completely removed from the feeder slots. The traditional group setup procedure is summarized in the following steps (Leon and Peters, 1996).

Group setup procedure
Phase 1: Clustering (Form K families of similar boards. Family sizes cannot exceed the maximum number of feeder slots.)
Step 1: Put each board type in a single-member family
Step 2: Compute the similarity coefficient s_ij for all pairs of families i and j
Step 3: Compute the clustering objective values
Step 4: Set T = max(s_ij)
Step 5: Merge the pair of boards i* and j* if s_{i*j*} = T. Repeat until no more pairs can be merged at similarity level T.
Step 6: Compute the clustering objective and save the clustering solution if an improvement was achieved.
Step 7: Repeat Steps 2 through 6 while merging is possible.

Phase 2: Component-feeder assignment and placement sequence
Step 8: Form a composite board H_f, f = 1,...,K; this board consists of the superposition of all the placement locations, with their corresponding components, of the boards in family f.
Step 9: Determine a feasible component-feeder assignment C(H_f)
Step 10: For all i in N_f, find a placement sequence P(i), given C(H_f)
Step 11: For all i in N_f, find a component-feeder assignment C(H_f) given P(i)
Step 12: Repeat Step 10 and Step 11 for a predetermined number of iterations.

In Phase 1, the hierarchical clustering algorithm merges similar boards into families. The clustering procedure continues until all boards form a single family. To form good families of boards, it is essential to develop a similarity coefficient which considers both the component similarity and the geometric similarity of any two boards. Another issue in hierarchical clustering is the development of a clustering objective in order to evaluate the quality of the board clustering (e.g., minimization of the similarity coefficient between families, maximization of the similarity coefficient within families). In Phase 2, we consider each family as a single composite board H_f and determine the component-feeder assignment and placement sequence. For a given component-feeder assignment C(H_f), the placement sequencing problem can be solved as a TSP; in this paper, we use the nearest-neighbor heuristic to solve the TSP. For given placement sequences P(i), the component-feeder assignment problem is a LAP. In this implementation, the LAP is solved using the shortest augmenting path algorithm proposed by Jonker and Volgenant (1987). The LAP/TSP heuristic terminates when it reaches the predetermined number of iterations. Currently, there exist two group setup strategies in the literature that consider both component similarity and geometric similarity: the Placement Location Matrix (PLM) based group setup strategy (Quintana and Leon, 1998) and the Minimum Metamorphic Distance (MMD) based group setup strategy (Leon and Jeong, 2005). Each strategy uses the same framework of the two-phase procedure except for the definition of the similarity coefficient in Step 2 and the clustering objective in Step 5. The PLM based group setup strategy uses the following board similarity coefficient.
s_ij : similarity of boards i and j.
x_{i∩j} : number of common component (NCC) types between boards i and j.
x_{i∪j} : total number of different component types required by boards i and j.
D_ij : dissimilarity of boards i and j.
F_ij : frequency ratio of the numbers of placement locations of boards i and j.
\sqrt{X^2 + Y^2} : point magnitude of coordinate (X, Y).
p_i^k : point magnitude of the k-th sorted placement location (in ascending order) of board i.
n_i : number of placement locations of board i.
NP_j : number of placement locations of board j.
n^* = min(n_i, n_j).
Xrange, Yrange : Cartesian dimensions of the largest board.

s_{ij} = 0.5\, s_{ij}^{NCC} + 0.5\,(1 - D_{ij})\, F_{ij}    (1)

where s_{ij}^{NCC} = \frac{x_{i\cap j}}{x_{i\cup j}}, \quad D_{ij} = \frac{\sqrt{\sum_{k=1}^{n^*} (p_i^k - p_j^k)^2}}{\sqrt{n^*}\,\sqrt{Xrange^2 + Yrange^2}}, \quad F_{ij} = \frac{\min(NP_i, NP_j)}{\max(NP_i, NP_j)}.
The numerator of D_ij measures the dissimilarity of the point magnitudes of the boards, and the denominator is a normalizing factor; therefore (1 − D_ij) represents a similarity measure of boards i and j. F_ij measures the frequency ratio of the numbers of placement locations of the two boards, so two boards with the same number of placement locations are strongly associated. There are some limitations to the PLM method. First, the point magnitudes of two different points can be equal: for example, the point magnitude of (a,b) is the same as that of (b,a). This could be wrongly interpreted as there being no dissimilarity between the two points. Second, giving equal weights to the similarities may not be appropriate in cases where the placement time becomes more important than the setup time, or vice versa, in reducing makespan.
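A minimal sketch of equation (1) follows, with a board represented as a list of (x, y, component_type) placement locations; the normalization of D_ij follows the reconstruction above and the names are illustrative.

    import math

    def plm_similarity(board_i, board_j, xrange, yrange):
        """Sketch of the PLM board similarity coefficient, equation (1)."""
        comps_i = {c for _, _, c in board_i}
        comps_j = {c for _, _, c in board_j}
        s_ncc = len(comps_i & comps_j) / len(comps_i | comps_j)  # component part

        # Point magnitudes, sorted in ascending order.
        p_i = sorted(math.hypot(x, y) for x, y, _ in board_i)
        p_j = sorted(math.hypot(x, y) for x, y, _ in board_j)
        n_star = min(len(p_i), len(p_j))
        d_ij = (math.sqrt(sum((p_i[k] - p_j[k]) ** 2 for k in range(n_star)))
                / (math.sqrt(n_star) * math.hypot(xrange, yrange)))
        f_ij = n_star / max(len(p_i), len(p_j))                  # frequency ratio
        return 0.5 * s_ncc + 0.5 * (1 - d_ij) * f_ij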
A sequential treatment of component similarity and geometric similarity has been proposed by Leon and Jeong (2005), namely the Minimum Metamorphic Distance (MMD) based group setup strategy. Suppose that boards i and j have the same number of placement locations of component type c. Then the Euclidean distance matrix from the locations on board i to those on board j can be constructed. The problem is to find the assignment of from-to locations that minimizes the total sum of Euclidean distances (MMD_{ij}^c). The solution can easily be found using the LAP method. When boards with different numbers of locations are used, all the locations on the board with more locations are assigned to locations on the board with fewer locations. In the MMD based setup, a new geometric similarity has been proposed as follows:

MMD_{ij}^c : minimum metamorphic distance of boards i and j for component type c.
p : a placement location of board i.
q : a placement location of board j.
d_{pq}^c : Euclidean distance between locations p and q with component type c.

s_{ij}^{MMD} = 1 - \frac{\sum_{\forall c} MMD_{ij}^c}{\sum_{\forall c}\sum_{\forall p} \max_{\forall q}(d_{pq}^c)}    (2)

As shown in equation (2), when the MMD increases, the geometric similarity decreases. The authors suggested a group setup strategy that considers the component similarity (i.e., s_{ij}^{NCC} in equation (1)) and the MMD based geometric similarity (i.e., s_{ij}^{MMD} in equation (2)) sequentially. In hierarchical clustering, the proposed procedure merges the two boards with the largest MMD similarity. The clustering objective is then the maximization of the average s_{ij}^{MMD} within families per unit feeder change between families. Therefore, the clustering objective is maximized when all boards in the families are geometrically similar (i.e., the placement time is minimized) and the number of feeder changes is minimized (i.e., the setup time is minimized). The limitation of the MMD based group setup is that the component similarity and the geometric similarity are not considered simultaneously. Forming the families of boards considering only geometric similarity may reduce the possibility of generating solutions which are favorable for reducing setup time. However, the authors reported that the MMD based group setup outperformed the PLM based group setup. In Section 3, we propose a new group setup strategy which combines s_{ij}^{NCC} and s_{ij}^{MMD} using the entropy method.
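Each MMD_{ij}^c is itself a linear assignment problem, so equation (2) can be sketched with an off-the-shelf LAP solver; skipping component types absent from one of the boards, and letting the rectangular LAP handle unequal location counts, are simplifications of the rule described above.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def mmd_similarity(board_i, board_j):
        """Sketch of equation (2); boards map component type -> list of (x, y)
        placement locations."""
        total_mmd, total_max = 0.0, 0.0
        for c in set(board_i) | set(board_j):
            p = np.asarray(board_i.get(c, []), dtype=float)
            q = np.asarray(board_j.get(c, []), dtype=float)
            if len(p) == 0 or len(q) == 0:
                continue                         # simplification: skip one-sided types
            # d[p][q]: Euclidean distance between locations of component c.
            d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)
            rows, cols = linear_sum_assignment(d)      # LAP yields MMD^c_ij
            total_mmd += d[rows, cols].sum()
            total_max += d.max(axis=1).sum()           # normalizer: sum_p max_q d
        return 1.0 - total_mmd / total_max if total_max > 0 else 1.0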
3 Entropy Based Group Setup Strategy
In the past two decades, there has been enormous growth in the area of multi-attribute optimization. One of the most important issues in this research area is the development of appropriate weights for the different attributes. As each attribute has a different scale, synthesizing attributes by giving appropriate weights to each attribute is essential to solving the optimization problem. The entropy method suggests that the weight assigned to a criterion must be small if all alternatives have similar values for the criterion. On the other hand, when the difference between a criterion's values is great, the criterion must be considered important by giving it a large weight. Let

NCC_{ij} = x_{i\cap j} : number of common component types between boards i and j,
MMD_{ij} = \sum_{\forall c} MMD_{ij}^c : minimum metamorphic distance between boards i and j, for all i, j with i ≠ j.

Then the entropy measures of the criteria for NCC and MMD are as follows:

e(NCC) = -\sum_{i=1}^{N}\sum_{j=1}^{N} \frac{NCC_{ij}}{S_{NCC}} \ln \frac{NCC_{ij}}{S_{NCC}}    (3)

e(MMD) = -\sum_{i=1}^{N}\sum_{j=1}^{N} \frac{MMD_{ij}}{S_{MMD}} \ln \frac{MMD_{ij}}{S_{MMD}}    (4)
where S_{NCC} = \sum_{i=1}^{N}\sum_{j=1}^{N} NCC_{ij} and S_{MMD} = \sum_{i=1}^{N}\sum_{j=1}^{N} MMD_{ij}. When all NCC_{ij} are equal, \frac{NCC_{ij}}{S_{NCC}} = \frac{2}{N(N-1)} and the maximum of e(NCC) is achieved, which is e_{max}(NCC) = \ln \frac{N(N-1)}{2}. This implies that if the value of a criterion is evenly distributed, then the entropy of the criterion is maximized, and the entropy is minimized when the criterion values are biased. By setting a normalization factor K = \frac{1}{e_{max}(NCC)} = \frac{1}{\ln(N(N-1)/2)}, 0 ≤ e(NCC) ≤ 1 can be achieved. Therefore the normalized entropy measures of equations (3) and (4) are

e(NCC) = -K \sum_{i=1}^{N}\sum_{j=1}^{N} \frac{NCC_{ij}}{S_{NCC}} \ln \frac{NCC_{ij}}{S_{NCC}}    (5)

e(MMD) = -K \sum_{i=1}^{N}\sum_{j=1}^{N} \frac{MMD_{ij}}{S_{MMD}} \ln \frac{MMD_{ij}}{S_{MMD}}    (6)

We impose a large weight on a criterion when the corresponding entropy measure is small, since the information transmitted by the criterion is then great (i.e., there is a great difference between the values of the criterion). The weights are calculated as follows:

W_{NCC} = \frac{1 - e(NCC)}{2 - (e(NCC) + e(MMD))}    (7)

W_{MMD} = \frac{1 - e(MMD)}{2 - (e(NCC) + e(MMD))}    (8)

Using the entropy method, we propose the following board similarity coefficient for boards i and j:

s_{ij} = W_{NCC}\, s_{ij}^{NCC} + W_{MMD}\, s_{ij}^{MMD}    (9)

where s_{ij}^{NCC} is the component similarity as shown in equation (1) and s_{ij}^{MMD} is the MMD based geometric similarity as shown in equation (2). It is important to note that the entropy method can easily be extended to board similarity coefficients with more than two criteria. The entropy based group setup strategy uses the generic group setup procedure of Section 2 with the board similarity coefficient of equation (9). The clustering objective is the same as that of the MMD based setup (i.e., maximization of the average similarity within families per unit number of feeder changes between families).
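A minimal sketch of the weight computation in equations (3)-(8) follows; summing over unordered pairs (i < j) is our reading, consistent with the e_max(NCC) = ln(N(N−1)/2) normalization above.

    import numpy as np

    def entropy_weights(ncc, mmd):
        """Compute (W_NCC, W_MMD) from N x N matrices of pairwise NCC_ij
        and MMD_ij values (diagonal ignored)."""
        n = ncc.shape[0]
        norm = 1.0 / np.log(n * (n - 1) / 2)           # normalization factor K
        pairs = np.triu(np.ones((n, n), dtype=bool), 1)

        def norm_entropy(m):
            v = m[pairs].astype(float)
            p = v / v.sum()
            p = p[p > 0]                               # treat 0 * ln 0 as 0
            return -norm * np.sum(p * np.log(p))

        e_ncc, e_mmd = norm_entropy(ncc), norm_entropy(mmd)
        denom = 2 - (e_ncc + e_mmd)
        return (1 - e_ncc) / denom, (1 - e_mmd) / denom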
4 Experiments
In this paper, we consider a generic machine that has 70 feeder slots with 20mm between the slots. The board dimensions are at most 320mm × 245mm, and the coordinates for each board were randomly generated from uniform distributions as follows: X = 635 + U(0, 245), Y = 254 + U(0, 320). The home position coordinate is
(0,0) and the first feeder slot location is (457,0). The number of component types required per board was generated from U(6,20) out of 70 different component types. We considered feeder installation/removal times of 30 sec and 60 sec. The head velocity v was tested at 100 mm/sec and 300 mm/sec. The batch sizes of the boards, b, were generated from U[50,100], and the total number of boards N was generated from U[5,15]. The placement locations and the corresponding component types were generated from a seed board. A seed board is created with locations (L_x^s(i), L_y^s(i)), where L_x^s(i) is the x-coordinate of the i-th placement location of the seed board and L_y^s(i) is the y-coordinate; C_s(i) is the component type of the i-th placement location of the seed board. We fixed the number of placement locations of the seed board to 50. Based on the component similarity (C) and geometric similarity (G), another board (i.e., a child board) is created using the following formulas:

L_x^c(i) = L_x^s(i) + (1 - G) \times 0.5 \times 245 \times U(-1, 1)    (10)

L_y^c(i) = L_y^s(i) + (1 - G) \times 0.5 \times 320 \times U(-1, 1)    (11)

C_c(i) = \begin{cases} C_s(i) & \text{with probability } C \\ U(1, NC_c) & \text{otherwise} \end{cases}    (12)
where L_x^c(i) is the x-coordinate of the i-th placement location of the child board and L_y^c(i) is the y-coordinate; C_c(i) is the component type of the i-th placement location of the child board, and NC_c is the number of component types of the child board c. Based on these experimental factors and parameters, we generated 16 problem types as shown in Table 1. Each problem set consists of 20 random problems.
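A small sketch of the child-board generator in equations (10)-(12) follows; resampling a replaced component uniformly from the 70 available component types is our reading of U(1, NC_c).

    import random

    def make_child(seed_board, C, G, n_component_types=70):
        """Perturb the seed geometry by (1 - G) and keep each component type
        with probability C (equations (10)-(12))."""
        child = []
        for (x, y, comp) in seed_board:
            cx = x + (1 - G) * 0.5 * 245 * random.uniform(-1, 1)        # eq. (10)
            cy = y + (1 - G) * 0.5 * 320 * random.uniform(-1, 1)        # eq. (11)
            keep = random.random() < C                                  # eq. (12)
            c = comp if keep else random.randint(1, n_component_types)  # assumed resampling
            child.append((cx, cy, c))
        return child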
Table 1. Problem types

    Problem   Head velocity   Feeder change   Component        Geometric
    type      (mm/sec)        time (sec)      similarity (C)   similarity (G)
    1         100             30              0.2              0.75
    2         100             30              0.75             0.75
    3         100             60              0.2              0.75
    4         100             60              0.75             0.75
    5         300             30              0.2              0.75
    6         300             30              0.75             0.75
    7         300             60              0.2              0.75
    8         300             60              0.75             0.75
    9         100             30              0.2              0.2
    10        100             30              0.75             0.2
    11        100             60              0.2              0.2
    12        100             60              0.75             0.2
    13        300             30              0.2              0.2
    14        300             30              0.75             0.2
    15        300             60              0.2              0.2
    16        300             60              0.75             0.2
To measure the performance of the different setup strategies, the percent deviation from partial setup is computed as follows:

Percent deviation from partial setup = \frac{M_{setup\ strategy} - M_{PS}}{M_{PS}} \times 100\%

M_{PS} represents the makespan of the partial setup (PS) strategy, and M_{setup strategy} corresponds to the makespan of the PLM based group setup (PLM), the MMD based group setup (MMD), or the entropy based group setup (ENT).
Table 2 summarizes the average setup time, average placement time and average makespan deviations of the different setup strategies. Consider problem type 1, where the head velocity is 100 mm/sec, the feeder change time is 30 sec, the component similarity is 20% and the geometric similarity is 75%. Note that in this specific problem the placement time is more important than the setup time, since the head velocity is slow and the feeder change time is short. The results show that PLM performs better in terms of setup time than PS, MMD and ENT. This is because PS, MMD and ENT achieve an improvement in the reduction of the placement time instead of the setup time. In addition, ENT assigns a larger weight to the geometric similarity of the boards under consideration than MMD does. As a result, ENT dominates PLM and MMD, reducing the makespan by about 8% and 4% respectively, as shown in Table 2. In summary, the test results show that ENT outperforms PLM and MMD in terms of makespan. The maximum percent deviation from PS of ENT is 2.72%, while those of MMD and PLM are 5.60% and 9.35% respectively. This result implies that ENT balances the tradeoff between the setup time and the placement time and finds the solution that minimizes the makespan for all types of problems.

Table 2. Summary of experimental results
    Problem   (PLM-PS)/PS*100*                 (MMD-PS)/PS*100*                 (ENT-PS)/PS*100
    type      Setup     Placement  Makespan    Setup     Placement  Makespan    Setup     Placement  Makespan
    1         -25.15    9.51       8.69        3.93      3.45       3.46        9.61      1.04       1.25
    2         -21.58    1.21       0.86        -23.51    1.38       1.00        -10.64    0.90       0.72
    3         -16.64    8.37       7.37        11.91     2.97       3.32        24.00     -0.30      0.66
    4         -3.97     0.95       0.83        -10.99    1.52       1.21        -5.79     0.93       0.76
    5         -15.67    8.61       7.25        12.25     4.89       5.30        34.44     -0.66      1.30
    6         6.26      1.26       1.43        -3.86     0.97       0.81        15.17     0.53       1.01
    7         14.70     0.19       1.02        8.94      0.61       1.09        18.67     0.24       1.29
    8         21.71     -0.03      1.19        15.94     0.38       1.25        25.75     -0.11      1.35
    9         -22.17    8.96       8.24        2.69      4.46       4.42        16.37     -0.04      0.34
    10        -23.50    1.14       0.73        -26.87    1.66       1.19        -11.45    1.23       1.02
    11        -24.37    10.79      9.35        4.97      4.93       4.94        13.45     2.26       2.72
    12        -2.62     0.78       0.70        -8.26     1.25       1.01        10.06     0.72       0.96
    13        -12.87    7.93       6.74        7.01      5.52       5.60        22.69     0.71       1.97
    14        -5.38     1.36       1.12        0.67      0.86       0.85        12.31     0.22       0.65
    15        12.55     0.04       0.75        12.99     0.10       0.83        39.29     -0.40      1.86
    16        36.38     0.11       0.99        -21.54    1.44       0.89        -4.39     0.92       0.79
    Average   -5.15     3.82       3.58        -0.86     2.27       2.32        13.10     0.51       1.17
    Min       -25.15    -0.03      0.70        -26.87    0.10       0.81        -11.45    -0.66      0.34
    Max       36.38     10.79      9.35        15.94     5.52       5.60        39.29     2.26       2.72

    (All setup time, placement time and makespan values are averages over each problem set; the last three rows give the overall average, minimum and maximum.)
    *: Results from Leon and Jeong (2005)
5 Conclusions
This paper has presented an improved group setup strategy based on the entropy method, considering both component similarity and geometric similarity. It has demonstrated how the entropy method determines weights for the different criteria to adapt to a variety of production conditions. The improved group setup strategy dominated the PLM and MMD based group setup strategies for all types of problems. Overall, the improved group setup strategy deviated from partial setup by at most 2.72% and by 1.17% on average. Future research includes the extension to multiple SMT machines and the consideration of multiple criteria in grouping PCBs (e.g., due dates).
Acknowledgments

This research was supported by the research fund of Hanyang University (HY2003). The author thanks Professor V. Jorge Leon at Texas A&M University for constructive suggestions.
References
1. Askin, R. G., Dror, M., and Vakharia, A. J.: Printed circuit board scheduling and component loading in a multimachine, openshop manufacturing cell. Naval Research Logistics 41(5) (1994) 587–608
2. Ball, M. O. and Magazine, M. J.: Sequencing of insertions in printed circuit board assemblies. Operations Research 36(2) (1988) 192–201
3. Chan, D. and Mercier, D.: IC Insertion: An Application of the Traveling Salesman Problem. International Journal of Production Research 27(10) (1989) 1837–1841
4. Crama, Y.: Combinatorial optimization models for production scheduling in automated manufacturing systems. European Journal of Operational Research 99 (1997) 136–153
5. Crama, Y., Kolen, A. W. J., Oerlemans, A. G., and Spieksma, F. C. R.: Throughput rate optimization in the automated assembly of printed circuit boards. Annals of Operations Research 26 (1990) 455–480
6. Jonker, R. and Volgenant, A.: A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing 38 (1987) 325–340
7. Leon, V. J. and Jeong, I. J.: An improved group setup strategy for PCB assembly. Lecture Notes in Computer Science 3483 (2005) 312–321
8. Leon, V. J. and Peters, B. A.: Re-planning and analysis of partial setup strategies in printed circuit board assembly systems. International Journal of Flexible Manufacturing Systems, Special Issue in Electronics Manufacturing 8(4) (1996) 389–412
9. Leon, V. J. and Peters, B. A.: A comparison of setup strategies for printed circuit board assembly. Computers in Industrial Engineering 34(1) (1998) 219–234
10. Lofgren, C. B. and McGinnis, L. F.: Dynamic scheduling for flexible printed circuit card assembly. Proceedings of the IEEE Systems, Man, and Cybernetics Conference, Atlanta, GA (1986)
11. Quintana, R. and Leon, V. J.: An improved group setup management strategy for pick and place SMD assembly. Working paper (1997)
Cross-Facility Production and Transportation Planning Problem with Perishable Inventory

Sandra Duni Ekşioğlu and Mingzhou Jin
Department of Industrial and Systems Engineering, Mississippi State University, P.O. Box 9542, Mississippi State, MS 39762, USA
[email protected]

Abstract. This study addresses a production and distribution planning problem in a dynamic, two-stage supply chain. This supply chain consists of a number of facilities and retailers. The model considers that the final product is perishable and therefore has a limited shelf life. We formulate this problem as a network flow problem with a fixed charge cost function, which is NP-hard. A primal-dual heuristic is developed that provides lower and upper bounds. The models proposed can be used for operational decisions.
1 Introduction
This paper investigates a planning model that integrates production, inventory and transportation decisions in a two-stage supply chain. Production and transportation activities have usually been studied separately by industry and academia, mainly because (i) each problem in itself is difficult and therefore the combined problem is not tractable, and (ii) different departments in an organization are in charge of each activity. In fact, the two activities can function independently if there is a sufficiently large inventory buffer that completely decouples the two. This, however, would lead to increased holding costs and longer lead times. The pressure to reduce costs in supply chains forces companies to take an integrated view of their production and distribution processes. The supply chain analyzed in this paper consists of a number of facilities, each with similar production capabilities, and a number of retailers. We assume that retailers' demand for a single perishable product is known deterministically and that there are no production or transportation capacity constraints. Facilities produce the final product and carry inventories to satisfy retailers' demands during the planning period. We assume that there is no transshipment between facilities. This situation is typical in the food and beverage industry, where the retailers are often supermarkets and restaurants that have very limited storage capacity. Most food products are perishable and have a limited lifetime. To account for this, we constrain the number of periods that a product is stored at a facility before being shipped to the retailer. The decisions that need to be made are (i) the timing of production; (ii) the location and size of inventories; and (iii) the timing of shipments. The proposed model is suitable for tactical and operational planning. We assume that the planning period is a typical one, and it will repeat itself over
time. This means the model is cyclic in nature. For this reason, we assume a fixed starting and ending period with varying initial and ending inventories. We model the problem as a network flow problem with fixed charge cost functions. This is an NP-hard problem, since even some of its special cases are known to be NP-hard; for example, the single-period problem is a fixed charge network flow problem in bipartite networks (Johnson et al. [1]), which is NP-hard. The complexity of this problem led us to consider heuristic approaches. Production/inventory problems for perishable products have been studied before; however, very little has been done in the area of integrated production and distribution planning for perishable products. Nahmias [2] presents a review of ordering policies for perishable inventories. Hsu [3] studied the economic lot-sizing problem with perishable products; in this model, the deterioration rate of the inventory and its carrying cost in each period depend on the age of the stock. Myers [4] presents a linear programming model to determine the maximum satisfiable demand for products with limited shelf life. Previous work of the authors (Ekşioğlu et al. [9]) has focused on variants of this production and distribution planning problem. In addition to previous work, this paper considers that the final product is perishable and has a limited lifetime, and relaxes the assumption of constant initial inventory that is found in most inventory management models.
2 Problem Description and Formulations
This section presents a mixed integer linear programming (MILP) formulation of the cross-facility production and transportation planning problem with perishable inventory. Let F denote the number of facilities that produce and store the final products that are then delivered to R retailers. The facilities have identical production capabilities; in other words, the final product can be produced in each facility. However, the setup and transportation costs, as well as the unit production and inventory holding costs, differ from one facility to another and from one time period to the next. Given the projected demand of each retailer during the T-period planning horizon, the production and transportation planning problem decides how much to produce, transport and hold in inventory at each facility in order to meet demand at minimum cost. We assume that the planning horizon of length T is a typical one and repeats itself over time. All problem data are assumed cyclic with cycle length equal to T (b_{j,T+1} = b_{j1}, b_{j,T+2} = b_{j2}, ..., where b_{jt} is the demand at retailer j in period t). As a result, the inventory pattern at the facilities will be cyclic as well. We model this by letting the initial inventory be equal to the last-period inventory.

2.1 Original Problem Formulation
Property 2 of an optimal solution to our problem (Section 2.3) implies that demand in a particular time period is satisfied from exactly one facility. We develop a network flow model based on this single-source assignment property.
Let p_{it} denote the unit production cost at facility i in period t; s_{it} the production setup cost at facility i in period t; h_{it} the unit inventory cost at facility i in period t; and c_{ijt} the total transportation cost of shipping b_{jt} from facility i to retailer j in period t. The decision variables are: q_{it}, the amount produced at facility i in period t; I_{it}, the inventory at facility i at the end of period t; x_{ijt}, a binary variable that equals 1 if there is a shipment from facility i to retailer j in period t and 0 otherwise; and y_{it}, a binary variable that equals 1 if production takes place at facility i in period t and 0 otherwise. The following is a MILP formulation of the problem:

(P)  minimize \sum_{i=1}^{F}\sum_{j=1}^{R}\sum_{t=1}^{T} \{p_{it} q_{it} + s_{it} y_{it} + h_{it} I_{it} + c_{ijt} x_{ijt}\}

subject to

q_{it} + I_{i[t-1]} - \sum_{j=1}^{R} b_{jt} x_{ijt} - I_{it} = 0,   i = 1,...,F; t = 1,...,T    (1)

\sum_{i=1}^{F} x_{ijt} = 1,   j = 1,...,R; t = 1,...,T    (2)

q_{it} - \sum_{\tau=t}^{t+k}\sum_{j=1}^{R} b_{j[\tau]} y_{it} \le 0,   i = 1,...,F; t = 1,...,T    (3)

I_{it} - \sum_{\tau=t+1}^{t+k}\sum_{j=1}^{R} b_{j[\tau]} x_{ij[\tau]} \le 0,   i = 1,...,F; t = 1,...,T    (4)

I_{i0} = I_{iT},   i = 1,...,F    (5)

q_{it}, I_{it} \ge 0,   i = 1,...,F; t = 1,...,T    (6)

y_{it}, x_{ijt} \in \{0, 1\},   i = 1,...,F; j = 1,...,R; t = 1,...,T    (7)

For convenience, in this formulation we have used the cyclic index notation [t] = ((t-1) mod T) + 1, i.e., I_{i[t-1]} = I_{i,t-1} for t = 2,...,T and I_{i[0]} = I_{iT}. Constraints (1) and (2) are the flow conservation constraints at the production and demand points, respectively. Constraints (3) are the setup constraints. Constraints (4) are the perishability constraints, where k (k ≤ T−1) denotes the maximum number of periods that a product can be stored. Constraints (5) model the fact that the initial inventory equals the ending inventory, since T is a typical sequence of periods that will repeat itself; setting the initial inventory level equal to the ending inventory means that these inventory levels are not fixed, and the model will determine the ending inventory levels that prepare the system for future demands. Constraints (6) are the non-negativity constraints, and (7) are the boolean constraints. Standard solvers such as CPLEX can be used to solve small instances of (P). Large problem instances are solved using the primal-dual algorithm.
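For illustration, formulation (P) can be written down directly in a modeling language; the following PuLP sketch is our assumption (the paper only notes that CPLEX can solve small instances), with the wrap-around inventory index encoding constraint (5) directly.

    import pulp

    def build_P(F, R, T, k, p, s, h, c, b):
        """Sketch of (P); p, s, h are dicts keyed by (i, t), c by (i, j, t),
        b by (j, t)."""
        w = lambda t: ((t - 1) % T) + 1                    # cyclic index [t]
        Fs, Rs, Ts = range(1, F + 1), range(1, R + 1), range(1, T + 1)
        m = pulp.LpProblem("P", pulp.LpMinimize)
        q = pulp.LpVariable.dicts("q", [(i, t) for i in Fs for t in Ts], lowBound=0)
        I = pulp.LpVariable.dicts("I", [(i, t) for i in Fs for t in Ts], lowBound=0)
        x = pulp.LpVariable.dicts("x", [(i, j, t) for i in Fs for j in Rs for t in Ts], cat="Binary")
        y = pulp.LpVariable.dicts("y", [(i, t) for i in Fs for t in Ts], cat="Binary")
        m += pulp.lpSum(p[i, t] * q[i, t] + s[i, t] * y[i, t] + h[i, t] * I[i, t]
                        for i in Fs for t in Ts) + \
             pulp.lpSum(c[i, j, t] * x[i, j, t] for i in Fs for j in Rs for t in Ts)
        for i in Fs:
            for t in Ts:
                # (1) flow conservation; I[i, w(t-1)] wraps to I[i, T], encoding (5)
                m += (q[i, t] + I[i, w(t - 1)]
                      - pulp.lpSum(b[j, t] * x[i, j, t] for j in Rs) - I[i, t]) == 0
                # (3) setup forcing: bound is the demand of the next k+1 periods
                big_m = sum(b[j, w(tau)] for tau in range(t, t + k + 1) for j in Rs)
                m += q[i, t] <= big_m * y[i, t]
                # (4) perishability: inventory must be shipped within k periods
                m += I[i, t] <= pulp.lpSum(b[j, w(tau)] * x[i, j, w(tau)]
                                           for tau in range(t + 1, t + k + 1) for j in Rs)
        for j in Rs:
            for t in Ts:
                m += pulp.lpSum(x[i, j, t] for i in Fs) == 1   # (2) single source
        return m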
The transportation cost function f_{ij}(g_{ijt}) is considered to be a concave function of the amount shipped, g_{ijt}. By the single-source assignment property of an optimal solution to our problem, facility i in period t either ships nothing to retailer j or ships the total demand, b_{jt}. This indicates that the transportation cost function takes only the two points g_{ijt} = 0 and g_{ijt} = b_{jt}. The LP-relaxation of the transportation cost function passes through the points g_{ijt} = 0 and g_{ijt} = b_{jt}, and solving the LP-relaxation of (P) with respect to the transportation cost function gives a solution such that g_{ijt} = 0 or g_{ijt} = b_{jt}. This means the LP-relaxation gives an exact approximation of the concave transportation cost function; therefore, c_{ijt} = f_{ij}(b_{jt}). In the special case when F = 1, retailers' demands are satisfied from the same facility; therefore, there is no decision to be made about which facility will ship the final product. In this case problem (P) reduces to the classical economic lot-sizing problem (Wagner and Whitin [5]).

2.2 Extended Problem Formulation
The linear programming relaxation of formulation (P), obtained by replacing the boolean constraints (7) with non-negativity constraints, is not tight. This is due to constraints (3): \sum_{\tau=t}^{t+k}\sum_{j=1}^{R} b_{j[\tau]} provides a high upper bound for q_{it}, since the production in a period rarely equals this amount. One way to tighten the formulation is to split the production variables q_{it} by destination into variables q_{ijt[\tau]} (\tau = t,...,t+k). The new decision variable q_{ijt[\tau]} represents the amount produced at facility i in period t to satisfy the demand of retailer j in period \tau. For these variables, a trivial and tight upper bound is the demand at retailer j in period \tau, b_{j\tau}. The following is an equivalent formulation of (P) with respect to the decision variables q_{ijt[\tau]}:

(Ex-P)  minimize \sum_{i=1}^{F}\sum_{t=1}^{T}\sum_{j=1}^{R} \left[ \sum_{\tau=t}^{t+k} c_{ijt[\tau]} q_{ijt[\tau]} + s_{it} y_{it} \right]

subject to

\sum_{i=1}^{F}\sum_{t=\tau-k}^{\tau} q_{ij[T+t]\tau} = b_{j\tau},   j = 1,...,R; \tau = 1,...,T

q_{ijt[\tau]} - b_{j[\tau]} y_{it} \le 0,   i = 1,...,F; j = 1,...,R; t = 1,...,T; t \le \tau \le t+k

q_{ijt[\tau]} \ge 0,   i = 1,...,F; j = 1,...,R; t = 1,...,T; t \le \tau \le t+k

y_{it} \in \{0, 1\},   i = 1,...,F; t = 1,...,T

where c_{ijt[\tau]} = p_{it} + c_{ij[\tau]}/b_{j[\tau]} + \sum_{s=t}^{\tau-1} h_{i[s]}. In the special case when k = 0, no inventories are carried from one period to another. In this case the problem decomposes by period. The single-period problem is the facility location problem, which is still a difficult problem to solve.
2.3 Properties of Optimal Solution
Using the network flow interpretation, we establish the required properties of optimal solutions to (P) when the costs are nonnegative.

Theorem 1. There exists an optimal solution to problem (P) such that the demand at retailer j in period t is satisfied from either the production or the inventory of exactly one facility.

Proof: The uncapacitated production and transportation planning problem minimizes a concave cost function over a bounded convex set; therefore, its optimal solution corresponds to a vertex of the feasible region (Zangwill [6]). Let (q*, x*, I*) be an optimal solution. In an uncapacitated network flow problem, a vertex is represented by a tree solution. The tree representation of the optimal solution implies that demand in every time period will be satisfied by exactly one of the facilities (in other words, x*_{ijt} x*_{ljt} = 0 for i ≠ l and t = 1, 2,...,T). Furthermore, for each facility in each time period, if the inventory level is positive there will be no production, and vice versa: q*_{it} I*_{i,[t-1]} = 0, for i = 1,...,F, t = 1,...,T.

Theorem 1 implies Properties 1 and 2 of the optimal solutions to our problem.

Property 1. The uncapacitated production and transportation planning problem has an optimal solution such that a facility in a time period t either produces or carries inventory from the previous period (or neither), but not both. This property of optimal solutions is often referred to in the literature as the Zero Inventory Property (ZIP). ZIP applies to the classical single-item lot-sizing problem and some of its generalizations (Wagner and Whitin [5]).

Property 2. The optimal solution of the problem is such that the demand in a time period is satisfied from a single facility. This property is equivalent to the single-source assignment property for the uncapacitated facility location problem.

Property 3. Every facility in a given time period t either does not produce or produces the demand for a number of periods in the time interval t,...,[t+k] (the periods do not need to be successive). This property can be easily derived from Theorem 1 and the tree representation of an optimal solution.
3 Solution Procedures

3.1 Exact Solution Approach
Theorem 2. There exists an algorithm for the uncapacitated, single commodity, integrated production and distribution planning problem with perishable commodities (P) that is polynomial in the number of facilities and exponential in the number of periods and retailers.
Proof: The following steps describe the algorithm:

1. Consider all possible assignments of the demands b_{jt} (j = 1,...,R and t = 1,...,T) to the facilities i = 1,...,F. There is a total of F^{RT} assignments.
2. Given an assignment of demands to facilities, for each facility i (i = 1,...,F) we need to solve an uncapacitated, single-commodity lot-sizing problem (ELSP_i) that considers the final product to be perishable and is cyclic in nature. Without loss of optimality we can assume that an optimal solution satisfies min_{t=1,...,T} I_t = 0. We can therefore solve problem (ELSP_i) for each t = 1,...,T, fixing I_t = 0 and treating period t as the "last" planning period; the cheapest among the corresponding solutions is then the optimal solution. One should note that problem (ELSP_i) with I_T = 0 is in fact the (ELS) problem. This problem can be solved in O(T log T) (Wagelmans et al. [7]), so problem (ELSP_i) is solved in O(T^2 log T). Therefore, the running time of this algorithm is bounded by O(F^{RT+1} T^2 log T).

3.2 Primal-Dual Heuristic
The dual problem of the LP-relaxation of (Ex-P) has a special structure that allows us to develop a primal-dual based algorithm. The following is the formulation of the dual problem:

(D-P)  maximize \sum_{t=1}^{T}\sum_{j=1}^{R} b_{jt} v_{jt}

subject to

\sum_{\tau=t}^{t+k}\sum_{j=1}^{R} b_{j[\tau]} w_{ijt[\tau]} \le s_{it},   i = 1,...,F; t = 1,...,T

v_{j[\tau]} - w_{ijt[\tau]} \le c_{ijt[\tau]},   i = 1,...,F; j = 1,...,R; t = 1,...,T; t \le \tau \le t+k

w_{ijt[\tau]} \ge 0,   i = 1,...,F; j = 1,...,R; t = 1,...,T; t \le \tau \le t+k

In an optimal solution to (D-P), both constraints w_{ijt[\tau]} \ge 0 and w_{ijt[\tau]} \ge v_{j[\tau]} - c_{ijt[\tau]} should be satisfied. Since w_{ijt[\tau]} does not appear in the objective function, we can replace it with w_{ijt[\tau]} = \max(0, v_{j[\tau]} - c_{ijt[\tau]}). This leads to the following condensed dual formulation:

(D*-P)  maximize \sum_{t=1}^{T}\sum_{j=1}^{R} b_{jt} v_{jt}

subject to

\sum_{\tau=t}^{t+k}\sum_{j=1}^{R} b_{j[\tau]} \max(0, v_{j[\tau]} - c_{ijt[\tau]}) \le s_{it},   i = 1,...,F; t = 1,...,T.
The extended formulation of the multi-facility lot-sizing problem is a special case of the uncapacitated facility location problem. The primal-dual scheme
we discuss is, in principle, similar to the primal-dual scheme proposed by Erlenkotter [8] for the facility location problem; however, the implementation of the algorithm is different. Wagelmans et al. [7] use a similar primal-dual scheme for the classical lot-sizing problem. They show that this algorithm solves the problem in O(T log T). There, the dual variables have the property v_t \ge v_{t+1} for t = 1,...,T, which is used to show that the dual ascent algorithm gives the optimal solution to the economic lot-sizing problem. This property does not hold for (D-P).

Description of the Algorithm. Suppose the linear programming relaxation of (Ex-P) has an optimal solution (q*, y*) that is integral, and let (v*, w*) denote an optimal dual solution. The complementary slackness conditions for the primal (Ex-P) and dual (D-P) problems are as follows:

(C1)  y*_{it} [s_{it} - \sum_{j=1}^{R}\sum_{\tau=t}^{t+k} b_{j[\tau]} w*_{ijt[\tau]}] = 0,   for i = 1,...,F; t = 1,...,T
(C2)  q*_{ijt[\tau]} [c_{ijt[\tau]} - v*_{j[\tau]} + w*_{ijt[\tau]}] = 0,   for i = 1,...,F; j = 1,...,R; t = 1,...,T; t \le \tau \le t+k
(C3)  w*_{ijt[\tau]} [q*_{ijt[\tau]} - b_{j[\tau]} y*_{it}] = 0,   for i = 1,...,F; j = 1,...,R; t = 1,...,T; t \le \tau \le t+k
(C4)  v*_{jt} [b_{jt} - \sum_{i}\sum_{\tau=t-k}^{t} q*_{ij[T+\tau]t}] = 0,   for j = 1,...,R; t = 1,...,T.

The simple structure of the dual problem can be exploited to obtain near-optimal feasible solutions by inspection. Suppose that the optimal values of the first f−1 dual variables of (D*-P) are known. Then, to be feasible, the f-th dual variable (v_{l\tau}) must satisfy the following constraints:
b_{l\tau} \max(0, v_{l\tau} - c_{il[T+t]\tau}) \le M_{ilt,\tau-1} = s_{it} - \sum_{j=1}^{R}\sum_{s=t}^{\tau-1} b_{j[T+s]} \max(0, v*_{j[T+s]} - c_{ij[T+t][T+s]}) - \sum_{j=1}^{l-1} b_{j\tau} \max(0, v*_{j\tau} - c_{ij[T+t]\tau})

for all i = 1,...,F and t = \tau-k,...,\tau. In order to maximize the dual objective, we should assign v_{l\tau} the largest value satisfying these constraints. When b_{l\tau} > 0, this value is

v_{l\tau} = \min_{i=1,...,F;\ \tau \ge t} \left\{ c_{il[T+t]\tau} + \frac{M_{ilt,\tau-1}}{b_{l\tau}} \right\}    (8)

Note that M_{ilt,\tau-1} \ge 0 implies v_{l\tau} \ge c_{il[T+t]\tau}. A dual feasible solution can be obtained simply by calculating the values of the dual variables sequentially (Figure 1). A backward construction algorithm can then be used to generate primal feasible solutions (Figure 2). The primal-dual solutions found using these algorithms do not necessarily satisfy all the complementary slackness conditions.

Theorem 3. The solutions obtained with the primal and dual algorithms are feasible and always satisfy the complementary slackness conditions (C1) and (C2).

Proof: The proof is similar to the proof of Proposition 4.1 in Ekşioğlu et al. [9].

Hence, one can determine whether the solution obtained with the primal and dual algorithms is optimal by checking whether conditions (C3) are satisfied, or whether the
    M_{iτ,τ-1} = s_{iτ} for i = 1,...,F; j = 1,...,R; τ = 1,...,T
    for τ = 1 to T do
        for j = 1 to R do
            if b_{jτ} = 0 then v_{jτ} = 0
            else v_{jτ} = min_{i,t} {c_{ij[T+t]τ} + M_{i[T+t],τ-1}/b_{jτ}}, τ-k ≤ t ≤ τ
            for t = τ-k to τ do
                for i = 1 to F do
                    M_{i[T+t]τ} = max{0, M_{i[T+t],τ-1} - b_{jτ} · max{0, v_{jτ} - c_{ij[T+t]τ}}}
                enddo
            enddo
        enddo
    enddo

Fig. 1. Dual algorithm

    y_{it} = 0, q_{ijtτ} = 0, i = 1,...,F; j = 1,...,R; t = 1,...,T; τ ≤ [t+k]
    P = {(j,l) | b_{jl} > 0, for j = 1,...,R; l = 1,...,T}
    Start:  τ = max l ∈ P, t = τ - k
    Step 1: for i = 1 to F do
                for j = 1 to R do
                    repeat t = t + 1
                    until M_{i[T+t]τ} = 0 and c_{ij[T+t]τ} - v_{jτ} + max{0, v_{jτ} - c_{ij[T+t]τ}} = 0
                    y_{i[T+t]} = 1, and i* = i, t* = t, go to Step 2
                enddo
            enddo
            go to Step 3
    Step 2: for t = t* to t* + k do
                for j = 0 to R do
                    if c_{i*jt*[t]} - v_{j[t]} + max{0, v_{j[t]} - c_{i*jt*[t]}} = 0 then
                        q_{i*jt*[t]} = b_{j[t]}, P = P - (j,[t])
                enddo
            enddo
    Step 3: if P ≠ ∅ then go to Start

Fig. 2. Primal algorithm
objective function values from the primal and dual algorithms are equal. The running time of this primal-dual algorithm is O(FRT^2).
4 Computational Results
In this section we describe our computational experience in solving the integrated production and transportation planning problem with perishable inventory.
Table 1. Summary of results of primal-dual algorithm

                                        Setup Costs
    Problem   200-300            200-900            600-900            900-1,500          1,200-1,500
              Error(%)  Cpu(s)   Error(%)  Cpu(s)   Error(%)  Cpu(s)   Error(%)  Cpu(s)   Error(%)  Cpu(s)
    16        0.14      0.15     0.24      0.15     0.77      0.15     1.37      0.15     1.91      0.15
    17        0.19      0.15     0.30      0.20     1.08      0.20     1.91      0.20     2.64      0.20
    18        0.26      0.20     0.37      0.25     1.30      0.20     2.28      0.25     3.21      0.25
    19        0.14      0.70     0.23      0.55     0.77      0.65     1.40      0.65     1.96      0.06
    20        0.19      1.00     0.29      0.85     1.07      0.95     1.85      0.95     2.59      0.95
    21        0.24      1.25     0.34      1.25     1.30      1.25     2.26      1.25     3.17      1.25
    22        0.14      2.95     0.23      2.90     0.78      2.85     1.40      3.10     1.94      2.95
    23        0.19      4.45     0.30      4.45     1.08      4.45     1.89      5.00     2.63      4.95
    24        0.24      6.00     0.35      5.85     1.29      6.05     2.23      6.10     3.12      5.80
Our goal is to provide some indication of both the quality and the computing time of the lower and upper bounds generated using the primal-dual algorithm. The problem instances we use are the same as the ones presented in Ekşioğlu et al. [9]. For all problems we set k = T/2. The errors presented are calculated as follows:

Error(%) = \frac{\text{Primal Sol.} - \text{Dual Sol.}}{\text{Dual Sol.}} \times 100.

It has been shown in the literature (Hochbaum and Segev [10], Ekşioğlu et al. [9]) that the ratio of setup to variable costs impacts the complexity of fixed charge network flow problems. The results in Table 1 indicate that an increase in setup costs impacts the quality of the solutions from the primal-dual algorithm; however, setup costs do not affect the running time of the algorithm. The results also show that an increase in the number of facilities and time periods impacts the performance of the primal-dual algorithm. For the same set of problems, formulation (Ex-P) was solved using CPLEX 9 callable libraries. CPLEX gives the optimal solution for problems 16 to 18, but fails to solve the rest of the problems because of the problem size. Table 2 presents the running times of CPLEX for problems 16 to 18.

Table 2. CPLEX running times (in cpu seconds)

                          Setup Costs
    Problem   200-300   200-900   600-900   900-1,500   1,200-1,500
    16        15.85     15.95     16.05     16.55       16.50
    17        26.55     27.05     28.35     28.75       44.40
    18        40.25     41.05     42.20     42.10       86.50
5 Conclusions
This paper presents two network flow formulations for an integrated production and transportation planning problem with perishable inventories. The network consists of a number of facilities and retailers. The facilities produce and carry inventory to satisfy the retailers' demands during T time periods; retailers' demands are known deterministically. Unlike traditional inventory models, the starting and ending inventories are not constant. Section 3 presents an exact solution procedure and a primal-dual algorithm to solve the problem. The exact algorithm is polynomial in the number of facilities and exponential in the number of retailers and time periods; it runs in O(F^{RT+1} T^2 log T) time. The primal-dual algorithm is used to generate lower and upper bounds; its running time is O(FRT^2). We tested the performance of the algorithms on a wide range of randomly generated problems. Computational results show high-quality solutions from the primal-dual algorithm: the maximum error gap was 3.12 percent and the maximum running time was 6.10 cpu seconds. The computational results demonstrate that the ratio of fixed to variable costs, the length of the time horizon and the number of facilities impact the running time and the quality of the solutions from the primal-dual algorithm and CPLEX. We identified a number of problems that CPLEX could not solve because it ran out of memory. However, for all problem classes, the primal-dual algorithm gave high-quality solutions in a reasonable amount of time.
References
1. Johnson, D.S., Lenstra, J.K., Rinnooy Kan, A.H.G.: The complexity of the network design problem. Networks 8 (1978) 279–285
2. Nahmias, S.: Perishable inventory theory: A review. Operations Research 30 (1982) 680–708
3. Hsu, V.N.: Dynamic economic lot size model with perishable inventory. Management Science 46 (2000) 1159–1169
4. Myers, D.C.: Meeting seasonal demand for products with limited shelf lives. Naval Research Logistics 44 (1997) 473–483
5. Wagner, H.M., Whitin, T.M.: Dynamic version of the economic lot size model. Management Science 5 (1958) 89–96
6. Zangwill, W.I.: Minimum concave cost flows in certain networks. Management Science 14 (1968) 429–450
7. Wagelmans, A., van Hoesel, S., Kolen, A.: Economic lot sizing: An O(n log n) algorithm that runs in linear time in the Wagner-Whitin case. Operations Research 40 (1992) 145–156
8. Erlenkotter, D.: A dual-based procedure for uncapacitated facility location. Operations Research 26 (1978) 992–1009
9. Ekşioğlu, S.D., Romeijn, H.E., Pardalos, P.M.: Cross-facility management of production and transportation planning problem. Computers and Operations Research (in press)
10. Hochbaum, D.S., Segev, A.: Analysis of a flow problem with fixed charges. Networks 19 (1989) 291–312
A Unified Framework for the Analysis of M/G/1 Queue Controlled by Workload Ho Woo Lee1 , Se Won Lee1 , Won Ju Seo1 , Sahng Hoon Cheon1 , and Jongwoo Jeon2 1
Department of Systems Management Engineering, Sungkyunkwan University, Su Won, Korea 440-746 [email protected] http://web.skku.edu/~or 2 Department of Statistics, Seoul National University, Seoul, Korea 151-742 [email protected]
Abstract. In this paper, we develop a unified framework for the analysis of the waiting time, the sojourn time and the queue length of the M/G/1 queue in which the server is controlled by workload. We apply our framework to the D-policy M/G/1 queue and confirm the results that already exist in the literature. We also use our approach and derive, for the first time, the sojourn time distribution of an arbitrary customer in the M/G/1 queue under D-policy. The methodologies developed in this paper can be applied to a wide range of M/G/1 queueing systems in which the server is controlled by workload.
1
Introduction
In the D-policy queueing system, the server is turned off as soon as the system becomes empty. It is reactivated only when the cumulative workload (i.e. sum of the service times) exceeds some predetermined threshold D. It is noted that the service times of the customers who arrive during the idle period are neither identical nor independent. The pioneering studies on the D-policy queue were carried out by Balachandran [2], Balachandran and Tijms [3], Boxma [4] and Tijms [22]. Their primary concern was in the optimal control of D under a linear cost structure by using the mean workload. Studies on the queue length of the D-policy queue could not be found until Dshalalow [9] studied the queue length process of the D-policy queue with vacations. But as pointed out by Artalejo [1] and Chae and Park [7][8], his D-policy did not agree with the classical D-policy in the true sense because he implicitly assumed that the customers who have arrived during the idle period are assigned totally new iid service times when the busy period begins. The first successful works on the queue length of the D-policy queue in the true sense were carried out by Chae and Park [8] and Artalejo [1]. Their results show that the well-known decomposition property of the M/G/1 queue with M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 718–727, 2006. c Springer-Verlag Berlin Heidelberg 2006
A Unified Framework for the Analysis of M/G/1 Queue
719
generalized vacations does not hold for the D-policy queueing system due to the dependencies of the service tims. Studies on the D-policy queueing system with MAP (Markovian Arrival process) arrivals were explored by Lee and Song [14] and Lee et al. [13]. More works concerning D-policy can be found in Sivazlian [20], Li and Niu [15], Rhee [18], Park and Chae [17], Lillo and Martin [16], Feinberg and Kella [10], Lee and Baek [11] and Lee et al. [12]. The first studies on the waiting time and queue length of the M/G/1/Dpolicy queue took different approaches. Park and Chae [17] used the supplementary variable technique to obtain the waiting time distribution. But Artalejo [1] used the concept of the regeneration cycle to derive the exact queue length distribution. Chae and Park [8] derived the PGF (probability generating function) of the queue length distribution by analyzing the departure points. Our objective in this paper is twofold: (a) We develop a methodology to analyze the queue length, waiting time and sojourn time under a unified framework. (b) We use our framework to confirm the existing results. We also derive the sojourn time distribution under the unified methodology. This sojourn time distribution will be the first one derived in the queueing literature. We note that the methodologies developed in this paper can be applied to a wide range of queueing systems in which the server is controlled by workload.
2
Preliminary Work and Case Classification
In this section, we lay the foundations for the unified framework that will be presented in subsequent sections. The starting point is to analyze the workload process during the idle period. The rationale is to get the information of the workload and the remaining idle period at an arbitrary arrival time during the idle period. These two quantities affect the queue length and waiting times of the future process. Before going into the details, let us define some notation. As for the service time random variable S, we will use s(x), S(x), s(n) (x) and S (n) (x) as the pdf, distribution function (DF), n-fold pdf and n-fold DF respectively. If we define as the renewal process that is generated by iid service times, then M (x) =
∞ n=1
S (n) (x)
and m(x) =
∞ d M (x) = s(n) (x) dx n=1
(2.1)
are the renewal function and the renewal density function of the process. Let us define I(x, y) as the indicator random variable that takes 1 if the workload process during the idle period passes through the interval (x, y] and 0 otherwise. Let us define its probability φ(x)dx = P r[I(x, x + dx] = 1], (x > 0).
(2.2)
720
H.W. Lee et al.
The event {I(x, x + dx] = 1} occurs iff one event of the -process occurs during (x, x + dx]. Thus, we have φ(x) = m(x). (2.3) Let Uidle (x) be the probability that the workload at an arbitrary time during d the idle period is less than or equal to x with uidle (x) = dx Uidle (x), (0 < x ≤ D) as its pdf. Then, we have 1 , E(ND ) m(x) uidle (x) = , (0 < x ≤ D), E(ND )
Uidle (0) =
(2.4a) (2.4b)
where ND is the number of customers at the busy period starting point and ID is the length of the idle period. For ND , we have (Ross [19]), P r(ND = k) = S (k−1) (D) − S (k) (D).
(2.5)
Then, we easily get the PGF and the mean as ND (z) =
∞
z k P r(ND = k) =
k=1
= 1 − (1 − z)
∞
z k S (k−1) (D) − S (k) (D)
k=1 ∞
z k S (k) (D),
(2.6a)
k=0
E(ND ) =
d ND (z)z=1 = 1 + M (D). dz
(2.6b)
Let UD be the workload at the busy period starting point. If we condition on the amount of work just before D, the pdf of UD becomes D uD (x) = s(x) + s(x − y)m(y)dy, (x > D) (2.7a) 0
with the Laplace-Stieltjes transform (LST) ∞ ∗ UD (θ) = E e−θUD = e−θx uD (x)dx D D
∗
= 1 − [1 − S (θ)] 1 +
e
−θx
m(x)dx .
(2.7b)
0
Then we have E(UD ) = −
d ∗ U (θ) θ=0 = E(ND )E(S) = E(S)[1 + M (D)]. dθ D
(2.7c)
Let us categorize the customers into two types as follows: SC(Special Customer) : the customer who arrives during the idle period. OC(Ordinary Customer) : the customer who arrives when the server is busy.
A Unified Framework for the Analysis of M/G/1 Queue
721
The workload and the remaining idle period at the arrival instance of an arbitrary SC (test-SC) determines the queue length and waiting time of the customers who arrive thenafter. Thus it is important to classify the possible situations at its arrival instance. We have four cases (Figure 1). (1) The test-SC is the first customer during the idle period. a (Case 1) The total work just after its arrival is less than or equal to D( in Figure 1). d (Case 2) The total work just after its arrival is greater than D (). (2) The test-SC is not the first customer during the idle period. b (Case 3) The total work just after its arrival is less than or equal to D (). c (Case 4) The total work just after its arrival is greater than D (). Above case classification is important since the derivations of the distributions of the queue length, the waiting time and the sojourn time in subsequent sections will be based on the above four cases.
U(t)
c
D b d a
Idle period
Busy period
Idle period
t
Fig. 1. Four cases at the arrival instance of the test-SC
3
Queue Length
In this section, we derive the queue length distribution based on the case clas¯ (z ), Π (z ) and P (z) at sification. We first note that the queue length PGFs Π an arrival, at a departure and at an arbitrary time point are all equal due to PASTA (Wolff [23]) and we have ¯ (z ) = Π (z ) = P (z). Π (3.1) Thus, we just need to find the departure point PGF Π (z ). Since the work is still conserved in our system, an arbitrary customer is an OC with probability ρ and a SC with probability (1 − ρ). Thus, conditioning on the customer type, we have P (z) = Π (z) = ρΠoc (z) + (1 − ρ)Πsc (z),
(3.2)
where Πoc (z) is the PGF of the queue length at an arbitrary OC departure and Πsc (z) is the PGF of the queue length at an arbitrary SC departure.
722
H.W. Lee et al.
To derive the queue length PGF Πoc (z), we note that the departure process of the OCs is stochastically equivalent to the departure process of the M/G/1 ∗∗ ∗∗ queue that starts with ND customers where ND is the number of customers that arrive during UD (the workload at the start of the busy period). Thus from the well-known decomposition property of the M/G/1 queue with generalized vacations (Takagi [21]), we get ∗ (1 − ρ)(1 − z)S ∗ (λ − λz) 1 − UD (λ − λz) · , ∗ S (λ − λz) − z λE(UD )(1 − z) (3.3) where ΠM/G/1 (z) is the queue length PGF in the ordinary M/G/1 queue and NE∗∗ (z) is the PGF of the backward recurrence time of the discrete-time renewal ∗∗ process that is generated by iid ND ’s. The derivation of Πsc (z) depends on the four cases we mentioned in Section 2. First, the queue length left behind by the departing test-SC is the sum of the following two:
Πoc (z) = ΠM/G/1 (z) · NE∗∗ (z) =
(i) the number of OCs who arrive during the workload ST just after its arrival (including its service time), (ii) the number NR of the OCs who arrive behind the test-SC during the idle period. Let us define the joint distribution of ST and NR as α(x, n)dx = P r(x < ST ≤ x + dx, NR = n), (x ≥ 0, n ≥ 0).
(3.4)
We note that the queueing process after the arrival of the test-SC is stochastically equivalent to the idle period under (D − x)-policy. Now, we have
∞
Πsc (z) = 0
e−(λ−λz)x
∞
z n α(x, n)dx.
(3.5)
n=0
Let α(i) (x, n) be the value of α(x, n) under case i mentioned in Section 2. Then, we get (Case 1) (Case 2) (Case 3)
α(1) (x, n) = Uidle (0)s(x)ψnD−x , (n ≥ 1, 0 < x ≤ D), (3.6a) α(2) (x, n) = Uidle (0)s(x), (n = 0, x > D), (3.6b) x α(3) (x, n) = uidle (y)s(x − y)dy ψnD−x , (n ≥ 1, 0 < x ≤ D), y=0+
(Case 4)
(3.6c) D
α(4) (x, n) =
uidle (y)s(x−y)dy, (n = 0, x > D),
(3.6d)
y=0+
where ψnv = P r(Nv = n) is the probability that the busy period starts with n customers under v-policy which is given by, from (2.5),
A Unified Framework for the Analysis of M/G/1 Queue
ψnv = S (n−1) (v) − S (n) (v), (n ≥ 1).
723
(3.7)
If we use (3.6a-d) in (3.5) after using (2.4a,b), we get
D ∞ 1 ∗ Πsc (z) = UD (λ − λz) + e−(λ−λz)x z n ψnD−x m(x)dx , E(ND ) 0 n=1
(3.8)
∗ where UD (θ) is the LST of the DF of UD . Now, using (3.3) and (3.8) in (3.2) yields
1−ρ P (z) = E(ND )
×
∗ UD (λ − λz) +
∞
D
zn
n=1
e−(λ−λz)x [S (n−1) (D − x) − S (n) (D − x)]m(x)dx
0
∗ (1 − ρ)(1 − z)S ∗ (λ − λz) [1 − UD (λ − λz)] +ρ· · . ∗ S (λ − λz) − z λE(UD )(1 − z)
(3.9)
We confirm that above P (z) is exactly equal to the one that was derived by Chae and Park [8] by using different approach.
4
Waiting Time
Conditioning on the customer type again, we have ∗ ∗ Wq∗ (θ) = (1 − ρ)Wq,sc (θ) + ρWq,oc (θ).
(4.1)
Noting that the busy period is a delay cycle with initial delay UD , we have (Takagi [21]) ∗ ∗ ∗ Wq,oc (θ) = Wq,M/G/1 (θ) · UD,E (θ) =
∗ (1 − ρ)θ 1 − UD (θ) · , θ − λ + λS ∗ (θ) θE(UD )
(4.2)
∗ where UD,E (θ) is the LST of the DF of the forward recurrence time of UD . ∗ Wq,sc (θ) can be obtained based on the four cases mentioned in Section 2. Let ∗ Wq,sc(i) (θ) be the LST of the waiting time in case i. Then, we have D ∗ ∗ (Case 1) Wq,sc(1) (θ) = Uidle (0)s(x)ID−x (θ)dx, (4.3a) x=0
(Case 2) (Case 3)
∗ Wq,sc(2) (θ) = Uidle (0)[1 − S(D)], (4.3b) D x ∗ ∗ Wq,sc(3) (θ) = e−θy uidle (y)s(x − y)ID−x (θ)dydx, x=0
(Case 4)
∗ Wq,sc(4) (θ) =
∞
x=D
y=0+
(4.3c) D
y=0+
e−θy uidle (y)s(x − y)dydx,
(4.3d)
724
H.W. Lee et al.
∗ where ID−x (θ) is the length of the idle period with threshold D − x and
∗ ID−x (θ)
n ∞ λ = S (n−1) (D − x) − S (n) (D − x) . θ+λ n=1
(4.4)
Using (2.4a,b) and (4.4) in (4.3a-d) after using (2.4a,b), we get ∗ Wq,sc (θ)
D
+
e
=
4
∗ Wq,sc(i) (θ)
i=1 ∞ −θx
0
k=0
1 = E(ND )
∞ k=0
λ θ+λ
k S (k) (D) − S (k+1) (D)
k λ S (k) (D − x) − S (k+1) (D − x) m(x)dx . θ+λ
(4.5)
Now, using (4.2) and (4.5) in (4.1) yields 1−ρ λ = U (θ) + −1 E(ND ) θ+λ ∞
k−1 k−1 D ∞ λ λ (k) −θx (k) × S (D) + e S (D − x)m(x)dx . θ+λ θ+λ 0
Wq∗ (θ)
∗
k=1
k=1
(4.6) It is confirmed that this is exactly equal to the one which was derived by using the supplementary variable technique by Park and Chae [17].
5
System Sojourn Time
In the usual M/G/1 queueing system without D-policy, the sojourn time is the simple sum of the two independent random variables, the waiting time and the service time: ∗ ∗ WM/G/1 (θ) = Wq,M/G/1 (θ) · S ∗ (θ).
(5.1)
But (5.1) is no longer true for the D-policy queueing system because the waiting time and the service time are dependent. Conditioning on the customer type, the LST of the sojourn time DF can be written as ∗ ∗ W ∗ (θ) = (1 − ρ)Wsc (θ) + ρWoc (θ).
(5.2)
Since the waiting time and the service of an arbitrary OC are independent, we have from (4.2), ∗ ∗ ∗ Woc (θ) = WM/G/1 (θ) · UD,E (θ) =
∗ (1 − ρ)θS ∗ (θ) 1 − UD (θ) · . ∗ θ − λ + λS (θ) θE(UD )
(5.3)
A Unified Framework for the Analysis of M/G/1 Queue
725
∗ Let Wsc(i) (θ) be the LST of the sojourn time in case i. Then, we have ∗ (Case 1) Wsc(1) (θ) = ∗ (Case 2) Wsc(2) (θ) = ∗ (Case 3) Wsc(3) (θ) = ∗ (Case 4) Wsc(4) (θ) =
D
∗ e−θx Uidle (0)s(x)ID−x (θ)dx,
x=0 ∞
e−θx Uidle (0)s(x)dx,
x=D D
e−θx
x=0 ∞
e−θx
x
(5.4a) (5.4b)
∗ uidle (y)s(x − y)ID−x (θ)dydx, (5.4c)
y=0+ D
x=D
uidle (y)s(x − y)dydx,
(5.4d)
y=0+
∗ where ID−x (θ) was given in (4.4). Combining all four cases, we get, after using (2.4a,b), 4
1 E(ND ) i=1 D ∞ λ n ∗ × UD (θ) + e−θx S (n−1) (D − x) − S (n) (D − x) m(x)dx . θ+λ 0 n=1
∗ Wsc (θ) =
∗ Wsc(i) (θ) =
(5.5) Using (5.3) and (5.5) in (5.2), we get 1 E(ND ) n D ∞ λ ∗ −θx × UD (θ) + e S (n−1) (D − x) − S (n) (D − x) m(x)dx θ+λ 0 n=1 ∗ λS ∗ (θ)[1 − UD (θ)] + . (5.6) ∗ θ − λ + λS (θ)
W ∗ (θ) =
We note that above LST of the system sojourn time is derived for the first time in this paper. The mean sojourn time becomes W =−
d ∗ W (θ) θ=0 dθ
λE(S 2 ) = E(S) + + 2(1 − ρ) = Wq + E(S).
6
D 0
x · m(x)dx 1−ρ + E(ND ) λE(ND )
D
[1 + M (D − x)]m(x)dx 0
(5.7)
Summary
In this paper, we presented a unified methodology that can be used to analyze the queue length, the waiting time and the system sojourn time under the same
726
H.W. Lee et al.
framework. The framework is based on the case classification of the workload at the arrival instance of an arbitrary special customer. We confirmed results that exist in the literature which had been derived by using different approaches. The sojourn time distribution we derived by using our approach was the first one in the queueing literature. Our approach can be extensively applied to analyze a variety of M/G/1 queueing systems in which the server is controlled by workload. Acknowledgement. This work was supported by the SRC/ERC program of MOST/KOSEF (grant R11-2000-073-00000).
References 1. Artalejo, J.R. : On the M/G/1 queue with D-policy. Applied Mathematical Modelling 25 (2001) 1055-1069 2. Balachandran, K.R. : Control policies for a single server system, Management Sci. 19 (1973) 1013-1018 3. Balachandran, K.R. and Tijms, H. : On the D-policy for the M/G/1 queue, Management Sci. 21(9) (1975) 1073-1076 4. Boxma, O.J. : Note on a control problem of Balachandran and Tijms, Management Sci. 22(8) (1976) 916-917 5. Boxma, O.J. : Workloads and waiting times in single-server systems with multiple customer classes, Queueing Systems 5 (1989) Nos.1-3 185-214 6. Boxma, O.J. and Groenendijk, W.P. : Pseudo-conservation laws in cyclic-service systems, J. Appl. Probab. 24(4) (1987) 949-964 7. Chae, K.C. and Park, Y.I. : On the optimal D-policy for the M/G/1 queue, J Korean Institute of Industrial Engineers (KIIE) 25(4) (1999) 527-531 8. Chae, K.C. and Park, Y.I. : The queue length distribution for the M/G/1 queue under the D-policy, J. Appl. Prob. 38(1) (2001) 278-279 9. Dshalalow, J.H. : Queueing processes in bulk systems under the D-policy, J. Appl. Probab. 35 (1998) 976-989 10. Feinberg, U.A. and Kella, O. : Optimality of D-policies for an M/G/1 queue with a removable server, Queueing Systems 42 (2002) 355-376 11. Lee, H.W. and Baek, J.W. : BMAP/G/1 queue under D-policy: queue length analysis, Stochastic Models 21(2-3) (2005) 1-21 12. Lee, H.W., Baek, J.W. and Jeon, J. : Analysis of queue under D-policy, Stochastic Analysis and Applications 23 (2005) 785-808 13. Lee, H.W., Cheon, S.H., Lee, E.Y. and Chae, K.C. : Workload and waiting time analysis of MAP/G/1 queue under D-policy, Queueing Systems 48 (2004) 421-443 14. Lee, H.W. and Song, K.S. : Queue length analysis of MAP/G/1 queue under Dpolicy, Stochastic Models 20(3) (2004) 363-380 15. Li, J. and Niu, S.C. : The waiting time distribution for the GI/G/1 queue under the D-policy, Prob. Eng. Inf. Sci. 6 (1992) 287-308 16. Lillo, R.E. and Martin, M. : On optimal exhaustive policies for the M/G/1 queue, Operations Research Letters 27 (2000) 39-46 17. Park, Y.I. and Chae, K.C. : Analysis of unfinished work and queue waiting time for the M/G/1 queue with D-policy, Journal of the Korean Statistical Society 28(4) (1999) 523-533
A Unified Framework for the Analysis of M/G/1 Queue
727
18. Rhee, H.K. : Development of a new methodology to find the expected busy periods for controllable M/G/1 queueing models operating under the multi-variable operating policies : concepts and applications to the dyadic policies, J Korean Institute of Industrial Engineers (KIIE) 23(4) (1997) 729-739 19. Ross, S.M : Stochastic Processes, 2nd ed., John Wiley & Sons, Inc. (1996) 20. Sivazlian, B.D. : Approximate optimal solution for a D-policy in an M/G/1 queueing system, AIIE Transactions 11 (1979) 341-343. 21. Takagi, H. : Queueing Analysis: A Foundation of Performance Evaluation, Vol. I, Vacation and Priority Systems, Part I. North-Holland: Amsterdam (1991) 22. Tijms, H.C., : Optimal control of the workload in an M/G/1 queueing system with removable server, Math. Operationsforsch. u. Statist. 7 (1976) 933-943 23. Wolff, R.W., : Poisson arrivals see time averages, Oper. Res. 30(2) (1982) 223-231
Tabu Search Heuristics for Parallel Machine Scheduling with Sequence-Dependent Setup and Ready Times Sang-Il Kim, Hyun-Seon Choi, and Dong-Ho Lee Department of Industrial Engineering, Hanyang University, Sungdong-gu, Seoul 133-791, Korea [email protected]
Abstract. We consider the problem of scheduling a set of independent jobs on parallel machines for the objective of minimizing total tardiness. Each job may have sequence-dependent setup and distinct ready times, i.e., the time at which the job is available for processing. Due to the complexity of the problem, tabu search heuristics are suggested that incorporate new methods to generate the neighborhood solutions. Three initial solution methods are also suggested. Computational experiments are done on a number of randomly generated test problems, and the results show that the tabu search heuristics suggested in this paper outperform the existing one.
1 Introduction Occurrence of machines in parallel is common in various manufacturing and assembly systems. For example, various parallel machine systems can be found in semiconductor manufacturing, printed circuit board (PCB) manufacturing, group technology (GT) cell with identical machines, steel manufacturing, injection molding process, etc. Among them, the injection molding process, which has a huge global market, is forced to reduce costs as well as increase productivity, and efficient scheduling is an important means to achieve them (Dastidar and Nagi 2004). In general, the parallel machine scheduling problem has two distinct decisions: allocation and sequencing. Allocation is a decision concerning the assignment of jobs to machines, while sequencing is to order the jobs assigned to each machine. Also, parallel machine scheduling can be classified according to: a) objective; b) machine type (identical or non-identical machine); c) job type (independent or dependent job); and d) preemption. Additional criteria for the problem classification are sequencedependent setup times and ready times. Here, the sequence-dependent setups imply that setup times depend on the type of job just completed and on the job to be processed. An example of a process with sequence dependent setup times can be found in the injection molding process. In this example, only the mold change time is required if the same material is used between two consecutive jobs, while the screw cleaning times are additionally required, otherwise. Also, the ready time of a job, which is related to dynamic scheduling, is the time at which the job is available for processing. Note that the static version of scheduling assumes that all jobs are simultaneously available for processing, i.e., zero ready times. M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 728 – 737, 2006. © Springer-Verlag Berlin Heidelberg 2006
Tabu Search Heuristics for Parallel Machine Scheduling
729
This paper considers the parallel machine scheduling problem in which each job may have sequence-dependent setup and distinct ready times. Note that these two characteristics make the problem more realistic, but increase its complexity drastically. The objective is to minimize total tardiness, i.e., amount of time that jobs fail to meet their due dates. Compared with those for single machine scheduling, the research on parallel machine scheduling has been increased in last decades since various types of parallel machine systems are introduced to manufacturing and service systems. Since this paper focuses on the parallel machine scheduling problem with sequence-dependent setup and distinct ready times, the literature review is done on such problems. (See Cheng and Sin (1990) for literature reviews on various parallel machine scheduling problems.) Bean (1994) considers the parallel machine scheduling problem with sequence-dependent setup times for the objective of minimizing total tardiness, and suggests a genetic algorithm that can give near optimal solutions for the test problems, and later Koulamas (1997) suggests heuristic and simulated annealing algorithms for the same problem. Park and Kim (1997) consider the parallel machine scheduling problem with distinct ready times and suggest search heuristics that minimize work-in-process (WIP) inventory. For the problem with sequencedependent setup and ready times, Sivrikaya-Serifoglu and Ulusoy (1999) suggests a genetic algorithm that minimizes the sum of the earliness and tardiness costs. Later, Bilge et al. (2004) suggest tabu search heuristics for the objective of minimizing total tardiness and show from computational tests that their algorithms outperform the genetic algorithm of Sivrikaya-Serifoglu and Ulusoy (1999) while the earliness cost is not considered. Recently, Cao et al. (2005) consider the parallel machine problem of simultaneously selecting and scheduling parallel machines, and suggest other tabu search heuristics for the objective of minimizing the sum of machine holding and job tardiness costs. As stated earlier, this paper focuses on the parallel machine scheduling problem with sequence-dependent setup and ready times for the objective of minimizing total tardiness. In fact, the problem considered in this paper is the same as that of Bilge et al. (2004). Due to the complexity of the problem, we suggest tabu search heuristics that can give good solutions for practical sized problems within a reasonable amount of computation time. The algorithms suggested in this paper incorporate new methods to generate the neighborhood solutions. Three methods of obtaining the initial solutions are also suggested. Computational experiments are done on a number of randomly generated test problems and the results are reported. In particular, the new tabu search heuristics are compared with the existing one of Bilge et al. (2004).
2 Problem Description Before describing the parallel machine scheduling problem considered in this paper, we present a general structure of parallel machine systems in Figure 1. In this figure, j II k, implies job j is processed on machine k, j = 1, 2, ..., n, k = 1, 2, ..., m. Note that each job has a single operation that is performed on one of the parallel machines.
730
S.-I. Kim, H.-S. Choi, and D.-H. Lee
Fig. 1. Structure of parallel machine process
As stated earlier, there are two decision variables in the parallel machine scheduling problem: (a) allocating jobs to machines; and (b) sequencing the jobs assigned to each machine. In figure 1, the job allocation is denoted by j || k and the problems after allocating the jobs result in the single machine scheduling problems. It is assumed that the parallel machines are non-identical but uniform in that each machine is capable of processing all the jobs at different speeds. The objective is to minimize the total tardiness, which can be represented as n
∑ max{0, C i − d i } ,
i =1
where Ci and di denote the completion time and the due date of job i, respectively. Note that the completion times of jobs depend on the two decision variables, allocation and sequencing, and the problem considered here is to determine them while considering the sequence-dependent setup and ready times. In this paper, we consider the deterministic version of the problem, i.e., processing time and due date of each job are deterministic and given in advance. It is assumed that sequence-dependent setup times are given. Hence, the time interval that job j occupies machine k can be represented as sijk + pjk, where i is the job that precedes j in sequence on machine k, sikj is the setup time required for job j after job i is completed in machine k, and pjk is the processing time of job j on machine k. It is also assumed that all jobs are known at the beginning of the scheduling horizon, and their ready times are distinct and given in advance. Recall that the ready time of a job is the time at which the job is available for processing. Other assumptions made in the problem considered here are the same of those of the basic single machine scheduling problem. They are: (a) jobs are independent, i.e., no precedence relationship between any pair of jobs; (b) each machine can process only one job at a time and each job can be processed on one machine; (c) once a job is determined to be processed by a machine, it will stay on the machine until its completion, i.e., no job preemption; and (d) machine breakdowns are not considered. As noted in the previous research articles, the problem considered in this paper is an NP-hard problem. This can be easily seen from the fact that the single machine problem with the tardiness measure is NP-hard even without sequence-dependent setup and ready times (Du and Leung 1990).
Tabu Search Heuristics for Parallel Machine Scheduling
731
3 Solution Algorithms 3.1 Obtaining Initial Solutions We suggest three methods to obtain the initial solutions, each of which is used for the tabu search heuristics suggested in this paper. Detailed explanations of the three methods are given below. EDR (Earliest Due date plus Ready time) This method is an extension of the EDD (Earliest Due Date) rule, a commonly used one for various scheduling problems with due dates. Because we consider the problem with ready times, EDR uses the sum of due date and ready time as the sorting measure. More specifically, all jobs are sorted in a nondecreasing order of these sums, and the jobs are allocated to machines in this order. Also, the job allocation is done by selecting the machine with the smallest additional time. Here, the additional time of job j on machine k is defined as pjk − pjkj, where pjk is the processing time of job j in machine k (as defined earlier) and kj is the index for the machine where the smallest processing time of job j occurs. For example, consider two types (old and new) of machines. If the processing times of a job are 3 and 4 at the new and the old machines, respectively, the additional time of the old (new) one becomes 1 (0). MT (Minimum Tardiness) This is similar to the EDR method in that the job sorting measure is the sum of due date and ready time, and all jobs are sorted in a nondecreasing order of these sums. Unlike the EDR method, however, the job allocation is done by selecting the machine with the smallest total tardiness after the current job is allocated. (Note that the total tardiness is calculated for each machine.) In this method, tie breaks are done as follows. If there are two or more machines with zero or the same total tardiness, the machine with the maximum sum of slack time is selected. Here, the slack time of a job is the difference between its due date and completion time. MPS (Minimum Partial Schedule) This is the same as the MT method except that a set of tardy jobs is maintained during the job allocation. That is, if a job can be allocated to one or more machines without incurring tardiness, selected is the machine with the smallest completion time after the job is allocated. Otherwise, the job (that incurs tardiness) is assigned to the set of tardy jobs. Basically, this idea is similar to the Hodgson algorithm for the basic single machine scheduling problem that minimizes the number of tardy jobs. After all jobs are considered for allocation, a partial schedule can be obtained for the allocated job, and then the tardy jobs are scheduled. To schedule each of the tardy jobs, the best position (that gives the minimum total tardiness) is selected after it is assigned to all possible positions in the current partial schedule. 3.2 Tabu Search Heuristics Tabu search (TS) is one of the well-known local search techniques that have been successfully applied to various combinatorial optimization problems. Starting from an
732
S.-I. Kim, H.-S. Choi, and D.-H. Lee
initial solution, TS generates a new alternative S′ in the neighborhood of the original alternative S with a function that transforms S into S′. This is usually called a move, which can be made to a neighbor solution even though it is worse than the given solution. This makes a TS escape from a local optimum in its search for the global optimum. To avoid cycling, TS defines a set of moves that are tabu (forbidden), and these moves are stored in a set Λ, called tabu list. Elements of Λ define all tabu moves that cannot be applied to the current solution. The size of Λ is bounded by a parameter l, called tabu list size. If ⏐Λ⏐ = l, before adding a move to Λ, one must remove an element in it, the oldest one in general. Note that a tabu move can be always allowed to be chosen if it creates a solution better than the incumbent solution, the best objective value obtained so far. This is called the aspiration criterion. See Glover (1989) for a comprehensive description of various aspects of TS. An application of TS is generally characterized by several factors. They are: (a) initial solution methods; (b) neighborhood generation methods, i.e., set of possible moves applicable to the current solution; (c) definition of tabu moves with the tabu list size; and (d) termination condition(s). First, the solution is represented as Figure 2, which is modified from that of Bilge et al. (2004). In this figure, Jkd denotes the index of the dth job assigned to machine k, and nk denotes the number of jobs allocated to machine k. Also, the objective value for a given solution can be easily calculated by going through the sequence of jobs over each machine and summing up the tardiness of each job, calculated by taking into account distinct ready times, due dates, processing times on different types of machines, and sequence-dependent setup times. machine 1
J11
J12
J13
...
...
...
J1,n1
machine 2 : : machine k : : machine m
J21
J22
J23
...
...
...
...
Jk1
Jk2
Jk3
...
...
J k ,nk
Jm1
Jm2
Jm3
...
J n ,nm
J 2,n2
Fig. 2. Solution representation (adopted from Bilge et al. 2004)
Two neighborhood generation methods are suggested in this paper, each of which is based on swap and insertion methods. In general, the swap method, alternatively called the interchange method in the literature, generates a neighborhood solution by selecting two jobs in the current schedule and interchanging them, while the insertion method generates a neighborhood solution by randomly selecting two jobs and removing the first job from the original place and inserting it to the position that directly precedes the second job. Note that the swap and insertion methods can be done for the jobs assigned to the same machine (intra-machine) or different machines (inter-machine) since this paper considers the parallel machine scheduling problem.
Tabu Search Heuristics for Parallel Machine Scheduling
733
The neighborhood generation methods suggested in this paper have a hybrid structure with swap and insertion methods. A schematic description of the hybrid structure is given in Figure 3. As can be seen from the figure, the hybrid structure uses the swap and the insertion methods consecutively. More specifically, the swap move is done first for a pair of jobs, and then the insertion move is done for each of the neighborhoods generated by the swap move. Here, the intra- and inter-machine moves are considered for both swap and insertion methods. Note that Bilge et al. (2004) considers the intra- and inter-machine moves for the insertion method, but only the intra-machine moves for the swap method. Therefore, our neighborhood generation methods extend the method of Bilge et al. (2004) in that the search space is enlarged and hence the possibility of obtaining good solutions is increased.
Fig. 3. A schematic description of neighborhood generation
There may be several ways of considering moves. Among them, this paper uses the method of examining a portion of the entire neighborhood and taking the best move that is not tabu. This is because the hybrid method described earlier generates too many neighborhood solutions, and hence it is needed to limit the number of neighborhood solutions examined. This is called the candidate list strategy in the literature (Laguna et al. 1991). The purpose of the candidate list strategy is to screen the neighborhood solutions so as to concentrate on promising moves. This paper suggests two candidate list strategies, called small candidate list strategy (SCLS) and large candidate list strategy (LCLS) in this paper. Note that each of the two strategies is applied to the hybrid neighborhood generation method. In the SCLS, only one job with the maximum tardiness is selected for each machine, and these jobs are considered as the candidates for (intra-machine and inter-machine) insertion or swap move. Since this approach has shown to be quite fast and decreases the neighborhood size considerably, it is called as the small candidate list strategy. In the LCLS, on the other hand, two cases are considered. If the iteration number is odd, all tardy jobs on each machine are considered as the candidates for insertion or swap move. Otherwise, all non-tardy jobs on each machine are chosen as the candidates. Here, the insertion
734
S.-I. Kim, H.-S. Choi, and D.-H. Lee
and swap moves are done for both intra-machine and inter-machine. In this paper, therefore, we suggest six TS heuristics, each of which is a combination of the initial solution methods described in section 3.1 and the two neighborhood generation methods. Tabu moves are defined as follows. In the swap move, a pair of jobs that have been interchanged is defined as a tabu move. Also, the insertion method defines a tabu move as the job to be moved and the job that directly precedes the second job. More specifically, if job i is inserted between jobs (j – 1) and j, jobs i and j are stored in the tabu list. Here, in the case that job i is inserted to the last position in the sequence of jobs assigned to a machine, job i and the last job in the sequence are defined as a tabu move. As an exceptional case, a tabu move can be allowed to be chosen if it generates a solution better than the incumbent solution, the best objective value obtained so far. As stated earlier, the size of the tabu list is an important parameter. If the tabu list is too short, the TS heuristics may keep returning to the same local optimum, which prevents the search process from exploring a wide area of solution space. In contrast, if the tabu list is too long, it results in excessive computation time searching the tabu list to determine if a move is tabu or not. Thus, a longer time spent going through a tabu list provides less time for the procedure to explore the solution space for a given computation time. In this paper, the tabu list size l was determined using the empirical function, suggested by of Bilge et al. (2004), which depends on the problem size. Formally, it is represented as n⋅ n , l =h⋅ m − 0.5
where h is a parameter. Also, two independent tabu lists, which correspond to the swap and the insertion methods for generating neighborhoods, are created, and each of them is maintained in such a way that the oldest tabu move is removed before adding a new one if its tabu list is full. Finally, the TS heuristics stop if no improvements have been made for a certain number of consecutive iterations, denoted by LTS in this paper. This is a common termination rule used in the literature.
4 Computational Experiments To show the performances of the TS heuristics, computational tests were done on randomly generated test problems, and the results are reported in this section. In the test, 7 TS heuristics are compared, one suggested by Bilge et al. (2004) and six suggested in this paper. The TS heuristics were coded in C, and tests were performed on a workstation with an Intel Xeon processor operating at 3.2 GHz clock speed. Since optimal solutions cannot be obtained in a reasonable amount of computation time, the TS heuristics are compared by using a relative performance ratio. Here, the relative performance ratio of the TS heuristic a for a problem is defined as 100 ⋅ (Ca − Cbest ) / Cbest ,
where Ca is the objective value obtained using the heuristic a for the problem and Cbest is the best objective function value among those obtained from the seven TS heuristics.
Tabu Search Heuristics for Parallel Machine Scheduling
735
Table 1. Performances of the TS heuristics
(a) Cases of loose due dates Machines Jobs 20 40 4 60 80 60 80 8 100 120 100 120 12 140 160
Bilge et al.1 1.4 (4.3)* 0.5 (0.9) 0.5 (1.4) 0.1 (0.2) 1.2 (1.8) 1.6 (2.6) 0.8 (1.3) 0.2 (0.3) 1.1 (1.0) 1.4 (2.4) 0.4 (1.5) 1.6 (3.3)
EDR-SCLS2 0.5 (1.0) 3.3 (3.6) 4.6 (3.5) 2.4 (3.1) 2.1 (1.8) 3.7 (2.8) 6.2 (6.9) 3.1 (3.2) 3.7 (2.1) 4.0 (4.8) 5.0 (2.3) 4.8 (1.9)
MT-SCLS2 0.2 (0.3) 2.0 (2.6) 6.0 (4.4) 1.4 (1.8) 2.4 (2.7) 3.8 (2.7) 6.9 (8.0) 3.1 (3.2) 4.2 (2.1) 4.5 (3.9) 4.6 (2.9) 5.9 (2.0)
MPS-SCLS2 0.1 (0.3) 2.6 (3.1) 5.7 (4.2) 1.3 (1.6) 3.3 (2.8) 3.1 (3.3) 10.7 (13.6) 4.2 (5.9) 4.0 (1.7) 4.5 (3.3) 4.1 (2.0) 5.1 (2.0)
EDR-LCLS2 0.0 (0.0) 0.1 (0.3) 2.0 (3.6) 0.3 (1.0) 0.0 (0.0) 0.4 (1.1) 1.2 (2.6) 0.3 (0.9) 0.5 (0.8) 0.4 (0.9) 0.6 (0.8) 0.4 (0.8)
MT-LCLS2 0.0 (0.0) 0.2 (0.1) 1.9 (3.6) 0.3 (1.0) 0.0 (0.0) 0.3 (0.7) 0.8 (1.7) 0.3 (0.9) 0.4 (0.5) 0.4 (0.8) 0.8 (0.8) 0.7 (1.0)
MPS-LCLS2 0.0 (0.0) 0.1 (0.3) 2.9 (4.0) 0.3 (1.0) 0.0 (0.1) 0.4 (1.1) 0.4 (1.1) 0.3 (0.9) 0.4 (0.8) 0.6 (0.9) 0.7 (0.8) 0.6 (1.0)
1
Algorithm by Bilge et al. (2004) EDR, MT, MPS: Initial solutions SCLS, LCLS: Candidate selection strategies of neighborhood solutions * Average value of RPRs for 10 cases (standard deviation in parentheses) 2
(b) Cases of medium due dates Machines Jobs 20 40 4 60 80 60 80 8 100 120 100 120 12 140 160
Bilge et al.1 3.6 (4.0) 9.3 (6.5) 9.2 (17.9) 13.8 (28.2) 4.0 (2.0) 3.5 (3.2) 4.4 (4.4) 10.2 (25.6) 9.0 (13.0) 3.6 (4.5) 1.2 (1.6) 1.7 (2.0)
EDR-SCLS2 3.6 (2.7) 20.0 (16.4) 14.5 (10.2) 15.9 (17.9) 14.0 (5.7) 8.8 (4.4) 13.7 (6.0) 15.9 (10.0) 21.7 (23.2) 9.7 (7.3) 7.2 (2.7) 8.6 (4.0)
MT-SCLS2 4.1 (3.2) 21.2 (15.4) 16.0 (16.5) 16.8 (21.9) 13.8 (5.5) 9.3 (4.4) 12.9 (6.2) 10.6 (6.2) 21.0 (24.1) 9.0 (5.4) 7.6 (3.3) 8.6 (4.2)
MPS-SCLS2 2.8 (2.8) 22.5 (17.6) 20.6 (18.0) 14.0 (16.4) 14.4 (9.1) 9.6 (5.5) 14.4 (7.1) 14.7 (9.7) 20.9 (23.5) 9.1 (6.4) 7.5 ( 2.8) 9.0 ( 3.5)
EDR-LCLS2 0.1 (0.1) 0.5 (0.9) 1.1 (2.6) 1.1 (2.9) 0.8 (1.1) 0.2 (0.3) 0.3 (0.5) 2.3 (6.7) 0.0 (0.1) 0.2 (0.2) 0.2 (0.3) 0.6 (0.7)
MT-LCLS2 0.0 (0.1) 0.7 (1.0) 1.2 (2.0) 2.1 (3.3) 0.7 (1.2) 0.4 (0.7) 0.3 (0.4) 2.3 (6.5) 3.8 (10.8) 0.4 (0.5) 0.3 (0.3) 0.4 (0.4)
MPS-LCLS2 0.0 (0.0) 0.8 (0.7) 0.4 (0.6) 1.3 (3.0) 1.1 (1.8) 0.4 (0.7) 0.4 (0.5) 2.5 (6.7) 4.5 (11.8) 0.3 (0.5) 0.2 (0.3) 0.5 (0.4)
MT-LCLS2 0.1 (0.3) 0.3 (0.5) 2.3 (3.4) 0.5 (1.4) 0.1 (0.2) 0.7 (0.9) 1.4 (1.8) 0.5 (0.9) 1.1 (1.3) 0.5 (0.8) 1.7 (3.3) 0.5 (0.7)
MPS-LCLS2 0.1 (0.2) 0.5 (0.7) 1.3 (2.1) 0.7 (1.1) 1.3 (1.0) 0.7 (1.1) 0.5 (0.7) 2.0 (3.4) 1.4 (1.6) 1.0 (1.7) 0.5 (0.7) 1.2 (3.1)
(c) Cases for tight due dates Machines Jobs 20 40 4 60 80 60 80 8 100 120 100 120 12 140 160
Bilge et al.1 1.4 (1.5) 7.5 (4.3) 4.8 (3.7) 8.8 (11.7) 3.2 (1.7) 3.8 (1.8) 6.9 (6.5) 8.2 (13.1) 7.0 (4.9) 4.6 (3.0) 14.6 (35.8) 5.0 (4.5)
EDR-SCLS2 1.1 (1.2) 12.4 (4.2) 28.5 (16.7) 29.5 (15.4) 13.5 (4.7) 21.0 (5.8) 23.2 (6.9) 25.8 (14.1) 18.2 (5.8) 16.4 (7.0) 25.1 (8.7) 19.6 (18.5)
MT-SCLS2 1.4 (1.2) 13.6 (4.2) 29.1 (16.7) 31.6 (15.4) 12.4 (4.7) 21.5 (5.8) 21.6 (6.9) 26.1 (14.1) 16.6 (5.8) 14.5 (7.0) 25.3 (8.7) 16.6 (18.5)
MPS-SCLS2 0.9 (1.0) 14.7 (6.6) 26.3 (14.1) 26.6 (12.7) 12.5 (4.6) 22.2 (5.7) 25.3 (10.7) 27.8 (18.8) 17.1 (3.9) 15.4 (5.2) 25.4 (10.6) 18.8 (16.5)
EDR-LCLS2 0.1 (0.1) 0.2 (0.4) 0.1 (0.2) 0.5 (0.8) 1.2 (1.0) 0.5 (0.6) 0.7 (1.2) 0.8 (1.1) 1.4 (1.5) 1.3 (2.2) 1.6 (1.8) 0.9 (1.2)
736
S.-I. Kim, H.-S. Choi, and D.-H. Lee
For the test, 360 problems were generated randomly, i.e., 10 problems for 36 combinations of three levels of the number of machines (4, 8 and 12), eight levels of the number of jobs (20, 40, 60, 80, 100, 120, 140 and 160), and three levels of the due date tightness (loose, medium, tight). More specifically, the number of jobs was set to 20, 40, 60 and 80 for the case of 4 machines, 60, 80, 100 and 120 for the case of 8 machines, and 100, 120, 140 and 160 for the case of 12 machines. The test problems were generated using the method of Sivrikaya-Serifoglu and Ulusoy (1999), which was also adopted by Bilge et al. (2004). Details of the problem generation method are omitted here because of the space limitation. Also, to find the most appropriate parameter values for the TS heuristics, several values for the parameters for the tabu list size (h) and the termination condition (LTS) were tested on several representative test problems, and they were set to 1 and 5000, respectively. Results for the test problems are summarized in Table 1 which shows the relative performance ratios of the seven TS heuristics for the cases of loose, medium, and tight due dates, respectively. It can be seen from the table that the TS heuristics with the LCLS strategy outperform the others. Although it is not reported, statistical tests also showed statistically significant difference between the heuristics with the LCLS strategy and the others. Also, no significant differences could be found among the three TS heuristics with the LCLS strategy. This implies the performances of the TS heuristics suggested in this paper do not depend on the initial solution method. In particular, the three TS heuristics outperform the existing algorithm of Bilge et al. (2004) significantly. More specifically, we could obtain improvements from the three TS heuristics about 1%, 6%, and 8% in overall average for the cases of loose, medium and tight due dates (Table 1(a), (b), and (c)), respectively. Also, due to their large search space, the heuristics with the LCLS strategy required longer computation times than those with the SCLS strategy. However, most of the test problems were solved within 1 hour even for the large-sized test problems with 12 machines and 160 jobs, which is reasonable for practical scheduling problems, e.g., an injection molding shops from our experience.
5 Concluding Remarks This paper considered the parallel machine scheduling problem with sequencedependent setup and ready times for the objective of minimizing total tardiness. The machines are uniform in that they can process all the jobs with difference processing and setup rates. Due to the complexity of the problem, we suggested tabu search heuristics together with three initial solution methods (EDR, MT, and MPS) and two neighborhood generation methods (SCLS and LCLS). To show the performances of the heuristics, computational experiments on randomly generated test problems were done and the results showed that the tabu search heuristics with LCLS outperformed the others due to their extended search space. In particular, they gave significant improvements over an existing heuristic. This research can be extended as follows. First, in the theoretical aspect, it may be needed to develop optimal algorithms. To the best of our knowledge, there is no other research article that suggests an optimal algorithm for generalized parallel machine scheduling with sequence-dependent setup and ready times. Second, more sophisticated neighborhood generation methods such as the intensification and
Tabu Search Heuristics for Parallel Machine Scheduling
737
diversification strategies can be developed for the search heuristics. Finally, other search techniques can also be applied to this complex problem.
Acknowledgements This work was supported by Korea Research Foundation Grant funded by Korean Government (MOEHRD) (KRF-2005-041-D00893). This is gratefully acknowledged.
References 1. Bean, J. C.: Genetic algorithm and random keys for sequencing and optimization, ORSA Journal on Computing, Vol. 6 (1994) 154-160. 2. Bilge, U., Kirac, F., Kurtulan, M., and Pekgun, P.: A tabu search algorithm for parallel machine total tardiness problem, Computers and Operations Research, Vol. 31 (2004) 397-414. 3. Cao, D., Chen, M., and Wan, G.: Parallel machine selection and job scheduling to minimize machine cost and job tardiness, Computers and Operations Research, Vol. 32 (2005) 1995-2012. 4. Cheng, T. C. E. and Sin, C. C. S.: A state-of-the-art review of parallel-machine scheduling research, European Journal of Operational Research, Vol. 47 (1990) 271-292. 5. Dastidar, G. S. and Nagi, R.: Scheduling injection molding operations with multiple resource constraints and sequence dependent setup times and costs, Computers and Operations Research, Vol. 32 (2004) 2987-3005. 6. Du, J. and Leung, J. Y. T.: Minimizing total tardiness on one machine is NP-hard, Mathematics of Operations Research, Vol. 15 (1990) 483-495. 7. Glover, F.: Tabu Search: Part I, ORSA Journal on Computing, Vol. 1 (1989) 190-206. 8. Koulamas, C.: Decomposition and hybrid simulated annealing heuristics for the parallel machine total tardiness problem, Naval Research Logistics, Vol. 44 (1997) 109-125. 9. Laguna, M., Barnes, J. W., and Glover, F.: Tabu search methods for a single machine scheduling problem, Journal of Intelligent Manufacturing, Vol. 2 (1991) 63-74. 10. Park, M.-W. and Kim, Y.-D.: Search heuristics for a parallel machine scheduling problem with ready times and due dates, Computers and Industrial Engineering, Vol. 33 (1997) 793-796. 11. Sivrikaya-Serifoglu, F. and Ulusoy, G.: Parallel machine scheduling with earliness and tardiness penalties, Computers and Operations Research, Vol. 26 (1999) 773-787.
The Maximum Integer Multiterminal Flow Problem C´edric Bentz CEDRIC-CNAM, 292, Rue Saint-Martin, 75141 Paris Cedex 03, France [email protected]
Abstract. Given an edge-capacitated graph and k terminal vertices, the maximum integer multiterminal flow problem (MaxIMTF) is to route the maximum number of flow units between the terminals. For directed graphs, we introduce a new parameter kL ≤ k and prove that MaxIMTF is N P-hard when k = kL = 2 and when kL = 1 and k = 3, and polynomial-time solvable when kL = 0 and when kL = 1 and k = 2. We also give an 2 log2 (kL + 2)-approximation algorithm for the general case. For undirected graphs, we give a family of valid inequalities for MaxIMTF that has several interesting consequences, and show a correspondence with valid inequalities known for MaxIMTF and for the associated minimum multiterminal cut problem.
1
Introduction
Routing problems in networks are commonly modeled by flow or multicommodity flow problems. Given an edge-capacitated graph (directed or undirected), the goal is to route flow units (requests) between prespecified vertices. When one seeks to route the maximum number of flow units from a unique source to a unique sink, the problem is the famous maximum flow problem. The FordFulkerson’s theorem [11] gives a good characterization for this case, which is efficiently solvable [1]. In particular, this theorem states that, if the capacities are integral, the value of a maximum integer flow is equal to the value of a minimum cut, i.e., to the value of a minimum weight set of edges whose removal separates the source from the sink. Unfortunately, this does not hold for more general variants. One of the most studied variant is the maximum integer multicommodity flow problem: given an edge-capacitated graph G = (V, E) and a list of source-sink pairs, the goal is to simultaneously route the maximum number of flow units, each unit being routed from one source to its corresponding sink. This problem is N P-hard even for two source-sink pairs [10], and cannot be approximated within |E|1/2− (resp. within (log |E|)1/3− ) for every > 0 in directed graphs [15] (resp. in undirected graphs [2]) unless P = N P (recall that, for a maximization (resp. minimization) problem, an α-approximation algorithm is a polynomial-time algorithm that always outputs a feasible solution whose value is at least 1/α times (resp. at most α times) the value of an optimal solution). The corresponding generalization of the problem of finding a minimum cut is the M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 738–747, 2006. c Springer-Verlag Berlin Heidelberg 2006
The Maximum Integer Multiterminal Flow Problem
739
minimum multicut problem, which asks to select a minimum weight set of edges whose removal separates each source from its corresponding sink. This problem is also N P-hard in several special cases, and has a noticeable relationship with the former: the continuous relaxations of the linear programming formulations of the two problems are dual [8]. In particular, this interesting property has been used to design good approximation algorithms for both problems [14]. Further results and references concerning these problems can be found in [1] and [8]. Another generalization of the maximum flow problem is the maximum integer multiterminal flow problem (MaxIMTF): given an edge-capacitated graph and a set T = {t1 , . . . , tk } of terminal vertices, MaxIMTF is to route the maximum number of flow units between the terminals. Note that this problem is a particular maximum integer multicommodity flow problem in which the source-sink pairs are (ti , tj ) for i = j. The associated minimum multiterminal cut problem (MinMTC) is to select a minimum weight set of edges whose removal separates ti from tj for i = j. Note that MaxIMTF and MinMTC also have the duality relationship mentioned above. MinMTC has been widely studied in the undirected case (see [3], [5], [6], [7], [8], [9], [16] and [20]), and the directed case has also received some attention: Garg et al. [13] show that it is N P-hard even for k = 2 and give an 2 log2 k-approximation algorithm, and Naor and Zosin [18] give a 2-approximation algorithm. However, the algorithm of Garg et al. has an interesting property: it computes a multiterminal cut whose value is at most 2 log2 k times the value of an integer multiterminal flow, and hence is an 2 log2 kapproximation for both MinMTC and MaxIMTF (while the algorithm of Naor and Zosin does not provide an approximate solution for MaxIMTF). (Note that the same idea easily yields an log2 k-approximation algorithm for MaxIMTF in undirected graphs.) Costa et al. [8] show that MaxIMTF and MinMTC are polynomial-time solvable in acyclic directed graphs by using a simple reduction to a maximum flow and a minimum cut problem, respectively. To the best of our knowledge, these are the only results about MaxIMTF in directed graphs. In undirected graphs, MaxIMTF has recently be shown to be polynomial-time solvable by using the ellipsoid method [17] (the result is based on the associated Mader’s theorem on T -paths [19, Chap. 73]). Algorithmic aspects of special cases have also been studied (inner eulerian graphs in [12] and trees in [4]). However, it can be easily noticed that, for all the problems mentioned above, the general directed case is “harder” than the undirected one, since there exists a linear reduction from the latter to the former: simply replace each edge by the gadget given in [19, (70.9) on p. 1224]. The motivation of this paper is to explore further the complexity of MaxIMTF. Given a directed graph, we say that a terminal is lonely if it lies on at least one directed cycle containing no other terminal, and we let TL denote the set of lonely terminals and kL = |TL |. We shall see that kL is a key parameter for better understanding the complexity and approximability of MaxIMTF. Moreover, some of our results will extend to MinMTC. We first show that MaxIMTF is strongly N P-hard in directed graphs, even if kL = k = 2 or if kL = 1 and k = 3 (Section 2). Then, we prove MaxIMTF to
740
C. Bentz
be tractable when kL = 0 and when kL = 1 and k = 2, and improve the 2 log2 kapproximation algorithm of Garg et al. [13] by providing an 2 log2 (kL + 2)approximation algorithm for the general case (Section 3). Eventually, we give a family of valid inequalities for MaxIMTF in undirected graphs, and show an interesting correspondence with valid inequalities known for the associated problem MinMTC (Section 4). Note that, throughout this paper, we consider only simple graphs. We call Directed (resp. Undirected) MaxIMTF the problem MaxIMTF defined in directed (resp. undirected) graphs. Moreover, due to lack of space, we sometimes omit some details in our proofs.
2
N P-Hardness Proof
We show in this section that Directed MaxIMTF is strongly N P-hard, even if k = kL = 2 (or kL = 1 and k = 3). In order to do this, we adapt the proof, given in [10], of the N P-completeness of the directed integer multicommodity flow problem with two source-sink pairs, (s1 , s1 ) and (s2 , s2 ): given an arc-capacitated directed graph G = (V, A) and two integer demands d1 and d2 associated with the respective source-sink pairs, it asks to decide whether these demands can be simultaneously routed while respecting the capacity constraints (if, for an instance, the answer is yes, then this instance is solvable). In the instance used in the proof of [10, Theorem 3], d1 = 1, d2 ≤ |V | and all the arcs have capacity 1. Moreover, this instance satisfies |Γ − (s1 )| = |Γ − (s2 )| = 0 and
|Γ + (s1 )| = |Γ + (s2 )| = 0
where, for v ∈ V , Γ + (v) = {u ∈ V such that (v, u) ∈ A} and Γ − (v) = {u ∈ V such that (u, v) ∈ A}. We modify this initial instance as follows: we add two new vertices, t1 and t2 , and four arcs (t1 , s1 ), (s2 , t1 ), (t2 , s2 ) and (s1 , t2 ), valued by 1, d2 , d2 and 1 respectively (see Fig. 1(a)). It is easy to see that the initial instance is solvable if and only if the optimum value for the maximum integer multicommodity flow instance defined on the pairs (t1 , t2 ) and (t2 , t1 ) is equal to d2 + 1 (no flow unit being routed from si to sj , from si to sj for i = j, or from si to sj for i = j). Moreover, the latter instance is equivalent to a directed maximum integer multiterminal flow instance with two terminals, t1 and t2 . Eventually, we can replace each one of the two arcs (s2 , t1 ) and (t2 , s2 ) by d2 directed paths of length two (containing only arcs with capacity 1) between the corresponding endpoints, and obtain an instance of MaxIMTF where each arc has capacity 1. The described transformation is clearly polynomial, and hence Theorem 1. Directed MaxIMTF is N P-hard in graphs with unit capacities, even with only two terminals.
The Maximum Integer Multiterminal Flow Problem
S2
[d 2 ]
t2
t2
S2
[d 2 ]
[1] [1]
'
S1
S1 S1
[1] t1
t1 '
S2
(a) Reduction for kL = 2 and k = 2
t3
S '1
[1]
[d 2 ]
741
[d 2 ]
S '2
(b) Reduction for kL = 1 and k = 3
Fig. 1. Reductions for Directed MaxIMTF
In particular, this implies the strong N P-hardness of Directed MaxIMTF. It can also be noticed that this result matches the complexity result for the associated cut problem MinMTC in directed graphs [13]. However, in the proof of Theorem 1, k = kL = 2: so, what happens when kL = 1? Actually, a slightly different proof shows that Directed MaxIMTF remains N P-hard, even with unit capacities. We define three new terminals instead of two: t1 , t2 and t3 . Moreover, we add four arcs (t1 , s1 ), (s2 , t1 ), (t2 , s2 ) and (s1 , t3 ), valued by 1, d2 , d2 and 1 respectively (note that TL = {t1 }; see Fig. 1(b)). In this instance, it is easy to see that there exists an integer multiterminal flow of value d2 + 1 if and only if 1 flow unit is routed from t1 to t3 and d2 flow units are routed from t2 to t1 . This implies: Theorem 2. Directed MaxIMTF is N P-hard in graphs with unit capacities, even if kL = 1 and k = 3. We shall deal with the cases where kL = 0 and where kL = 1 and k = 2 in the next section.
3 Exact and Approximation Algorithms
From the previous section, Directed MaxIMTF is strongly NP-hard even for kL = 1 and k = 3 and for k = kL = 2. Hence, if P ≠ NP, the only efficient algorithms one can expect to design are approximation algorithms. In this section, we improve the 2 log2 k-approximation algorithm of Garg et al. [13] and give a 2 log2(kL + 2)-approximation algorithm for Directed MaxIMTF. The basic idea of our approach is to combine the algorithm of Garg et al. with an interesting strengthening of [8, Proposition 3]. The main idea of the proof of [8, Proposition 3] (which shows that MaxIMTF and MinMTC are polynomial-time solvable in acyclic directed graphs) is to split up each terminal vertex ti
Fig. 2. Splitting up terminal ti
into two new vertices, ti' and ti'', such that all the vertices in Γ⁻(ti) are linked to ti' and ti'' is linked only to the vertices in Γ⁺(ti). Then, we add two new vertices, σ and τ, and link (by arcs with sufficiently large capacities) every ti' to τ and σ to every ti'' (see Fig. 2). Finally, we compute a maximum flow between σ and τ (obviously, we assume that the capacities are integral). The obtained flow is a valid integer multiterminal flow for the initial instance if, in the modified instance, no flow unit is routed from ti'' to ti' for some i. The main point for us is that, if there is no lonely terminal, then, by splitting up the terminals as explained, there will remain no directed path from ti'' to ti' for each i, and hence we will be able to solve MaxIMTF and MinMTC using the above technique. Actually, if we want to guarantee that, after splitting up each terminal, the modified graph does not admit a directed path from ti'' to ti' for some i (otherwise, we cannot be sure that the flow we will compute in the modified graph will be a valid multiterminal flow in the initial graph), this is essentially the best (i.e., weakest) assumption that can be made. Namely, one can easily show:

Theorem 3. After splitting up all the terminals, there is no directed path between ti'' and ti' for each i if and only if kL = 0.

This also implies the following strengthening of [8, Proposition 3]:

Theorem 4. MinMTC and MaxIMTF are polynomial-time solvable in directed graphs if kL = 0, by using a max flow-min cut algorithm.

Actually, the last remaining case, i.e., the case where kL = 1 and k = 2, is also polynomial-time solvable. Indeed, one can prove that on any directed cycle containing only the terminal in TL, there is a removable arc, i.e., an arc lying on no elementary path between the two terminals (otherwise, there would be a vertex of this cycle lying on another directed cycle containing only the terminal not in TL, which is impossible). By iteratively removing such arcs, we are back to the case where kL = 0. Hence:

Theorem 5. Directed MaxIMTF is tractable if kL = 1 and k = 2.

Theorems 3 and 4 show the importance of the parameter kL for both MaxIMTF and MinMTC. Moreover, this suggests the following approach for finding
approximate solutions for these two problems: first, (a) split up each terminal ti ∈ T − TL into ti' and ti'' as explained above, add the two vertices σ and τ, and link (by heavy arcs) every ti' to τ and σ to every ti''; then, (b) compute a multiterminal cut and flow for this new instance (i.e., where the terminal set is TL ∪ {σ, τ}) by using the algorithm of Garg et al. [13]. The definition of TL guarantees that we obtain a valid integer multiterminal flow. Hence, the main difference with their algorithm is that, before using their divide-and-conquer strategy, we transform the graph by replacing the terminals in T − TL by two new terminals, σ and τ. This implies that we use Garg et al.'s algorithm on an instance with kL + 2 terminals, and so we obtain an approximation factor of 2 log2(kL + 2) (instead of 2 log2 k). Actually, one can even prove that this analysis of the approximation ratio is tight by using a particular family of instances built on an undirected tree with k = 2^p vertices (all vertices are terminals), and which is transformed into a directed graph by replacing each edge by the gadget given in [19, (70.9) on p. 1224] (each arc of the gadget has the capacity of the initial edge). Due to lack of space, we do not give the whole construction here.
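As a concrete illustration of the splitting technique of Theorems 3 and 4, here is a minimal sketch, assuming kL = 0 (so that splitting is safe) and integer capacities on every arc of a networkx DiGraph; the vertex names sigma/tau and the tuple encoding of the split terminals are assumptions of this sketch.

```python
import networkx as nx

def maximtf_no_lonely_terminals(G, terminals):
    """Optimal multiterminal flow value via one max flow, valid when kL = 0."""
    H = nx.DiGraph()
    big = 1 + sum(c for _, _, c in G.edges(data="capacity"))  # "heavy" capacity
    for u, v, c in G.edges(data="capacity"):
        # arcs into a terminal t go to its copy (t,'in') = t';
        # arcs out of t leave from (t,'out') = t''
        uu = (u, "out") if u in terminals else u
        vv = (v, "in") if v in terminals else v
        H.add_edge(uu, vv, capacity=c)
    for t in terminals:
        H.add_edge((t, "in"), "tau", capacity=big)     # every t' -> tau
        H.add_edge("sigma", (t, "out"), capacity=big)  # sigma -> every t''
    value, _ = nx.maximum_flow(H, "sigma", "tau")
    return value
```

With integer capacities the computed max flow is integral, so the returned value is the optimal integer multiterminal flow value under the kL = 0 assumption.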
4 Polyhedral Results for the Undirected Case
In this section, we give a family of valid inequalities for the LP formulation of Undirected MaxIMTF given in [13]. We call them tree inequalities, as they can be seen as the "flow counterpart" of the tree inequalities given in [7] and characterizing completely the polytope of MinMTC in trees (as shown in [7]).

Theorem 6. Let U be an undirected tree and XU = {t1, . . . , th} be the terminals in U. Assume the leaves of U coincide with these h terminals (assume without loss of generality that we have removed from U the edges such that no flow unit can be routed through them). Then, if FU denotes the total number of flow units that are routed between the terminals in XU, the inequality

FU ≤ ⌊(∑_{i∈{1,...,h}} ci) / 2⌋

is a valid inequality for MaxIMTF (called tree inequality), where, for each i, ci is the minimum capacity of the edges contained in the path pi linking ti to ni, its nearest vertex of degree at least 3 in U (see Fig. 3).

A proof of this theorem is given below. In particular, this proof will imply that these inequalities (together with the usual constraints) are sufficient to guarantee the existence of an integer optimal solution to the continuous relaxation of the linear program (i.e., of an optimal solution for MaxIMTF) in undirected trees (recall that the tree inequalities for MinMTC do give a complete characterization of the associated polytope in undirected trees [7]). These inequalities may also be sufficient to completely characterize the polytope associated with MaxIMTF in undirected trees. On the other hand, a quadratic algorithm for
Fig. 3. A tree instance for Theorem 6 ((ui, vi) has capacity ci)
MaxIMTF in trees is already known [4]. Our original motivation came from the fact that the authors of [7] used the description of the polytope associated with MinMTC to derive an efficient algorithm for MinMTC in trees by using complementary slackness conditions: can it be done for MaxIMTF? Actually, one can prove that the tree inequalities for MaxIMTF are a special case of a more general class of valid inequalities, that we call inner odd set inequalities: they have been used very recently (the paper [17] appeared just after the submission of the present paper) to prove that Undirected MaxIMTF is polynomial-time solvable via the ellipsoid method. They are derived from the fact that evenness considerations are well known to be of great importance in integer flow problems with several sources and sinks (see [12] for example). They can be defined as follows:

Definition 1. Let G = (V, E) be an undirected graph and let T be the set of terminal vertices. For each X ⊆ V \ T, let (X, V \ X) be the set of edges lying between X and V \ X and let c(X, V \ X) be the total capacity of the edges in (X, V \ X). Then, F, the total number of flow units routed between the terminals in T, satisfies the following valid inequality (called inner odd set inequality):

F ≤ ⌊c(X, V \ X) / 2⌋.

Roughly speaking, the validity of these inequalities comes from the fact that each flow unit in G is routed through no or an even number of edges in (X, V \ X), since X contains no terminal (see Fig. 4). Hence, each flow unit routed through an edge in (X, V \ X) is counted at least twice, and the total amount of flow in (X, V \ X) is thus equal to (at least) 2F. This amount being at most c(X, V \ X), we can divide both sides of the inequality by 2 and use the integrality of F to obtain the desired result. Now we can prove Theorem 6 by using Definition 1 and the following fact.

Theorem 7. The tree inequalities are a special case of the inner odd set inequalities.

Proof. We use the notations of Theorem 6. Given an undirected tree U, for each i, let (ui, vi) be an edge of pi with capacity ci (ui lying in the path from ti to vi),
Fig. 4. An inner odd set configuration (c(X, V \ X) is odd)
and let pi' be the path from ti to ui (see Fig. 3). Then, the tree inequality on U is obtained by taking the inner odd set inequality defined on the set X = V \ ∪_i pi', since any flow unit has to be routed through at least one edge (and, in fact, through exactly two edges) in (X, V \ X), i.e., through (ui, vi) and (uj, vj) for some i and j. In trees, it is not difficult to see that all the inner odd set inequalities that matter are the ones corresponding to tree inequalities. Hence, from [17], these inequalities suffice to guarantee the existence of integer optimal solutions in undirected trees.

In fact, there exists an interesting relationship between the inner odd set inequalities and the inner eulerian assumption made in [12]. Given an edge-capacitated undirected graph G = (V, E), the degree of a vertex v ∈ V, denoted by dV(v), is the sum of the capacities of the edges adjacent to v. Moreover, for X ⊆ V, dX(v) is the degree of vertex v in the subgraph of G induced by X ∪ {v}. A graph is inner eulerian if every non-terminal vertex has an even degree. Theorem 8 shows that the inner odd set inequalities are useless if the graph is inner eulerian.

Theorem 8. Given a graph G = (V, E) and a set of terminal vertices T, G is inner eulerian if and only if ∀X ⊆ V \ T, c(X, V \ X) is even.

Proof. It is easily seen that, by definition, G is inner eulerian if and only if ∀X ⊆ V \ T with |X| = 1, c(X, V \ X) is even. Now, assume that G is inner eulerian, and let X ⊂ V contain no terminal. Then, we have

c(X, V \ X) = ∑_{v∈X} d_{V\X}(v) = ∑_{v∈X} dV(v) − ∑_{v∈X} dX(v).

Moreover, ∑_{v∈X} dV(v) is even since G is inner eulerian, and ∑_{v∈X} dX(v) is always even (because it is equal to two times the sum of the capacities of the
edges having both endpoints in X). This implies that c(X, V \ X) is even, and Theorem 8 follows. An interesting question would be to determine whether the inner odd set inequalities (together with the usual constraints) give a complete characterization of the polytope of Undirected MaxIMTF. Theorem 8 shows that a positive answer to this question would imply that the polytope of the continuous relaxation of the LP formulation of Undirected MaxIMTF is integral in inner eulerian undirected graphs (the existence of integer optimal solutions was already known [12]).
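The parity argument of Theorem 8 is easy to verify computationally on small graphs. The sketch below, with an invented dictionary-based graph encoding, brute-forces all inner vertex sets X and tests whether c(X, V \ X) is even exactly when the graph is inner eulerian.

```python
from itertools import combinations

def degree(edges, v):
    """Capacity degree dV(v): sum of capacities of edges adjacent to v."""
    return sum(c for (a, b), c in edges.items() if v in (a, b))

def is_inner_eulerian(edges, vertices, terminals):
    return all(degree(edges, v) % 2 == 0
               for v in vertices if v not in terminals)

def cut_capacity(edges, X):
    """c(X, V \\ X): total capacity of edges with exactly one endpoint in X."""
    return sum(c for (a, b), c in edges.items() if (a in X) != (b in X))

def all_inner_cuts_even(edges, vertices, terminals):
    inner = [v for v in vertices if v not in terminals]
    subsets = (set(s) for r in range(1, len(inner) + 1)
               for s in combinations(inner, r))
    return all(cut_capacity(edges, X) % 2 == 0 for X in subsets)
```

On any small instance, is_inner_eulerian and all_inner_cuts_even return the same truth value, as Theorem 8 asserts.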
5 Conclusion and Open Problems
The parameter kL introduced in this paper improves our knowledge of the boundary between tractable and intractable cases of MaxIMTF in directed graphs. Several results, positive and negative, about the tractability and approximability of this problem are provided. Moreover, we have given a family of valid inequalities for the undirected case, and have proved an interesting correspondence with valid inequalities already known. However, two important questions remain open: is there an O(1)-approximation algorithm for the general directed case? Moreover, are there other tractable special cases (e.g., planar digraphs)?
References

1. Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows – Theory, Algorithms, and Applications. Prentice Hall, Englewood Cliffs, New Jersey (1993).
2. Andrews, M., Zhang, L.: Hardness of the undirected edge-disjoint paths problem. Proceedings STOC'05 (2005) 276–283.
3. Bertsimas, D., Teo, C.-P., Vohra, R.: Analysis of LP relaxations for multiway and multicut problems. Networks 34 (1999) 102–114.
4. Billionnet, A., Costa, M.-C.: Multiway cut and integer flow problems in trees. In: Liberti, L., Maffioli, F. (eds.): CTW04 Workshop on Graphs and Combinatorial Optimization. Electronic Notes in Discrete Mathematics 17 (2004) 105–109.
5. Călinescu, G., Karloff, H., Rabani, Y.: An improved approximation algorithm for Multiway Cut. Proceedings STOC'98 (1998) 48–52.
6. Chen, D.Z., Wu, X.: Efficient algorithms for k-terminal cuts on planar graphs. Algorithmica 38 (2004) 299–316.
7. Chopra, S., Rao, M.R.: On the multiway cut polyhedron. Networks 21 (1991) 51–89.
8. Costa, M.-C., Létocart, L., Roupin, F.: Minimal multicut and maximal integer multiflow: a survey. European Journal of Operational Research 162 (2005) 55–69.
9. Dahlhaus, E., Johnson, D.S., Papadimitriou, C.H., Seymour, P.D., Yannakakis, M.: The complexity of multiterminal cuts. SIAM Journal on Computing 23 (1994) 864–894.
10. Even, S., Itai, A., Shamir, A.: On the complexity of timetable and multicommodity flow problems. SIAM Journal on Computing 5 (1976) 691–703.
11. Ford, L.R., Fulkerson, D.R.: Maximal Flow Through a Network. Canadian Journal of Mathematics 8 (1956) 399–404.
12. Frank, A., Karzanov, A., Sebő, A.: On integer multiflow maximization. SIAM Journal on Discrete Mathematics 10 (1997) 158–170.
13. Garg, N., Vazirani, V.V., Yannakakis, M.: Multiway cuts in directed and node weighted graphs. In: Abiteboul, S., Shamir, E. (eds.): 21st International Colloquium on Automata, Languages and Programming. Lecture Notes in Computer Science, Vol. 820. Springer-Verlag (1994) 487–498.
14. Garg, N., Vazirani, V.V., Yannakakis, M.: Primal-dual approximation algorithms for integral flow and multicut in trees. Algorithmica 18 (1997) 3–20.
15. Guruswami, V., Khanna, S., Rajaraman, R., Shepherd, B., Yannakakis, M.: Near-optimal hardness results and approximation algorithms for edge-disjoint paths and related problems. Proceedings STOC'99 (1999) 19–28.
16. Karger, D., Klein, P., Stein, C., Thorup, M., Young, N.: Rounding Algorithms for a Geometric Embedding of Minimum Multiway Cut. Proceedings STOC'99 (1999) 668–678.
17. Keijsper, J.C.M., Pendavingh, R.A., Stougie, L.: A linear programming formulation of Mader's edge-disjoint paths problem. Journal of Combinatorial Theory, Series B 96 (2006) 159–163.
18. Naor, J., Zosin, L.: A 2-approximation algorithm for the directed multiway cut problem. Proceedings FOCS'97 (1997) 548–553.
19. Schrijver, A.: Combinatorial Optimization – Polyhedra and Efficiency. Algorithms and Combinatorics 24. Springer-Verlag (2003).
20. Yeh, W.-C.: A Simple Algorithm for the Planar Multiway Cut Problem. Journal of Algorithms 39 (2001) 68–77.
Routing with Early Ordering for Just-In-Time Manufacturing Systems Mingzhou Jin, Kai Liu, and Burak Eksioglu Department of Industrial Engineering, Mississippi State University, P.O. Box 9542, Mississippi State, MS, 39762, USA [email protected] Abstract. Parts required in Just-In-Time manufacturing systems are usually picked up from suppliers on a daily basis, and the routes are determined based on average demand. Because of high demand variance, static routes result in low truck utilization and occasional overflow. Dynamic routing with limited early ordering can significantly reduce transportation costs. An integrated mixed integer programming model is presented to capture transportation cost, early ordering inventory cost and stop cost with the concept of a rolling horizon. A four-stage heuristic algorithm is developed to solve a real-life problem. The stages of the algorithm are: determining the number of trucks required, grouping, early ordering, and routing. Significant cost savings are estimated based on real data.
1 Introduction

The Just-In-Time (JIT) philosophy originated from the work of Taiichi Ohno at Toyota Motor Company and made its way to the US about 20 years ago [1]. The JIT philosophy is now adopted by most automakers all over the world. Based on the JIT philosophy, inventory is considered a big cost contributor, so the goal is to reduce inventory levels to "zero" [2]. Therefore, parts are ordered and transported only when they are needed in production. Based on a recent project with one of the major automakers in the US that implements a JIT system, we found out that the inbound logistics decision making process has the following procedure:

1. The production plan is determined based on dealer orders or forecasted demand, and it is driven by manufacturing needs such as line balancing.
2. The parts are ordered based on daily production needs, and the logistics group has little control over how many to order and when to order.
3. Milk-runs (routes) are determined based on average demand. Following the milk-runs, trucks visit the suppliers to pick up the required parts and then come back to the manufacturing plant. Each truck has one run every day. A supplier may be visited by one or more trucks.

This is different from the typical capacitated vehicle routing problem (CVRP), in which each supplier is visited by exactly one truck. The problem is similar to the split delivery vehicle routing problem (SDVRP) [3, 4]. Because inbound logistics costs are not considered during production planning, high volatility of part consumption in the assembly line leads to frequent and small
batches. The practice of assigning the suppliers to the trucks (milk-runs) based on average daily demand and keeping the same routes every day makes the problem worse. Furthermore, routes are typically determined manually, which also hurts efficiency. Truck utilizations can be as low as 30%, while sometimes overflow happens and additional trucks are required. In this paper we address the above-mentioned problem of the mismatch between production and logistics. To eliminate this mismatch, "dynamic routing" and "early ordering" are proposed as solutions. Dynamic routing means that the routes are determined daily based on production needs. Though integrated production planning and route scheduling is implemented in some other industries [5], in the automotive industry it is difficult to fully incorporate logistics needs into production planning. The industry has implemented a manufacturing-driven JIT system for such a long time that any major change requires approval from high-level management, which is usually difficult to achieve. The proposed early ordering policy will not affect the production planning process. In the automotive industry, production plans are made several days before actual production starts. Thus, the required parts are known several days before they are actually consumed in the assembly line. The early ordering policy simply allows the parts to be shipped one or two days early to save transportation costs. However, late ordering is not allowed in order not to disturb the manufacturing process. The parts that are ordered early will be stored in the inbound warehouse for one or two days and possibly increase inventory holding costs. In Section 2, a mixed integer programming (MIP) model is proposed for the routing problem with early ordering in a JIT environment. A heuristic algorithm to solve the model is presented in Section 3. Section 4 concludes the paper and includes some cost saving estimates based on a real case.
2 A Mixed Integer Programming Model for Daily Routing with Early Ordering

In an assembly system such as the auto assembly plant described above, there are two main costs: transportation and early ordering inventory costs. The transportation cost is composed of a fixed cost for each truck, a variable cost for each mile, and a fixed cost for each stop. Among them, the fixed cost for each truck dominates the others. In the literature, most routing studies do not consider the stop issue. The automotive company that we have studied has to pay a fixed amount to the trucking companies for each stop on the routes because a stop means additional handling time and effort. The number of stops is also a constraint because a truck cannot finish a route in one day if it has to make too many stops. In the standard CVRP models, since each supplier is visited exactly once, the number of total stops is fixed. Therefore, there is no need to consider stop costs in the CVRP models. Though the routing decision influences the number of stops in an SDVRP model, stop costs and constraints are usually not addressed in the SDVRP literature [6-8]. Since early ordering is allowed, additional inventory holding costs are considered in the proposed model. Typically, a major component of the inventory holding cost is the cost of capital invested [1]. However, automakers and their suppliers have long-term relationships, and the payments are made periodically (e.g. weekly or biweekly).
Therefore, the inventory holding costs are mainly driven by the space occupied rather than the capital invested. The demand for one part could be several hundred pieces or more every day, and they are held in containers during shipping and handling. The number of parts in a container is called a unit load, which may have several dozen or up to a hundred pieces of the same part. Therefore, the demand and the amount delivered are measured in unit loads rather than pieces. The notation and the model for the routing problem with early ordering are given below.

Parameters:
K: the number of available trucks (k = 1, 2, …, K);
T: the number of days in the planning horizon (t = 1, 2, …, T);
P: the set of parts (p ∈ P);
N: the number of suppliers (i, j = 0, 1, 2, …, N; 0 is used for the origin);
C: truck capacity;
u: the inventory holding cost per unit space for early ordered parts;
q: the fixed cost per truck per day;
λ: the variable transportation cost per truck per mile;
w: the cost for one stop;
d_{t,p}: the demand (in unit loads) for part p on day t (with lead time);
c_{i,j}: the distance from supplier i to supplier j;
r_{p,i}: indicates whether or not part p is provided by supplier i (0: no; 1: yes; note that each part is provided by only one supplier: ∑_{i=1}^{N} r_{p,i} = 1, p ∈ P);
v_p: space required by one unit load of part p;
S: the maximum number of stops allowed for each truck.

Decision variables:
o_{t,k,p}: the unit loads of part p shipped by truck k on day t;
x_{t,k,i,j}: equals 1 if truck k visits supplier j right after supplier i on day t; 0 otherwise;
l_{t,k,i}: the remaining capacity of truck k after visiting supplier i on day t (l_{t,k,0} = C);
s_{t,k,i}: equals 1 if truck k visits supplier i on day t; 0 otherwise;
I_{t,p}: the inventory (in unit loads) of part p ordered early on day t.
The MIP model for daily routing with early ordering:

min  u ∑_{t=1}^{T} ∑_{p∈P} v_p I_{t,p} + q ∑_{t=1}^{T} ∑_{k=1}^{K} ∑_{j=1}^{N} x_{t,k,0,j} + λ ∑_{t=1}^{T} ∑_{k=1}^{K} ∑_{i,j=0}^{N} c_{i,j} x_{t,k,i,j} + w ∑_{t=1}^{T} ∑_{k=1}^{K} ∑_{j=1}^{N} s_{t,k,j}   (1.1)

s.t.
I_{t-1,p} − I_{t,p} + ∑_{k=1}^{K} o_{t,k,p} = d_{t,p},   t = 1, …, T; p ∈ P;   (1.2)
−C s_{t,k,i} + ∑_{p∈P} v_p r_{p,i} o_{t,k,p} ≤ 0,   t = 1, …, T; k = 1, …, K; i = 1, …, N;   (1.3)
∑_{i=0,…,N; i≠j} x_{t,k,i,j} − s_{t,k,j} = 0,   t = 1, …, T; k = 1, …, K; j = 1, …, N;   (1.4)
∑_{i=0,…,N; i≠j} x_{t,k,i,j} − ∑_{i=0,…,N; i≠j} x_{t,k,j,i} = 0,   t = 1, …, T; k = 1, …, K; j = 1, …, N;   (1.5)
C x_{t,k,i,j} + ∑_{p∈P} v_p r_{p,i} o_{t,k,p} − l_{t,k,i} + l_{t,k,j} ≤ C,   t = 1, …, T; k = 1, …, K; i, j = 0, …, N; j ≠ i;   (1.6)
∑_{j=1}^{N} s_{t,k,j} ≤ S,   t = 1, …, T; k = 1, …, K;   (1.7)
l_{t,k,i}, I_{t,p} ≥ 0; s_{t,k,i}, x_{t,k,i,j} binary; o_{t,k,p} ≥ 0, integer.

The objective function (1.1) minimizes the sum of the inventory holding costs and the transportation costs. The first constraint set (1.2) represents the inventory evolution over days and ensures that the production needs of all parts are satisfied. The second constraint set (1.3) ensures that only those trucks that visit supplier i pick up parts from supplier i. The third constraint set (1.4) helps obtain the number of truck stops. The fourth constraint set (1.5) is used to keep the flow at each supplier balanced (i.e. the number of trucks arriving at a supplier equals the number of trucks leaving that supplier). The fifth constraint set (1.6) makes sure that the amount picked up by a truck does not exceed its capacity and eliminates sub-tours. The last set of constraints (1.7) is used to make sure that the number of stops by a truck does not exceed the maximum number allowed.

The proposed model combines transportation and inventory decisions. It is a variant of the SDVRP with additional constraints to address the issues of truck stops and early ordering. The standard SDVRP takes the total required space at each demand point into consideration, but this model addresses the problem at the part level. Since thousands of parts are required and hundreds of suppliers need to be visited every day, the model of the whole inbound logistics for this automaker is very large. The SDVRP itself is a well-known NP-complete problem [3]. Therefore, good solutions rather than optimal solutions are expected in practice for a large-scale SDVRP. The early ordering policy makes the computational burden even heavier by adding one more dimension of time into the problem. In reality, the suppliers are usually grouped by regions, and several transportation service providers are in charge of one region. Thus, the logistics problem of each region can be solved separately. However, dozens of suppliers and hundreds of parts are usually involved in a region. In a problem with 15 suppliers, 150 parts, and 7 available trucks there will be more than 3000 binary variables when only one-day early ordering is allowed. When CPLEX [9], a commercial optimization software package, is used to solve the model, 800MB of memory is used up after one day of calculations while the gap between the lower and upper bounds is still more than 90%. Therefore, fast heuristics are necessary for daily decision making.
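To make the formulation concrete, the following is a toy-sized sketch of model (1.1)-(1.7) written with the open-source PuLP modeller (an assumption of this sketch; the paper uses CPLEX). For brevity, the capacity/subtour constraint set (1.6) is replaced by a plain truck-capacity bound, so disconnected sub-tours are not excluded here, and all data are invented.

```python
import pulp

T, K, N = 2, 2, 3                       # days, trucks, suppliers (0 = plant)
P = ["p1", "p2", "p3"]
C, S = 10, 3                            # truck capacity, max stops per truck
u_, q_, lam, w = 1.0, 100.0, 2.0, 5.0   # holding, truck, per-mile, stop costs
v = {"p1": 1, "p2": 2, "p3": 1}         # space per unit load
r = {("p1", 1): 1, ("p2", 2): 1, ("p3", 3): 1}        # part -> supplier
d = {(t, p): 2 for t in range(1, T + 1) for p in P}   # demand in unit loads
c = {(i, j): 10 for i in range(N + 1) for j in range(N + 1) if i != j}

m = pulp.LpProblem("early_ordering_routing", pulp.LpMinimize)
o = pulp.LpVariable.dicts("o", (range(1, T+1), range(K), P), 0, None, "Integer")
x = pulp.LpVariable.dicts("x", (range(1, T+1), range(K), range(N+1), range(N+1)), cat="Binary")
s = pulp.LpVariable.dicts("s", (range(1, T+1), range(K), range(1, N+1)), cat="Binary")
I = pulp.LpVariable.dicts("I", (range(T+1), P), 0)

m += (u_ * pulp.lpSum(v[p] * I[t][p] for t in range(1, T+1) for p in P)
      + q_ * pulp.lpSum(x[t][k][0][j] for t in range(1, T+1) for k in range(K) for j in range(1, N+1))
      + lam * pulp.lpSum(c[i, j] * x[t][k][i][j] for t in range(1, T+1) for k in range(K)
                         for i in range(N+1) for j in range(N+1) if i != j)
      + w * pulp.lpSum(s[t][k][j] for t in range(1, T+1) for k in range(K) for j in range(1, N+1)))

for p in P:
    m += I[0][p] == 0                   # no early stock before day 1
    for t in range(1, T+1):             # (1.2) inventory balance
        m += I[t-1][p] - I[t][p] + pulp.lpSum(o[t][k][p] for k in range(K)) == d[t, p]
for t in range(1, T+1):
    for k in range(K):
        for i in range(1, N+1):         # (1.3) pick up only at visited suppliers
            m += pulp.lpSum(v[p] * r.get((p, i), 0) * o[t][k][p] for p in P) <= C * s[t][k][i]
        for j in range(1, N+1):         # (1.4) incoming arcs define stops
            m += pulp.lpSum(x[t][k][i][j] for i in range(N+1) if i != j) == s[t][k][j]
        for j in range(N+1):            # (1.5) flow conservation at each node
            m += (pulp.lpSum(x[t][k][i][j] for i in range(N+1) if i != j)
                  == pulp.lpSum(x[t][k][j][i] for i in range(N+1) if i != j))
        m += pulp.lpSum(v[p] * o[t][k][p] for p in P) <= C   # simplified (1.6)
        m += pulp.lpSum(s[t][k][j] for j in range(1, N+1)) <= S   # (1.7)

m.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[m.status], pulp.value(m.objective))
```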
3 A Four-Stage Heuristic Algorithm

A number of constructive heuristics can be found in the literature for the CVRP or the SDVRP. Most of them are two-stage algorithms including a grouping stage and a routing stage. Typically, a bin packing problem with restrictions is used in the grouping stage. Belenguer et al. [6] develop an algorithm based on a lower bound to
solve the SDVRP. Archetti et al. [7] use dynamic programming to solve SDVRP instances where vehicles have small capacities. For the model given in Section 2, we develop a four-stage heuristic algorithm. The basic scheme is illustrated in Fig. 1.
Fig. 1. The basic scheme of the four-stage heuristic: initialize n = 0; then, for each day (n = n + 1), determine the number of required trucks for the next T days, t = n, n+1, …, n+T−1 (Stage 1); group the suppliers for day t = n (Stage 2); if early ordering is necessary, adjust the truck loading to implement it (Stage 3); route the trucks for each group on day n (Stage 4); finally, implement the routing and pickup plan for day n.
In the first stage, the number of required trucks is calculated for days t = n, n+1, …, n+T−1. Initially, early ordering is not considered and the number of required trucks is found by the following equation:

K_t = θ_t (total required volume of day t) / [C (truck capacity) × 0.9 (allowance)],   (2)

where θ_t = ∑_{p∈P} v_p d_{t,p}, and the allowance of 0.9 is used to account for space that may be wasted because of unit loads. The number of required trucks on day t without early ordering is ⌈K_t⌉. If only one-day early ordering is allowed, early ordering is implemented if the following two conditions are satisfied:

⌈K_t + K_{t+1}⌉ < ⌈K_t⌉ + ⌈K_{t+1}⌉;   (3.1)
K_{t+1} − ⌊K_{t+1}⌋ ≤ UB.   (3.2)

Since the fixed cost for each truck dominates other costs, early ordering is only implemented when a truck can be saved on the next day (condition (3.1)). For example, if K_t = 2.3 and K_{t+1} = 2.2, then three trucks are required for both days without
early ordering. If 0.2 truck space of parts can be moved from day t+1 to day t (i.e. K_t = 2.5 and K_{t+1} = 2), one truck can be saved on day t+1. Also, note that early ordering increases inventory holding costs. Therefore, it is implemented only when the space moved from day t+1 to day t is not more than UB (e.g. 30%) of the truck capacity (condition (3.2)).

In the second stage, grouping is done to determine which suppliers should be visited by each truck. There are many grouping heuristics in the literature. The basic idea is to group the nearby suppliers together without violating the truck capacity constraints. We propose the following optimization model to solve the grouping problem for each day t:

min  w ∑_{i=1}^{N} ∑_{k=1}^{K} s_{t,k,i} + τ ∑_{i=1}^{N} ∑_{k=1}^{K} (J_{i,k}^{x+} + J_{i,k}^{x−} + J_{i,k}^{y+} + J_{i,k}^{y−})   (4.1)

s.t.
∑_{p∈P} v_p o_{t,k,p} ≤ C,   k = 1, 2, …, K;   (4.2)
∑_{k=1}^{K} o_{t,k,p} = d_{t,p},   p ∈ P;   (4.3)
∑_{p∈P} v_p r_{p,i} o_{t,k,p} ≤ C s_{t,k,i},   k = 1, 2, …, K; i = 1, 2, …, N;   (4.4)
∑_{i=1}^{N} s_{t,k,i} ≤ S,   k = 1, 2, …, K;   (4.5)
q_k^x − b_i^x ≤ M(1 − s_{t,k,i}) + J_{i,k}^{x+},   i = 1, 2, …, N; k = 1, 2, …, K;   (4.6)
b_i^x − q_k^x ≤ M(1 − s_{t,k,i}) + J_{i,k}^{x−},   i = 1, 2, …, N; k = 1, 2, …, K;   (4.7)
q_k^y − b_i^y ≤ M(1 − s_{t,k,i}) + J_{i,k}^{y+},   i = 1, 2, …, N; k = 1, 2, …, K;   (4.8)
b_i^y − q_k^y ≤ M(1 − s_{t,k,i}) + J_{i,k}^{y−},   i = 1, 2, …, N; k = 1, 2, …, K;   (4.9)
s_{t,k,i} binary; o_{t,k,p} ≥ 0, integer; q_k^x, q_k^y, J_{i,k}^{x+}, J_{i,k}^{x−}, J_{i,k}^{y+}, J_{i,k}^{y−} ≥ 0,

where:
b_i^x, b_i^y: the (x, y) coordinates of supplier i;
M: a large number to facilitate modeling;
τ: cost per unit distance;
q_k^x, q_k^y: the coordinates (latitude and longitude) of the virtual center of truck k;
J_{i,k}^{x+}, J_{i,k}^{x−}, J_{i,k}^{y+}, J_{i,k}^{y−}: the components of the rectangular distance between supplier i and the virtual center of truck k.

If truck k serves a set of suppliers I_k in the grouping model, its virtual center is defined as the point (q_k^x, q_k^y) that minimizes ∑_{i∈I_k} (|b_i^x − q_k^x| + |b_i^y − q_k^y|). The objective function (4.1) minimizes the total costs including the stop cost and the distance cost
approximated by the sum of the rectangular distances between the virtual centers of the trucks and the suppliers served by those trucks. The first constraint set (4.2) forces the load of each truck to be less than or equal to the capacity. Constraint set (4.3) has demand satisfied for all parts requested by the assembly line. Constraint set (4.4) ensures that truck k stops at supplier i if parts are to be picked up by truck k from that supplier. Constraint set (4.5) makes sure that the number of stops by a truck does not exceed the maximum number allowed. The last four constraint sets (4.6-4.9) are used to obtain the rectangular distances from the suppliers to the virtual centers of the trucks. Though the model looks cumbersome, the numbers of variables and constraints are significantly smaller than those of the original model (1.1-1.7). A problem with 15 suppliers, 150 parts and 7 available trucks has about 105 binary variables and 750 constraints. CPLEX can yield a solution to such a relatively small problem in 5 minutes with less than 1% gap on average.

In the third stage, how to implement the early ordering policy is determined. Let L_{k,t} be the remaining capacity of truck k on day t. The following algorithm, given in Fig. 2, is developed for implementing one-day early ordering. Assume e_t is the total space occupied by the parts ordered early on day t, e_t = 0.9C(K_{t+1} − ⌊K_{t+1}⌋).
∑ rp ,i ot , k , p ≥ 0
p∈P
and
∑ rp ,i dt +1, p ≥ 0 ;
p∈P
• the total day t+1 volume from the supplier i does not exceed the remaining capacity: ∑ rp,i dt +1, p ≤ C − Lk ; p∈P
If a movement happens, • update the utilized capacity Lk of truck k; • update the total early ordered volume. Step 3. If the total early ordered volume reaches ei, go to end. Step 4. If k < K: k=k+1 and go to step 1. Step 5. Move the parts satisfying the following conditions for truck k=1,...,K until the total early ordered volume reaches ei: • It is needed on both day t and t+1: ot ,k , p ≥ 0 and dt +1, p ≥ 0 • Day t+1 volume of the same part does not exceed the remaining capacity of the truck: dt +1, p ≤ C − Lk Step 6. If the total early ordered volume is still smaller than ei, arbitrarily move the parts needed on both days until the early ordered volume reaches ei. Fig. 2. The Algorithm to Determine the Early Ordering Policy
The algorithm that determines the early ordering policy first tries to reduce the number of stops on day t+1 without increasing the number of stops on day t. The second priority is to reduce the number of handlings of the same parts. The fourth stage deals with the routing problem for each truck, which is a standard Traveling Salesman Problem (TSP). Though the TSP is an NP-hard problem, its optimal solution can be obtained in seconds by CPLEX in this case because each truck usually has at most five stops. The concept of a rolling horizon is used for the overall algorithm. If only one-day early ordering is allowed (T = 2), the first stage is implemented for two days (t = n and n+1). The grouping and routing models are only solved for the first day (t = n). On the next day, the second day's demand information will be updated, and the early ordered parts will be deducted from it. The four-stage algorithm will be implemented after updating n = n + 1 with the new information of the third day's demand.
4 Implementation and Conclusion

The proposed four-stage algorithm is implemented on a real inbound logistics problem faced by a major automotive company in the US. The region under study has 15 suppliers and 158 parts, and usually 4 to 7 trucks are required every day. Only one-day early ordering is allowed because of information availability and inventory concerns. One month's worth of real data is used, and the result is obtained in 10 minutes. The average truck utilization is improved from 40% to 80% while its variability over trucks also becomes much smaller. The total cost savings are about 20%, including 24% savings on the number of used trucks, 17% savings on the total number of stops, and 15% savings on the total traveled distance. Early ordering does not happen frequently: only about 2% of the parts (in space) are ordered early. Of the total 20% savings, about 4% is contributed by the early ordering policy.

This paper presents a mixed integer programming model and a heuristic algorithm to improve the inbound logistics for Just-In-Time manufacturing systems. A dynamic routing policy and an early ordering policy are proposed to reduce the total cost. The proposed model and the algorithm are tested using real-life data obtained from an auto manufacturer. The recommended policies and the proposed heuristic algorithm work well in the sense that the computational speed is high while the quality of the solution is much better than that of the solution currently used by the auto manufacturer. In this paper, only a constructive heuristic is discussed without any improvement phase. A future research extension is to develop more sophisticated improvement heuristics to obtain better solutions to the problem.
References

1. Askin, R., Goldberg, J.: Design and Analysis of Lean Manufacturing. John Wiley & Sons, New York (2002).
2. Monden, Y.: Toyota Production System, 3rd edition: An Integrated Approach to Just-In-Time. Institute of Industrial Engineering, Norcross, GA (1993).
3. Dror, M., Trudeau, P.: Savings by Split Delivery Routing. Transportation Science 23 (1989) 141-145.
4. Dror, M., Laporte, G., Trudeau, P.: Vehicle Routing with Split Delivery. Discrete Applied Mathematics 50 (1994) 239-254.
5. Bredström, D., Rönnqvist, M.: Integrated Production Planning and Route Scheduling in Pulp Mill Industry. Proceedings of the 35th Hawaii International Conference on System Sciences (2002).
6. Belenguer, J.M., Martinez, M.C., Mota, E.: A Lower Bound for the Split Delivery Vehicle Routing Problem. Operations Research 48 (2000) 801-810.
7. Archetti, C., Mansini, R., Speranza, M.G.: The Split Delivery Vehicle Routing Problem with Small Capacity. Technical Report, Department of Quantitative Methods, University of Brescia (2001).
8. Lee, C.G., Epelman, M., White, C.C.: A Shortest Path Approach to the Multiple-Vehicle Routing Problem with Split Pick-ups. To appear in Transportation Research, Part B (2005).
9. ILOG: ILOG CPLEX 9.0 User's Manual (2003).
A Variant of the Constant Step Rule for Approximate Subgradient Methods over Nonlinear Networks Eugenio Mijangos University of the Basque Country, Department of Applied Mathematics and Statistics and Operations Research, P.O. Box 644, 48080 Bilbao, Spain http://www.ehu.es/~mepmifee
Abstract. The efficiency of the network flow techniques can be exploited in the solution of nonlinearly constrained network flow problems (NCNFP) by means of approximate subgradient methods (ASM). We propose to solve the dual problem by an ASM that uses a variant of the well-known constant step rule of Shor. In this work the kind of convergence of this method is analyzed and its efficiency is compared with that of other approximate subgradient methods over NCNFP.
1 Introduction
Many nonlinear network flow problems have nonlinear side constraints; these are named nonlinearly constrained network flow problems (NCNFP) and can be expressed as

minimize_x  f(x)   (1)
subject to  x ∈ F,   (2)
            c(x) ≤ 0,   (3)

where:
– The set F is F = {x ∈ ℝⁿ | Ax = b, 0 ≤ x ≤ x̄}, where A is a node-arc incidence m × n matrix, b is the production/demand m-vector, x are the flows on the arcs of the network represented by A, and x̄ are the capacity bounds imposed on the flows of each arc.
– The side constraints (3) are defined by c : ℝⁿ → ℝʳ, such that c = [c₁, …, c_r]ᵗ, where cᵢ(x) is nonlinear and twice continuously differentiable on the feasible set F for all i = 1, …, r.
– f : ℝⁿ → ℝ is nonlinear and twice continuously differentiable on F.
The research was partially supported by grant MCYT DPI 2005-09117-C02-01.
In recent works NCNFP has been solved using partial augmented Lagrangian methods with quadratic penalty function [7] and with exponential penalty function [8], and using approximate subgradient methods [8, 9]. In this work we focus on the primal problem NCNFP and its dual problem

maximize  q(μ) = min_{x∈F} l(x, μ) = min_{x∈F} {f(x) + μᵗc(x)}   (4)
subject to  μ ∈ M,   (5)
where M = {μ | μ ≥ 0, q(μ) > −∞}. We assume throughout this paper that the constraint set M is closed and convex, q is continuous on M, and for every μ ∈ M some vector x(μ) that minimizes l(x, μ) over x ∈ F can be calculated, yielding a subgradient c(x(μ)) of q at μ. We propose to solve NCNFP by using primal-dual methods, see [1]. The minimization of the Lagrangian function l(x, μ) over F can be performed by means of efficient techniques specialized for networks, see [14]. Since q(μ) is approximately computed, we consider approximate subgradient methods [9] in the solution of this problem. The basic difference between these methods and the classical subgradient methods is that they replace the subgradients with inexact subgradients. Different ways of computing the stepsize in the approximate subgradient methods have been considered: the diminishing stepsize rule (DSR) suggested by Correa and Lemaréchal in [3] for exact subgradients; a dynamically chosen stepsize rule that uses an estimation of the optimal value of the dual function by means of an adjustment procedure (DSRAP), similar to that suggested by Nedić and Bertsekas in [11] for incremental subgradient methods; a dynamically chosen stepsize whose estimate of the optimal value of the dual function is based on the relaxation level-control algorithm (DSRLC) designed by Brännlund in [2] and analyzed by Goffin and Kiwiel in [5]; and a variant of the constant step rule (VCSR) of Shor [13]. The convergence of the first three methods was studied in the cited papers for the case of exact subgradients. The convergence of the corresponding approximate (inexact) subgradient methods is analyzed in [9], see also [6]. In this work the convergence of VCSR is analyzed and its efficiency is compared with that of DSR, DSRAP, and DSRLC over NCNFP problems. The remainder of this paper is structured as follows. Section 2 presents the variant of the constant step rule together with the other ways of computing the stepsize in the approximate subgradient methods; Section 3 describes the solution to the nonlinearly constrained network flow problem; and Section 4 puts forward experimental results. Finally, Section 5 concludes the paper.
2 Approximate Subgradient Methods
When, as happens in this work, for a given μ ∈ M, the dual function value q(μ) is calculated by minimizing approximately l(x, μ) over x ∈ F (see (4)), the subgradient obtained (as well as the value of q(μ)) will involve an error.
In order to analyze such methods, it is useful to introduce a notion of approximate subgradient [13, 1]. In particular, given a scalar ε ≥ 0 and a vector μ̄ with q(μ̄) > −∞, we say that c is an ε-subgradient at μ̄ if

q(μ) ≤ q(μ̄) + ε + cᵗ(μ − μ̄),  ∀μ ∈ ℝʳ.   (6)

The set of all ε-subgradients at μ̄ is called the ε-subdifferential at μ̄ and is denoted by ∂_ε q(μ̄). An approximate subgradient method is defined by

μ^{k+1} = [μ^k + s_k c^k]⁺,   (7)

where c^k is an ε_k-subgradient at μ^k and s_k a positive stepsize. In our context, we minimize approximately l(x, μ^k) over x ∈ F, thereby obtaining a vector x^k ∈ F with

l(x^k, μ^k) ≤ inf_{x∈F} l(x, μ^k) + ε_k.   (8)

As is shown in [1, 9], the corresponding constraint vector, c(x^k), is an ε_k-subgradient at μ^k. If we denote q_{ε_k}(μ^k) = l(x^k, μ^k), by definition of q(μ^k) and using (8) we have

q(μ^k) ≤ q_{ε_k}(μ^k) ≤ q(μ^k) + ε_k,  ∀k.   (9)

2.1 Stepsize Rules
Throughout this paper, we use the notation

q* = sup_{μ∈M} q(μ),   M* = {μ ∈ M | q(μ) = q*},   dist(μ, M*) = inf_{μ*∈M*} ‖μ − μ*‖,

where ‖·‖ denotes the standard Euclidean norm.

Assumption 1: There exists a scalar C > 0 such that for μ^k ∈ M, ε_k ≥ 0 and c^k ∈ ∂_{ε_k} q(μ^k), we have ‖c^k‖ ≤ C, for k = 0, 1, . . .

In this paper, four kinds of stepsize rules have been considered.

Variant of the Constant Step Rule (VCSR). As is well known, the classical scaling of Shor (see [13]),

s_k = s / ‖c^k‖,

gives rise to an s-constant-step algorithm. Note that constant stepsizes (i.e., s_k = s for all k) are unsuitable because the function q may be nondifferentiable at the optimal point, and then {‖c^k‖} does not necessarily tend to zero, even if {μ^k} converges to the optimal point, see [13]. On the other hand, in our case c^k is an approximate subgradient, hence there can exist a k such that c^k ∈ ∂_{ε_k} q(μ^k) with c^k = 0, but with ε_k not sufficiently small. In order to overcome this trouble we have considered the following variant:

s_k = s / (δ + ‖c^k‖),   (10)

where s and δ are positive constants.
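A tiny numerical sketch of iteration (7) with the VCSR stepsize (10): the dual function of a made-up two-variable problem is evaluated by brute force over a four-point feasible set, so every quantity below (F, f, c, and the constants s and δ) is invented for illustration only.

```python
import numpy as np

F = [np.array(x, dtype=float) for x in ([0, 0], [1, 0], [0, 1], [1, 1])]
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 1.0) ** 2   # objective
c = lambda x: np.array([x[0] + x[1] - 1.0])           # side constraint c(x) <= 0

def q_and_subgradient(mu):
    vals = [f(x) + mu @ c(x) for x in F]              # l(x, mu) over F
    xbest = F[int(np.argmin(vals))]
    return min(vals), c(xbest)                        # q(mu) and a subgradient

mu, best = np.zeros(1), -np.inf
s, delta = 1.0, 1e-12                 # the paper's defaults are s=100, delta=1e-12
for k in range(200):
    qk, ck = q_and_subgradient(mu)
    best = max(best, qk)
    sk = s / (delta + np.linalg.norm(ck))             # VCSR stepsize (10)
    mu = np.maximum(mu + sk * ck, 0.0)                # projection [.]+ of (7)
print("best dual value found:", best)
```

As Proposition 1 below suggests, the iterates need not converge, but the best dual value recorded approaches q* up to a term proportional to s(δ + C).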
The following proposition shows its kind of convergence.

Proposition 1. Let Assumption 1 hold. Let the optimal set M* be nonempty. Suppose that a sequence {μ^k} is calculated by the ε-subgradient method given by (7), with the stepsize (10), where ∑_{k=1}^{∞} ε_k < ∞. Then

q* − lim sup_{k→∞} q_{ε_k}(μ^k) ≤ (s/2)(δ + C).   (11)

Proof. Let μ* be a point that maximizes q; i.e., μ* ∈ M* such that q* = q(μ*). According to (7) we have

‖μ^{k+1} − μ*‖² ≤ ‖μ^k + s_k c^k − μ*‖²
 = ‖μ^k − μ*‖² + 2 s_k (c^k)ᵗ(μ^k − μ*) + s_k² ‖c^k‖²
 ≤ ‖μ^k − μ*‖² − 2 s_k (q(μ*) − q_{ε_k}(μ^k) − ε_k) + s_k² ‖c^k‖²,

where the last inequality follows from the definition of ε-subgradient, see (6), which gives q(μ*) ≤ q_{ε_k}(μ^k) + ε_k + (c^k)ᵗ(μ* − μ^k). Applying the inequality above recursively, we have

‖μ^{k+1} − μ*‖² ≤ ‖μ¹ − μ*‖² − 2 ∑_{i=1}^{k} s_i [q(μ*) − q_{ε_i}(μ^i)] + ∑_{i=1}^{k} (s_i² ‖c^i‖² + 2ε_i),

which implies 2 ∑_{i=1}^{k} s_i [q(μ*) − q_{ε_i}(μ^i)] ≤ ‖μ¹ − μ*‖² + ∑_{i=1}^{k} (s_i² ‖c^i‖² + 2ε_i). Combining this with

∑_{i=1}^{k} s_i [q(μ*) − q_{ε_i}(μ^i)] ≥ ∑_{i=1}^{k} s_i [q(μ*) − q_ε^k],

where q_ε^k = max_{0≤i≤k} q_{ε_i}(μ^i), we have the inequality

2 ∑_{i=1}^{k} s_i [q(μ*) − q_ε^k] ≤ ‖μ¹ − μ*‖² + ∑_{i=1}^{k} (s_i² ‖c^i‖² + 2ε_i),

which is equivalent to

q(μ*) − q_ε^k ≤ [‖μ¹ − μ*‖² + ∑_{i=1}^{k} (s_i² ‖c^i‖² + 2ε_i)] / [2 ∑_{i=1}^{k} s_i].

Since μ* is any maximizer of q, we can state that

q* − q_ε^k ≤ [dist²(μ¹, M*) + ∑_{i=1}^{k} s_i² ‖c^i‖² + 2 ∑_{i=1}^{k} ε_i] / [2 ∑_{i=1}^{k} s_i].   (12)

On the other hand, as the stepsize is s_i = s/(δ + ‖c^i‖), by Assumption 1 we have

∑_{i=1}^{k} s_i > ks / (δ + C)   (13)

and so ∑_{i=1}^{∞} s_i = ∞. In addition, as s_i² ‖c^i‖² < s², it holds that ∑_{i=1}^{k} s_i² ‖c^i‖² < ks². Therefore

[∑_{i=1}^{k} s_i² ‖c^i‖²] / [∑_{i=1}^{k} s_i] < ks² / (ks/(δ + C)) = s(δ + C).   (14)

Taking into account the assumption ∑_{k=1}^{∞} ε_k < ∞ together with (12)-(14), for k → ∞ we obtain

q* − lim_{k→∞} q_ε^k ≤ (s/2)(δ + C),   (15)

as required.
Note that for very small values of δ the stepsize (10) is similar to Shor's classical scaling; in contrast, for large values of δ (with respect to sup_k ‖c^k‖) the stepsize (10) looks like a constant stepsize. In this work, by default, δ = 10^{-12} and s = 100.

Diminishing Stepsize Rule (DSR). The convergence of the exact subgradient method using a diminishing stepsize was shown by Correa and Lemaréchal, see [3]. In the case where c^k is an ε_k-subgradient we have the following proposition, which was proved in [9].
∞
sk = ∞,
k=0
∞
s2k < ∞,
k=0
∞
sk εk < ∞.
(16)
k=0
Then, the sequence {μk }, generated by the ε-subgradient method, where ck ∈ ∂εk q(μk ) (with {ck } bounded), converges to some optimal solution. An example of such a stepsize is sk = 1/ k, for k = k/m + 1. We use by default m = 5. An interesting alternative for the ordinary subgradient method is the dynamic stepsize rule q ∗ − q(μk ) sk = γk , (17) ck 2 with ck ∈ ∂q(μk ) and 0 < γ ≤ γk ≤ γ < 2, which was introduced by Poljak in [12] (see also Shor [13]). Unfortunately, in most practical problems q ∗ and q(μk ) are unknown. The latter can be approximated by qεk (μk ) = l(xk , μk ) and q ∗ is replaced with an k estimate qlev . This leads to the stepsize rule sk = γk
k qlev − qεk (μk ) , ck 2
where ck ∈ ∂εk q(μk ) is bounded for k = 0, 1, . . ..
(18)
762
E. Mijangos
Dynamic Stepsize Rule with Adjustment Procedure (DSRAP). An option to estimate q ∗ is to use the adjustment procedure suggested by Nedi´c and Bertsekas [11], but fitted for the ε-subgradient method, its convergence was analyzed by Mijangos in [9] (see also [6]). k In this procedure qlev is the best function value achieved up to the kth iteration, in our case max0≤j≤k qεj (μj ), plus a positive amount δk , which is adjusted according to algorithm’s progress. k The adjustment procedure obtains qlev as follows: k qlev = max qεj (μj ) + δk , 0≤j≤k
and δk is updated according to k ρδk , if qεk+1 (μk+1 ) ≥ qlev , δk+1 = k+1 k max{βδk , δ}, if qεk+1 (μ ) < qlev , where δ0 , δ, β, and ρ are fixed positive constants with β < 1 and ρ ≥ 1. Dynamic Stepsize Rule with Relaxation-Level Control (DSRLC). Ank other choice to compute an estimate qlev for (18) is to use a dynamic stepsize rule with relaxation-level control, which is based on the algorithm given by Br¨ annlund [2], whose convergence was shown by Goffin and Kiwiel in [5] for εk = 0 for all k. Mijangos in [9] fitted this method to the dual problem of NCNFP (4-5) for {εk } → 0 and analyzed its convergence. k In this case, in contrast to the adjustment procedure, q ∗ is estimated by qlev , which is a target level that is updated only if a sufficient ascent is detected or when the path long done from the last update exceeds a given upper bound B, see [9, 8, 6].
3 Solution to NCNFP
An algorithm is given below for solving NCNFP. This algorithm uses the approximate subgradient methods described in Section 2. The value of the dual function q(μ^k) is estimated by minimizing approximately l(x, μ^k) over x ∈ F (the set defined by the network constraints) so that the optimality tolerance, τ_x^k, becomes more rigorous as k increases; i.e., the minimization will be asymptotically exact [1]. In other words, we set q_{ε_k}(μ^k) = l(x^k, μ^k), where x^k approximately minimizes the nonlinear network subproblem NNS_k

minimize_{x∈F}  l(x, μ^k)

in the sense that this minimization stops when we obtain an x^k such that ‖Zᵗ∇_x l(x^k, μ^k)‖ ≤ τ_x^k, where lim_{k→∞} τ_x^k = 0 and Z represents the reduction matrix whose columns form a basis of the null space of the matrix of active network constraints of this subproblem.
Algorithm 1 (Approximate subgradient method for NCNFP)

Step 0. Initialize. Set k = 1, and choose N_max, τ_x^1, ε_μ and τ_μ. Set μ^1 = 0.
Step 1. Compute the dual function estimate, q_{ε_k}(μ^k), by solving NNS_k, so that if ‖Zᵗ∇_x l(x^k, μ^k)‖ ≤ τ_x^k, then x^k ∈ F is an approximate solution, q_{ε_k}(μ^k) = l(x^k, μ^k), and c^k = c(x^k) is an ε_k-subgradient of q at μ^k.
Step 2. Check the stopping rules for μ^k.
T1: Stop if max_{i=1,…,r} (c_i^k)^+ / (1 + (c_i^k)^+) < τ_μ, where (c_i^k)^+ = max{0, c_i(x^k)}.
T2: Stop if |q^k − (q^{k−1} + q^{k−2} + q^{k−3})/3| / (1 + |q^k|) < ε_μ, where q^n = q_{ε_n}(μ^n).
T3: Stop if (1/4) ∑_{i=0}^{3} ‖μ^{k−i} − μ^{k−i−1}‖_∞ < ε_μ.
T4: Stop if k reaches a prefixed value N_max.
If μ^k fulfils one of these tests, then it is taken as optimal, and the algorithm stops. Without a duality gap, (x^k, μ^k) is a primal-dual solution.
Step 3. Update the estimate μ^k by means of the iteration
μ_i^{k+1} = μ_i^k + s_k c_i^k, if μ_i^k + s_k c_i^k > 0; μ_i^{k+1} = 0, otherwise;
where s_k is computed using some stepsize rule. Go to Step 1.

In Step 0, for the checking of the stopping rules, τ_μ = 10^{-5}, ε_μ = 10^{-5} and N_max = 200 have been taken. In addition, τ_x^0 = 10^{-1} by default. Step 1 of this algorithm is carried out by the code PFNL, described in [10], which is based on specific procedures for nonlinear network flows [14] and active-set procedures. In Step 2, alternative heuristic tests have been used for practical purposes. T1 checks the feasibility of x^k, as if it is feasible the duality gap is zero, and then (x^k, μ^k) is a primal-dual solution for NCNFP. T2 and T3 mean that μ does not improve over the last N iterations. Note that N = 4. To obtain s_k in Step 3, we have alternately used the stepsize rules given in Section 2.1, that is, VCSR, DSR, DSRAP, and DSRLC. The values given above have been heuristically chosen. The implementation in Fortran 77 of the previous algorithm, termed PFNRN05, was designed to solve large-scale nonlinear network flow problems with nonlinear side constraints.
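For concreteness, the stopping tests T1-T3 can be coded directly. The sketch below assumes that the histories are kept as Python lists of NumPy arrays/floats; the absolute values in T2 are an interpretation of the typographically damaged original.

```python
import numpy as np

def t1(c_k, tau_mu=1e-5):                   # feasibility of x^k
    cp = np.maximum(c_k, 0.0)
    return np.max(cp / (1.0 + cp)) < tau_mu

def t2(q, eps_mu=1e-5):                     # q: values q_eps_n(mu^n), len >= 4
    return abs(q[-1] - sum(q[-4:-1]) / 3.0) / (1.0 + abs(q[-1])) < eps_mu

def t3(mus, eps_mu=1e-5):                   # mus: at least the last 5 multipliers
    steps = [np.max(np.abs(mus[-1 - i] - mus[-2 - i])) for i in range(4)]
    return sum(steps) / 4.0 < eps_mu
```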
4 Computational Results
The problems used in these tests were created by means of the following DIMACS-random-network generators: Rmfgen and Gridgen, see [4]. These generators provide linear flow problems in networks without side constraints. The inequality nonlinear side constraints for the DIMACS networks were generated
Table 1. Test problems

prob.   # arcs   # nodes   # side const.   # actives   # jac. entr.   # sb. arcs
c13e2    1524     360       360              10          364            89
c15e2    1524     360       180              59          307            116
c17e2    1524     360       216              82          367            132
c18e2    1524     360       360              152         615            201
c13n1    1524     360       360              38          364            681
c15n1    1524     360       180              110         307            620
c17n1    1524     360       216              129         367            615
c18n1    1524     360       360              205         615            588
c21e2    5420     1200      120              3           135            30
c22e2    5420     1200      600              13          656            46
c23e2    5420     1200      120              16          135            236
c24e2    5420     1200      600              91          1348           89
c33e2    4008     501       150              17          589            83
c34e2    4008     501       902              108         3645           106
c35e2    4008     501       251              28          1014           84
c38e2    4008     501       1253             135         4054           112
c42e2    12008    1501      15               4           182            174
c47e2    12008    1501      300              105         3617           327
c48e2    12008    1501      751              102         4537           236
c49e2    12008    1501      751              116         4537           258
Table 2. Computational results

Problem      VCSR                     DSR                      DSRAP                    DSRLC
          oit   c∞      t         oit   c∞      t         oit   c∞      t         oit   c∞      t
c13e2      24   10^-5   1.2         8   10^-3   0.7         7   10^-8   0.7         7   10^-9   0.6
c15e2       6   10^-8   5.5         8   10^-1   96.7        8   10^-6   1.6         8   10^-7   2.6
c17e2       7   10^-7   10.7     1007   10^-1   32.0        8   10^-5   2.4         9   10^-7   3.9
c18e2      11   10^-10  33.1      664   10^-1   43.6       11   10^-7   39.1        8   10^-6   8.3
c13n1       9   10^-6   115.5      12   10^-3   65.3        8   10^-7   109.9       8   10^-7   85.0
c15n1      10   10^-6   82.4       18   10^-3   188.4       8   10^-6   63.7       21   10^-6   157.0
c17n1      11   10^-6   89.6      129   10^-2   525.6       9   10^-6   71.7       24   10^-8   376.7
c18n1      11   10^-6   123.1      10   10^-8   370.8      10   10^-8   332.5      44   10^-9   1577.7
c21e2      24   10^-6   1.5        28   10^-2   1.5         7   10^-7   1.6         7   10^-7   1.6
c22e2      65   10^-6   4.4        46   10^-2   3.1         9   10^-6   2.1         8   10^-6   1.9
c23e2      31   10^-5   6.5        47   10^-1   6.4        10   10^-7   6.9        11   10^-7   6.4
c24e2     158   10^-5   15.4       13   10^-6   4.8        12   10^-10  5.1        13   10^-6   4.7
c33e2       7   10^-6   1.5        10   10^-3   1.5         8   10^-7   1.6         8   10^-6   1.7
c34e2       9   10^-6   5.7        36   10^-3   8.4         9   10^-9   5.1         9   10^-6   4.8
c35e2       8   10^-6   2.3        11   10^-3   2.1         9   10^-9   2.3         9   10^-8   2.5
c38e2       8   10^-6   8.0        31   10^-3   11.5        8   10^-8   8.6         9   10^-8   5.1
c42e2      10   10^-7   11.9       10   10^-3   18.8        7   10^-8   14.3        8   10^-10  14.2
c47e2      15   10^-6   388.6      41   10^-3   248.1      31   10^-6   633.3      98   10^-7   699.8
c48e2      11   10^-6   122.5      43   10^-3   98.1       11   10^-8   235.0      48   10^-6   138.3
c49e2      12   10^-6   173.3       —   —       —          12   10^-7   268.3      87   10^-8   291.2
through the Dirnl random generator described in [10]. The last two letters indicate the type of objective function that we have used: Namur functions, n1, and EIO1 functions, e2. The EIO1 family creates problems with a moderate number of superbasic variables (i.e., dimension of the null space) at the solution (# sb. arcs). By contrast, the Namur functions [14] generate a high number of superbasic arcs at the optimizer, see Table 1. More details about these problems can be found in [10]. In Table 2, for each method used to compute the stepsize, "oit" represents the number of outer iterations required and c∞ represents the maximum violation of the side constraints in the optimal solution; that is, it offers information about the feasibility of this solution and, hence, about its duality gap. The efficiency is evaluated by means of the run-times in CPU-seconds in the column t. The results point out that, in spite of the simplicity of the variant of the constant step rule, VCSR is quite efficient and robust: in 8 of the 20 problems VCSR achieves the best CPU-times. However, DSRLC has a similar efficiency, and it is more accurate than VCSR (compare the c∞ columns).
5 Conclusions
This paper has presented a variant of the constant step rule for approximate subgradient methods applied to solve nonlinearly constrained network flow problems. The convergence of the ε-subgradient method with the variant of the constant step rule has been analyzed. Moreover, its efficiency has been compared with that of other ε-subgradient methods. The results of the numerical tests encourage us to carry out further experimentation, which also includes real problems, and to analyze more carefully the influence of some parameters on the performance of this code.
References

1. Bertsekas, D.P.: Nonlinear Programming. 2nd ed., Athena Scientific, Belmont, Massachusetts (1999)
2. Brännlund, U.: On relaxation methods for nonsmooth convex optimization. Doctoral Thesis, Royal Institute of Technology, Stockholm, Sweden (1993)
3. Correa, R., Lemaréchal, C.: Convergence of some algorithms for convex minimization. Mathematical Programming, Vol. 62 (1993) 261-275
4. DIMACS: The first DIMACS international algorithm implementation challenge: The benchmark experiments. Technical Report, DIMACS, New Brunswick, NJ, USA (1991)
5. Goffin, J.L., Kiwiel, K.: Convergence of a simple subgradient level method. Mathematical Programming, Vol. 85 (1999) 207-211
6. Kiwiel, K.: Convergence of approximate and incremental subgradient methods for convex optimization. SIAM Journal on Optimization, Vol. 14(3) (2004) 807-840
7. Mijangos, E.: An efficient method for nonlinearly constrained networks. European Journal of Operational Research, Vol. 161(3) (2005) 618-635
8. Mijangos, E.: Efficient dual methods for nonlinearly constrained networks. Springer-Verlag Lecture Notes in Computer Science, Vol. 3483 (2005) 477-487
9. Mijangos, E.: Approximate subgradient methods for nonlinearly constrained network flow problems. Journal of Optimization Theory and Applications, Vol. 128(1) (2006) (forthcoming)
10. Mijangos, E., Nabona, N.: The application of the multipliers method in nonlinear network flows with side constraints. Technical Report 96/10, Dept. of Statistics and Operations Research, Universitat Politècnica de Catalunya, 08028 Barcelona, Spain (1996) (downloadable from http://www.ehu.es/~mepmifee/)
11. Nedić, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM Journal on Optimization, Vol. 12(1) (2001) 109-138
12. Poljak, B.T.: Minimization of unsmooth functionals. Z. Vychisl. Mat. i Mat. Fiz., Vol. 9 (1969) 14-29
13. Shor, N.Z.: Minimization Methods for Nondifferentiable Functions. Springer-Verlag, Berlin (1985)
14. Toint, Ph.L., Tuyttens, D.: On large scale nonlinear network optimization. Mathematical Programming, Vol. 48 (1990) 125-159
On the Optimal Buffer Allocation of an FMS with Finite In-Process Buffers Soo-Tae Kwon Department of Information Systems, School of Information Technology and Engineering, Jeonju University
Abstract. This paper considers a buffer allocation problem of flexible manufacturing system composed of several parallel workstations each with both limited input and output buffers, where machine blocking is allowed and two automated guided vehicles are used for input and output material handling. Some interesting properties are derived that are useful for characterizing optimal allocation of buffers for the given FMS model. By using the properties, a solution algorithm is exploited to solve the optimal buffer allocation problem, and a variety of differently-sized decision parameters are numerically tested to show the efficiency of the algorithm. Keywords: FMS, Queueing Network, Throughput, Buffer.
1
Introduction
Flexible manufacturing systems (FMSs) have been introduced in an effort to increase productivity by reducing inventory and increasing the utilization of machining centers simultaneously. An FMS combines the existing technology of NC manufacturing, automated material handling, and computer hardware and software to create an integrated system for the automatic random processing of palletized parts across various workstations in the system. The design of an FMS begins with a survey of the manufacturing requirements of the products produced in the firm with a view to identifying the range of parts which should be produced on the FMS. Then the basic design concepts must be established. In particular the function, capability and number of each type of workstation, the type of material handling system and the type of storage should be determined. At the detailed design stage it will be necessary to determine such aspects as the required accuracy of machines, tool changing systems and the method of feeding and locating parts at machines. Then the number of transport devices, the number of pallets, the capacity of central and local storages must be determined, along with some general strategies for work transport to reduce delays due to interference. One of key questions that the designer face in an FMS is the buffer allocation problem, i.e., how much buffer storage to allow and where to place it within the system. This is an important question because buffers can have a great impact on M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 767–776, 2006. c Springer-Verlag Berlin Heidelberg 2006
768
S.-T. Kwon
the efficiency of production. They compensate for the blocking and the starving of the workstations. Unfortunately, buffer storage is expensive both due to its direct cost and due to the increase of the work-in-process(WIP) inventories. Also, the requirement to limit the buffer storage can be a result of space limitations in the factory. Much research has concentrated on queueing network model analyses to evaluate the performance of FMSs, and concerned with mathematical models to address the optimization problems of complex systems such as routing optimization, server allocation, workload allocation, buffer allocation on the basis of the performance model. Vinod and Solberg (1985) have presented a methodology to design the optimal system configuration of FMSs modeled as a closed queueing networks of multiserver queues. Buzacott and Yao (1986) have reviewed the work on modelling FMS with particular focus on analytical models. Shanthikumar and Yao (1989) have solved the optimal buffer allocation problem with increasing concave production profits and convex buffer space costs. Paradopoulos and Vidalis (1999) have investigated the optimal buffer allocation in short balanced production lines consisting of machines that are subject to breakdown. Enginarlar et al. (2002) have investigated the smallest level of buffering to ensure the desired production rate in serial lines with unreliable machines. In the above-mentioned reference models, machines were assumed not to be blocked, that is, not to have any output capacity restriction. These days, the automated guided vehicle (AGV) is commonly used to increase potential flexibility. By the way, it may not be possible to carry immediately the finished parts from the machines which are subject to AGV’s capacity restriction. The restriction can cause any operation blocking at the machines, so that it may be desirable to provide some storage space to reduce the impact of such blocking. In view of reducing work-in-process storage, it is also required to have some local buffers of proper size at each workstation. Sung and Kwon(1994) have investigated a queueing network model for an FMS composed of several parallel workstations each with both limited input and output buffers where two AGVs are used for input and output material handling, and Kwon (2005) has considered a workload allocation problem on the basis of the same model. In this paper, a buffer allocation problem is considered to yield the highest throughput for the given FMS model (Sung and Kwon 1994). Some interesting properties are derived that are useful for characterizing optimal allocation of buffers, and some numerical results are presented.
2
The Performance Evaluation Model
The FMS model is identical to that in Sung and Kwon(1994). The network consists of a set of n workstations. Each workstation i(i = 1, · · · , n) has machines with both limited input and output buffers. The capacities of input and output buffers are limited up to IBi and OBi respectively, and the machines perform in an exponential service time distribution. All the workstations are linked to an automated storage and retrieval system (AS/RS) by AGVs which consist of
On the Optimal Buffer Allocation of an FMS with Finite In-Process Buffers
769
AGV(I) and AGV(O). The capacity of the AS/RS is unlimited, and external arrivals at the AS/RS follow a Poisson process with rate λ. The FCFS (first come first served) discipline is adopted here for the services of AGVs and machines. AGV(I) delivers the input parts from the AS/RS to each input buffer of workstations, and AGV(O) carries the finished parts away from each output buffer of workstations to the AS/RS, with corresponding exponential service time distributions. Specifically, AGV(I) distributes all partsfrom the n AS/RS to the workstations according to the routing probabilities γi ( i=1 γi = 1) which can be interpreted as the proportion of part dispatching from the AS/RS to workstation i. Moreover, any part (material) can be blocked on arrival (delivery) at an input buffer which is already full with earlier-arrived parts. Such a blocked part will be recirculated instead of occupying the AGV(I) and waiting in front of the workstation (block-and-recirculate mechanism). Any finished part can also be blocked on arrival at an output buffer which is already full with earlier-finished parts. Such a blocked part will occupy the machine to remain blocked until a part departure occurs from the output buffer. During such a blocking time, the machine cannot render service to any other part that might be waiting at its input buffer (block-and-hold mechanism). Sung and Kwon(1994) have developed an iterative algorithm to approximate system performance measures such as system throughput and machine utilization. The approximation procedure decomposes the queueing network into individual queues with revised arrival and service processes. These individual queues are then analyzed in isolation. The individual queues are grouped into two classes. The first class consists of input buffer and machine, and the second one consists of output buffers and AGV(O). The first and second classes are called the first-level queue and the second-level queue, respectively. The following notations are used throughout this paper (i = 1, · · · , n): λ λi λ∗i μ μi P (k1 , · · · , kn )
external arrival rate at AS/RS arrival rate at each input buffer i in the first-level queue arrival rate at each output buffer i in the second-level queue service rate of AGV service rate of machine i probability that there are ki units at each output buffer i in the second-level queue with infinite capacity. P (idle) probability that there is no unit in the second-level queue with infinite capacity. (k1 , · · · , kn ) probability that there are ki units at each output buffer i in the second-level queue with finite capacity. (idle) probability that there is no unit in the second-level queue with finite capacity. The second-level queue is independently analyzed first to find the steady-state probability by using the theory of reversibility. The steady-state probability is derived as follows.
770
S.-T. Kwon
Lemma 1. (ref er to Sung and Kwon 1994, T heorem 2) The steady-state probability of the second-level queue is derived as (k1 , · · · , kn ) = P (k1 , · · · , kn )/G (idle) = P (idle)/G
(1)
where, A = {(k1 , · · · , kn )|0 <= ki <= OBi , 1 <= i <= n}, G = (k1 ,···,kn )∈A P (k1 , · · · , kn ) + P (idle), P (k1 , · · · , kn ) = (1 − ρ) · ρ(k1 +···+kn +1) · P (idle) = 1 − ρ, n ρ = i=1 λ∗i /μ, n qi = λ∗i / i=1 λ∗i .
(k1 +···+kn )! k1 !···kn !
· q1 k1 · · · qn kn ,
It is followed by finding the clearance service time accommodating all the possible blocking delays that a part might undergo due to the phenomenon of blocking. The clearance time is derived from the steady-state probability of second-level queue. Then, the first-level queues are analyzed by this expected clearance time in the approach of the M/M/1/K queueing model.
3
The Buffer Allocation Problem
In FMS, a frequently encountered problem is concerned with how to allocate buffer space among several subsystems for maximizing the production rate (system throughput). In this section, the buffer allocation problem is considered to yield the highest throughput for the given performance evaluation model. The optimal buffer allocation problem can be stated as follows : M aximize s.t. where
Z = T H(x1 , · · · xn , xn+1 , · · · x2n ) 2n xi ≤ S
(2)
i=1
T H(x1 , · · · xn , xn+1 , · · · x2n ) = the system throughput, xi = the number of buffers allocated to buffer i(input buffer : 1 ≤ i ≤ n, output buffer : n + 1 ≤ i ≤ 2n), that is(IB 1 , · · · , IB n , OB 1 , · · · , OB n ) S = the maximum total number of buffers to be allocated. Despite of its practical importance, this buffer allocation problem has not been successfully studied in the literature. The major difficulty appears to be lack of known properties regarding the throughput of system as a function of its buffer capacity. Some interesting properties for the associated system throughput are now derived. Property 1. In the first-level queue-alone subsystem, the throughput is a monotonically increasing concave function of its buffer size.
On the Optimal Buffer Allocation of an FMS with Finite In-Process Buffers
771
Proof: Let ρi (= λi /μi ) and IB i denote the utilization and the buffer size of the first-level queue i, respectively. Then the throughput of the first-level queue i, T H(λi , IB i , μi ), can be derived as follows : 1 − ρi T H(λi , IB i , μi ) = μi · (1 − ) i +1 1 − ρIB i By the definition of T H(IB i ), T H(λi , IB i + 1, μi ) − T H(λi , IB i , μi ) 1 − ρi 1 − ρi = μi · ( − ) IB i +1 i +2 1 − ρi 1 − ρIB i 1 1 = μi · ( − ) IB i 2 2 i +1 1 + ρi + ρi + · · · + ρi 1 + ρi + ρi + · · · + ρIB i > 0 for all ρi . And, 2T H(λi , IB i + 1, μi ) − T H(λi , IB i , μi ) − T H(λi , IB + 2, μi ) 1 − ρi 1 − ρi 1 − ρi = 2 · μi · (1 − ) − μi · (1 − ) − μi · (1 − ) IB i +2 IB i +1 1 − ρIBi +3 1 − ρi 1 − ρi 1 1 2 = μi · (1 − ρi )[ + − ] IB i +1 IB i +3 i +2 1 − ρi 1 − ρi 1 − ρIB i μi · (1 − ρi ) i +3 i +1 i +2 = · [ρIB + ρIB − 2ρIB i i i IB i +3 IB i +2 i +1 (1 − ρIB )(1 − ρ )(1 − ρ ) i i i i +5 i +3 i +4 +ρ2·IB + ρ2·IB − 2ρ2·IB ] i i i i +1 i +3 μi · (1 − ρi )3 · (ρIB + ρ2·IB ) i i IB i +1 IB i +3 IB i +2 (1 − ρi )(1 − ρi )(1 − ρi ) > 0 for all ρi . Thus, the throughput of the first-level queue is a monotonically increasing concave function of buffer size. This completes the proof.
=
Also, the throughput of the second-level queue is characterized as follows. Property 2. In the second-level queue-alone subsystem, the throughput is a monotonically increasing concave function of its buffer size. Proof: n λ∗ Let ρ(= i=1 μi ) and (OB 1 , · · · , OB n ) denote the utilization and the output buffer sizes of the second-level queue, respectively. Then the throughput of the second-level queue, T H(λ∗1 , · · · , λ∗n , OB 1 , · · · , OB n , μ), can be derived as follows. T H(λ∗1 , · · · , λ∗n , OB 1 , · · · , OB n , μ) = μ · (1 − (idle)) 1−ρ ) G(λ∗1 , · · · , λ∗n , OB 1 , · · · , OB n , μ) 1 = μ · (1 − ) ∗ ∗ φ(λ1 , · · · , λn , OB 1 , · · · , OB n , μ)
= μ · (1 −
772
S.-T. Kwon
where, G(λ∗1 , · · · , λ∗n , OB 1 , · · · , OB n , μ) = (1 − ρ)[1 +
OB 1
k1 =0 ∗ ∗ φ(λ1 , · · · , λn , OB 1 , · · · , OB n , μ) OB OB 1 n
···
=1+
k1 =0
OB n
···
ρn+1
kn =0
ρn+1
kn =0
(k1 + · · · + kn )! k1 q1 · · · qnkn ], k1 ! · · · kn !
(k1 + · · · + kn )! k1 q1 · · · qnkn k1 ! · · · kn !
n = k1 + · · · + kn And, let ψ1 =
OB 1
···
k1 =0
ψ2 =
OB 1 k1 =0
···
OB i +1
···
ki =OB i +1 OB i +2 ki =OB i +2
···
OB n
ρn+1
kn =0 OB n kn =0
ρn+1
(k1 + · · · + kn )! k1 q1 · · · qnkn , k1 ! · · · kn !
and
(k1 + · · · + kn )! k1 q1 · · · qnkn k1 ! · · · kn !
These lead to the relation ψ1 > ψ2 , and it holds that φ(λ∗1 , · · · λ∗n , OB 1 , · · · , OB i + 1, · · · , OB n , μ) = φ(λ∗1 , · · · λ∗n , OB 1 , · · · , OB i , · · · , OB n , μ) ∗ ∗ φ(λ1 , · · · λn , OB 1 , · · · , OB i + 2, · · · , OB n , μ) = φ(λ∗1 , · · · λ∗n , OB 1 , · · · , OB i , · · · , OB n , μ) By the definition of T H(λ∗1 , · · · , λ∗n , OB 1 , · · · , OB n , μ) T H(λ∗1 , · · · λ∗n , OB 1 , · · · , OB i + 1, · · · , OB n , μ) −T H(λ∗1 , · · · λ∗n , OB 1 , · · · , OB i , · · · , OB n , μ) 1 = μ·[ φ(λ∗1 , · · · λ∗n , OB 1 , · · · , OB i , · · · , OB n , μ) 1 − ] ∗ ∗ φ(λ1 , · · · λn , OB 1 , · · · , OB i , · · · , OB n , μ) + ψ1 > 0 for all OB i And, 2 · T H(λ∗1 ,· · · λ∗n ,OB 1 ,· · · , OB i + 1,· · ·,OB n ,μ) − T H(λ∗1 , · · · λ∗n , OB 1 , · · · , OB i , · · · , OB n , μ) −T H(λ∗1 , · · · λ∗n , OB 1 , · · · , OB i + 2, · · · , OB n , μ) −2 =μ·[ φ(λ∗1 , · · · λ∗n , OB 1 , · · · , OB i , · · · , OB n , μ) + ψ1 1 + ∗ ∗ φ(λ1 , · · · λn , OB 1 , · · · , OB i , · · · , OB n , μ)
+ ψ1 + ψ1 + ψ2
On the Optimal Buffer Allocation of an FMS with Finite In-Process Buffers
+
773
1 ] φ(λ∗1 , · · · λ∗n , OB 1 , · · · , OB i , · · · , OB n , μ) + ψ1 + ψ2
= μ · [φ(λ∗1 , · · · λ∗n , OB 1 , · · · OB n , μ) · (ψ1 − ψ2 ) + ψ12 + ψ1 · ψ2 ] 1 · ∗ ∗ φ(λ1 , · · · λn , OB 1 , · · · , OB i , · · · , OB n , μ) + ψ1 1 · φ(λ∗1 , · · · λ∗n , OB 1 , · · · , OB i , · · · , OB n , μ) 1 · φ(λ∗1 , · · · λ∗n , OB 1 , · · · , OB i , · · · , OB n , μ) + ψ1 + ψ2 > 0 for all OB i Thus, the throughput of the second-level queue is a monotonically increasing concave function of buffer size. This completes the proof. Finally, the following result can be obtained by the comparison of the throughputs for buffer allocation scheme. Property 3. In the second-level queue-alone subsystem, the balanced buffer allocation scheme maximizes the throughput. Proof: For simplification, the proof will be completed only for the case of n = 2, S = 2 and λ1 = λ2 . Let T H b and T H nb be the throughput of the balanced allocation scheme case (OB 1 = OB 2 = 1) and the other one (OB 1 = 2), respectively. T H b − T H nb = μ(1 −
1−ρ 1−ρ Gb − Gnb ) − μ(1 − ) = μ(1 − ρ) Gb Gnb Gb · Gnb
Since q1 = q2 and by the definition of G, Gb − Gnb = (1 − ρ)[(1 − ρ + ρ2 + 2ρ3 q1 q2 ) − (1 + ρ + ρ2 q1 + ρ3 q12 )] = (1 − ρ)[ρ2 (1 − q1 ) + ρ3 (2q1 q2 − q12 )] 1 1 = (1 − ρ)[ ρ2 + ρ3 ] 2 2 Therefore, T H b − T H nb ≥ 0. This complete the proof. On the basis of properties, now consider the optimization problem in (2). Since the throughput is a monotonically increasing concave function of buffer capacities for both first-level and second-level queue as proved, the marginal allocation approach of Fox (1966) can be used to efficiently solve the optimal buffer allocation problem in (2). The idea of this approach is as follows. Let T H(x1 , · · · , xi , · · · , x2n ) = T H(x1 , · · · , xi +1, · · · , x2n )−T H(x1 , · · · , xi , · · · , x2n ) for all i. Allocate the available buffer spaces to the buffer that would yield the largest increase in F (xi ) one at a time. Continue this procedure until the available buffer spaces are exhausted.
774
S.-T. Kwon
The algorithm of the buffer allocation problem can be summarized as follows. Step 1. Step 2.
Set xi = 0, for all i(= 1, · · · , 2n). For all i, calculate T H(x1 , · · · , xi , · · · , x2n ) = T H(x1 , · · · , xi + 1, · · · , x2n ) − T H(x1 , · · · , xi , · · · , x2n ) by using the performance evaluation model. Table 1. The result of the parameter set 1
iteration 0 1 2 3 4 5
x1 = IB 1 0 1 0 2 1 3 2 3 2 4 3
x2 = OB 2 0 0 1 0 1 0 1 1 2 1 2
throughput .5262 .6667 .5846 .7313 .7311 .7677 .7965 .83288 .83287 .8558 .8685
increment .1405 .4941 .0646 .0644 .0364 .0652 .03638 .03637 .0229 .0356
allocation ∗∗ ∗∗ ∗∗ ∗∗ ∗∗
Table 2. The result of the parameter set 2 S x1 x2 x3 x4 TH allocation IB 1 IB 2 IB 3 IB 4 0 0 0 0 0 .6291 1 1 0 0 0 .7043 .0752 ∗∗ 0 1 0 0 .7043 .0752 0 0 1 0 .6641 .035 0 0 0 1 .6641 .035 2 2 0 0 0 .7378 .033 1 1 0 0 .7758 .0715 ∗∗ 1 0 1 0 .731 .0267 1 0 0 1 .7438 .0395 3 2 1 0 0 .8087 .0329 ∗∗ 1 2 0 0 .8087 .0329 1 1 1 0 .807 .0312 1 1 0 1 .807 .0312 4 3 1 0 0 .8268 .0181 2 2 0 0 .8406 .0319 2 1 1 0 .8312 .0225 2 1 0 1 .8425 .0338 ∗∗ 5 3 1 0 1 .8629 .0204 2 2 0 1 .8663 .0238 2 1 1 1 .8725 .03 ∗∗ 2 1 0 2 .855 .0125
S x1 x2 x3 x4 TH IB 1 IB 2 IB 3 IB 4 6
3 2 2 2 7 3 2 2 2 8 4 3 3 3 9 4 3 3 3 10 4 3 3 3
1 2 1 1 2 3 2 2 2 3 2 2 3 4 3 3 3 4 3 3
1 1 2 1 1 1 2 1 1 1 2 1 1 1 2 1 2 2 3 2
1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 2
allocation
.8867 .0142 .9003 .0278 .882 .0095 .89 .0175 .9149 .0146 .9149 .0146 .912 .0117 .912 .0117 .9232 .0083 .9293 .0144 .9217 .0068 .9284 .0135 .9376 .0083 .9376 .0083 .9379 .0086 .9379 .0086 .9427 .0048 .9479 .01 .9394 .0015 .9524 .0145
∗∗ ∗∗
∗∗
∗∗
∗∗
On the Optimal Buffer Allocation of an FMS with Finite In-Process Buffers
Step 3.
Step 4.
775
Find k such that T H(x1 , · · · , xk , · · · x2n ) Max = 1≤i≤2n T H(x1 , · · · , xi , · · · x2n ). Set xk = xk + 1, S = S − 1. If S > 0, then go to step 2. Stop.
In order to illustrate the solution procedure, the buffer allocation problem on the basis of the given performance evaluation model is considered with parameter set 1 (λ = 1, μ1 = μ2 = 2, μ = 1, S = 5), parameter set 2 (λ = 1, r1 = r2 = 0.5, μ1 = μ2 = 2, μ = 1, S = 10) and parameter set 3 (λ = 1, r1 = r2 = 0.5, μ1 = 1, μ2 = 2, μ = 1, S = 10), where S, λ, ri , μi , and μ denote the maximum total number of buffers to be allocated, the arrival rate, the routing probability, the machine service rate, and the AGV service rate, respectively. At first, a simple system with a single workstation is considered to illustrate the marginal allocation procedure, which is identical to the two stage transfer line. The results of the system with parameter set 1 are shown in Table 1. The table gives both the amount of increment and throughput results. At second, the buffer allocation problem is considered to test the efficiency of the solution algorithm. For S = 1, · · · , 10, the results of the system with parameter set 2 and 3 are shown in Table 2 and 3, respectively.
Table 3. The result of the parameter set 3 S x1 x2 x3 x4 TH allocation S x1 x2 x3 x4 TH allocation IB 1 IB 2 IB 3 IB 4 IB 1 IB 2 IB 3 IB 4 0 0 0 0 0 .5944 1 1 0 0 0 .6662 .0718 6 3 2 1 0 .8547 .0162 0 1 0 0 .6729 .0785 ∗∗ 2 3 1 0 .8592 .0207 0 0 1 0 .6219 .0275 2 2 2 0 .8502 .0117 0 0 0 1 .6297 .0353 2 2 1 1 .8692 .0307 ∗∗ 2 1 1 0 0 .741 .0681 ∗∗ 7 3 2 1 1 .8869 .0177 ∗∗ 0 2 0 0 .7074 .0345 2 3 1 1 .8835 .0143 0 1 1 0 .7049 .032 2 2 2 1 .8858 .0166 0 1 0 1 .7002 .0273 2 2 1 2 .8789 .0097 3 2 1 0 0 .7731 .0321 8 4 2 1 1 .898 .0111 1 2 0 0 .7745 .0335 ∗∗ 3 3 1 1 .901 .0141 1 1 1 0 .7728 .0318 3 2 2 1 .9014 .0145 ∗∗ 1 1 0 1 .771 .03 3 2 1 2 .8973 .0104 4 2 2 0 0 .8061 .0316 9 4 2 2 1 .9105 .0091 1 3 0 0 .7931 .0186 3 3 2 1 .9177 .0163 ∗∗ 1 2 1 0 .8095 .035 ∗∗ 3 2 3 1 .9068 .0054 1 2 0 1 .796 .0215 3 2 2 2 .9162 .0148 5 2 2 1 0 .8385 .029 ∗∗ 10 4 3 2 1 .9269 .0092 1 3 1 0 .8303 .0208 3 4 2 1 .9275 .0098 1 2 2 0 .8229 .0134 3 3 3 1 .924 .0063 1 2 1 1 .8376 .0281 3 3 2 2 .9281 .0104 ∗∗
776
S.-T. Kwon
The computational results present that the solution algorithm is very efficient. In case of S = s, the solution algorithm generated the optimal solution at the s − th iteration. However, it is impossible in practice to allocate buffer spaces by conventional approach, since the number of allocating combination becomes explosively large as the number of S and workstation increase. And, the results of Table 2 and 3 imply that the balanced buffer allocation scheme maximize the system throughput. That is, in order to maximize the system throughput, the buffer should be allocated depending on the routing probability and service rate of machine and AGV.
4
Conclusions
In this paper, a design aspect of a flexible manufacturing system composed of several parallel workstations each with both limited input and output buffers where two AGVs are used for input and output material handling is considered. The optimal design decision is made on the allocation of buffer spaces on the basis of the given performance evaluation model. Some interesting properties are derived that are useful for characterizing optimal allocation of buffer spaces. The properties are then used to exploit a solution algorithm for allocating buffer spaces. A variety of differently-sized decision parameters are numerically tested to show the efficiency of the algorithm. The results present that the solution algorithm is very efficient. Further research is to consider the cost factor more explicitly, and also to extend these concepts to general production systems.
References 1. Buzacott, J.A. and Yao, D.D.: Flexible Manufacturing Systems: A Review of Analytical Models. Management Science, 32, No. 7, (1986) 890-905. 2. Enginarlar, E., Li, J., Meerkov, S.M., and Zhang, R.Q.: Buffer Capacity for Accommodating Machine Downtime in Serial Production Lines, International Journal of Production Research, 40, No. 3, (2002) 601-624. 3. Fox, B.: Discrete Optimization via Marginal Analysis, Management Science, 13, No. 3, (1966) 210-216. 4. Kwon, S.T.: On the Optimal Workloads Allocation of an FMS with Finite In-process Buffers, Lecture Notes in Computer Science, 3483, (2005) 624-631. 5. Papadopoulos, H.T. and Vidalis, M.I.: Optimal Buffer Allocation in short-balanced unreliable production lines, Computers & Industrial Engineering, 37, (1999) 691-710. 6. Shanthikumar, J. G. and Yao, D. D.: Optimal Buffer Allocation in a Multicell System, The International Journal of Flexible Manufacturing Systems, 1, (1989) 347-356. 7. Sung, C.S. and Kwon, S.T.: Performance Modelling of an FMS with Finite Input and Output Buffers. International Journal of Production Economics, 37, (1994) 161-175. 8. Vinod, B. and Solberg, J.J.: The Optimal Design of Flexible Manufacturing Systems, International Journal of Production Research, 23, No. 6, (1985) 1141-1151.
Optimization Problems in the Simulation of Multifactor Portfolio Credit Risk Wanmo Kang1 and Kyungsik Lee2,, 1
2
Columbia University, New York, NY, USA Hankuk University of Foreign Studies, Yongin, Korea
Abstract. We consider some optimization problems arising in an efficient simulation method for the measurement of the tail of portfolio credit risk. When we apply an importance sampling (IS) technique, it is necessary to characterize the important regions. In this paper, we consider the computation of directions for the IS, which becomes hard in multifactor case. We show this problem is NP-hard. To overcome this difficulty, we transform the original problem to subset sum and quadratic optimization problems. We support numerically that these reformulation is computationally tractable.
1
Introduction
Measurement of portfolio credit risk is an important problem in financial industry. To reserve economic capital or to summarize the potential risk of a company, the portfolio credit risk is calculated frequently. Some of key properties of this measurement are the importance of dependence structure of obligors constituting the portfolio and the rare-event characteristic of large losses. Dependence among obligors incurs large losses more frequently, even though they are still rare. Gaussian copula is one of the most popular correlation structure in practice. (See [5].) Since there is no known analytical or numerical way to compute the tail losses under Gaussian copula framework, Monte Carlo method is a viable way to accomplish this task. (See [1], [4], [6], [8], and [9]) However, the rareness of large losses makes a crude Monte Carlo method impractical. To accelerate the simulation, one effective way is the application of IS. When applying IS, the identification of important region is the key for the efficiency enhancement. In this paper, we consider the problem of identifying important regions. The combinatorial complexity underlying this problem makes it a hard problem. We re-formulate this problem as a combination of quadratic optimizations and subset sum problems. The worst case complexity is not reduced, but the subset sum problems can be solved very fast in practice. Consequently this new approach works very well for actual problem instances.
Corresponding author. This work was supported by Hankuk University of Foreign Studies Research Fund.
M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 777–784, 2006. c Springer-Verlag Berlin Heidelberg 2006
778
2
W. Kang and K. Lee
Portfolio Credit Risk and Importance Sampling
We briefly introduce the portfolio credit risk model and Gaussian copula framework. We consider the distribution of losses from defaults over a fixed horizon. We are interested in the estimation of the probability that the credit loss of a portfolio exceeds a given threshold. As it is difficult to estimate correlations among the default events of obligors, latent variables are introduced as default triggers and the dependence structure is imposed on the latent variables indirectly. A linear factor model is adopted for the correlations among them. We use the following notation: m : the number of obligors to which the portfolio is exposed; Yk : default indicator (= 1 for default, = 0 otherwise) for the k-th obligor; pk : marginal probability that the k-th obligor defaults; ck : loss resulting from default of the k-th obligor; Lm = c1 Y1 + · · · + cm Ym : total loss from defaults. We are interested in the estimation of P(Lm > x) for a given threshold x when the event {Lm > x} is rare. We call such one as a large loss event. We introduce latent normal random variables Xk for each Yk . Xk ’s are standard normal random variables and We set Yk = 1{Xk > Φ−1 (1 − pk )}, with Φ the cumulative normal distribution. For a linear factor representation of Xk , we assume the following: There are d factors and t types of obligors. {I1 , . . . , It } is a partition of the set of obligors {1, . . . , m} into types. If k ∈ Ij , then the k-th obligor is of type j and its latent variable is given by Xk = a j Z + bj εk d where aj ∈ R with 0 < aj < 1, Z is a d dimensional standard normal random vector, bj = 1 − a j aj and εk are independent standard normal random vari-
ables. This dependence structure is called a Gaussian copula model. (See [2].) Z represents common m systematic risk factors and εk an idiosyncratic risk factor. We set x = q k=1 ck for a given q, 0 < q < 1. Denote the average m loss of each type by Cj = k∈Ij ck /|Ij | and total average loss by C = k=1 ck /m. Then index sets (sets of types) important for the large losses exceeding qC can be characterized by the following index set J ⊂ {1, . . . , t} (See [3]): max Cj < qC ≤ Cj . (1) J J
j∈J
j∈J
This characterization of an important index set can be interpreted as follows: to observe samples with large losses, the common factors should have values which enable the sum of average losses of the types in the index set to exceed the loss threshold. After identifying these index sets (say J , the point to shift the mean vectors of Gaussian common factors is found by μJ := argmin {z : z ∈ GJ }
(2)
Optimization Problems in the Simulation
where Gj := z ∈ Rd : a j z ≥ dj
and
GJ :=
779
Gj .
j∈J
dj > 0 is a constant for each type calculated from the problem instance. In this paper, the positivity of dj is sufficient. Now returning to the Monte Carlo simulation, we sample the common factors from the mixture distribution of N (μJ , I) for all the J ’s satisfying (1). μJ is the minimum distant point to the important region for the large losses and we sample from the normal distribution shifted to those points. As usual, we compensate this change of measure by multiplying likelihood ratios. (See [3] for details.) Define Sq be the set of index sets satisfying (1). In principle, we can use μ1 , . . . , μK in the simulation as the mean vectors of mixture distribution if K = |Sq |. However, K becomes very large as t increases. The size depends on q, but for q values near 0.5, the order of K follows exponential to t. So identifying Sq first and then finding corresponding μJ ’s are impractical. In the next section, we exploit some structural properties and re-formulate the problem into a tractable one.
3
Re-formulation of Problem
The first idea comes from the fact that we use μJ for the shift of mean vectors but J is not explicitly used. So if μJ = μJ for two different index sets J and J , we don’t need to know what the two index sets are, but just need μJ . Hence we focus on the characterization of V := {μJ : J ⊂ {1, . . . , t}, |J | ≤ d, GJ = ∅} instead of Sq which possibly consists of exponentially many elements with respect to the number of types t. The issue here is how to find the candidate IS distributions as fast as possible when the number of types, t, and the dimension of factors, d, are fixed. 3.1
Reduction of Candidate Mean Vectors
For a given problem instance, t the size of Sq depends on the value q. In the worst case, the size of Sq will be [t/2] , in which case the application of IS is intractable for instances with a large number of types. To avoid this difficulty, we need to devise a method that does not involve an explicit enumeration of the index sets in Sq . The key fact is the following lemma. Lemma 1. For any J ∈ Sq satisfying GJ = ∅, there exists a J ⊂ J with |J | ≤ d such that μJ = μ J .
780
W. Kang and K. Lee
Proof. (Sketch of Proof) Recall that the definition (2) implies that μJ is the optimal solution of linear programming (LP), min{μ for j ∈ J }. J z : aj z ≥ dj The LP duality gives a dual optimal solution π with P := {j : πj > 0} and π∗
j |P| ≤ d. The complementary slackness condition shows that μJ and v ,j∈P satisfy KKT optimality conditions for min{z : aj z ≥ dj for j ∈ P} and its dual. We can take J = P and this completes the proof. Refer [3] for details.
The lemma tells us that we don’t have to spend our effort to solve (2) if |J | > d. This gives a large reduction of our search space. From this lemma, we also have the following upper bound on the number of (2) which we have to solve. Lemma 2. For an instance with d factors and t types,
t
μJ : J ∈ Sq ≤ t + + · · · + t < td . d d−1 Proof. Note that the righthand side of inequality is the number of ways of choosing d or less constraints from t candidates. Combining with Lemma 1, we complete the proof.
3.2
Derivation of the Subset Sum Problem
Recall that {μJ : J ∈ Sq } ⊂ V from Lemma 1. The upper bound in Lemma 2 is also an upper bound on |V|. Our approach is to find V, as reduced candidate mean vectors, and use it to get {μJ : J ∈ Sq } from V. Assume a representation V = {v1 , . . . , vn } and define H(v) := {j : a j v ≥ dj , j = 1, . . . , t} for v ∈ V. H(v) is the maximal index set satisfying v = μH(v) . Consider, for each v ∈ V, all the minimal constraints sets forming the optimization problem whose unique optimal solution is v; denote this family by F (v) = {F : F ⊂ H(v), v = μF , v = μF \{j} for all j ∈ F }. Note
that |F | ≤ d for each F ∈ F(v) by Lemma 1 and hence the cardinality of v∈V F (v) has the same upper bound as the one in Lemma 2. Because we search V by probing all index sets of cardinality less than or equal to d, we get F (v)’s as by-products of the search. To simplify notations, we abuse the symbol V to denote the collection of pairs (v, F ) for each v and each F ∈ F(v). To identify {μJ : J ∈ Sq } from V according to our scheme, we have to decide whether there is a J ∈ Sq such that v = μJ for each v ∈ V. For this decision, we can use some information on v, H(v) and F (v), which can be collected from the computation of V with no additional cost. We formulate this problem as a minimal cover problem (MCP). Then we transform MCP into a knapsack prob lem. To simplify notations, we define CJ := j∈J Cj for any index set J. MCP: An index set N is given. {Ci }i∈N with Ci > 0 and a subset F ⊂ N (F = ∅) are given. For a given positive number b, is there a subset J ⊂ N \ F such that CJ∪F ≥ b
and
CJ∪F \{k} < b for all k ∈ J ∪ F ?
Then we have the following lemma:
Optimization Problems in the Simulation
781
Lemma 3. The answer to MCP is YES if and only if there exists a J ⊂ N \ F such that i) CJ∪F ≥ b, ii) CJ∪F \{k} < b for all k ∈ J, and iii) CJ∪F − min Ci < b. i∈F
Proof. If we notice the relation CJ∪F \{k} < b for all k ∈ F ⇔ CJ∪F − mini∈F Ci < b, then the proof is complete.
Set b := b − CF . Using Lemma 3, we can rewrite the MCP as MCP : {Ci }i∈N with Ci > 0 and a subset F ⊂ N are given. For a given positive number b, is there a subset J ⊂ N \ F such that i) CJ ≥ b , ii) CJ\{k} < b for all k ∈ J, and iii) CJ < b + min Ci ? i∈F
Consider the following 0-1 knapsack problem (KP): ⎧ ⎫ ⎨ ⎬ f ∗ = min Cj xj : Cj xj ≥ b , xj ∈ {0, 1} for all j ∈ N \ F . ⎩ ⎭ j∈N \F
j∈N \F
Any set G ⊂ N \ F corresponding to an optimal solution of (KP) satisfies condition i) of MCP from the feasibility. If CG\{k} ≥ b for some k ∈ G, then G\{k} is another feasible set with strictly less optimal value and this contradicts to the optimality of G. Hence G satisfies ii) of MCP . Therefore, we conclude that f ∗ < b +mini∈F Ci if and only if the answer to MCP is YES. Now set N = H(v) and take an F from F (v). Then by setting b = qC, MCP solves whether there is a J such that F ⊂ J ⊂ H(v), J ∈ Sq , and v = μJ . Hence by checking this question for all F ∈ F(v), we can decide whether v ∈ {μJ : J ∈ Sq }. Note that, with mini∈F Ci = 1, MCP is equivalent to knapsack feasibility problem and hence MCP is NP-complete. By transforming MCP into the maximization form using the minimal index set notations results in the following SSP: ⎧ ⎫ ⎨ ⎬ f ∗ = max Cj xj : Cj xj ≤ CN − qC, xj ∈ {0, 1} for all j ∈ N \F . ⎩ ⎭ j∈N \F
j∈N \F
The procedure identifying {μJ : J ∈ Sq } is described as the following: 1: Identify V by solving the norm minimization problems (2) associated with all possible combinations of type indices, J ⊂ {1, . . . , t}, |J | ≤ d. 2: Given q, solve SSP associated with each N = H(v) and F ∈ F(v) for all v ∈ V. If f ∗ > CN − qC − mini∈F Ci , then include v among the shifting mean vectors.
782
W. Kang and K. Lee
We assume that all Cj ’s are positive integers. This is a necessary assumption for knapsack problems. SSP has a special structure and is called a subset sum problem which is NP-complete. However, knapsack problems arising in practice are solved very fast. (See, e.g., Chapter 4 of Kellerer, Pferschy, and Pisinger [7].) For numerical experiment, we measured the time spent to solve 106 subset sum problems using a code subsum.c available at http://www.diku.dk/∼pisinger. Each instance consists of 100 randomly generated weights (i.e. |N \ F | = 100 in SSP) and the weights have their ranges [1, 104 ] (i.e., 1 ≤ Cj ≤ 104 ). 21.88 seconds were spent to solve all these 106 problems. (All experiments in this paper were executed using a notebook with a CPU of 1.7GHz Intel Pentium M and a 512MB RAM.) This number of problems, 106 , is roughly the upper bound of the cardinality of V for a factor model having 100 types (= |N |) and three factors. In solving a subset sum problem, the range of weights are crucial for the running time of algorithm. The above input ranges imply that the potential loss amount of each obligor will take its value among the multiples up to 104 of some base amount. Table 1 shows the average cardinalities of {μJ : J ∈ Sq } and V for 30 randomly generated 20- or 25-type instances with factor dimension 4 or 5. Note that the values of the upper bound on |V| in Lemma 2 are 6195, 21699, 15275, and 68405, respectively. However we just need to keep a smaller size (at most 2000 on average) of V to get {μJ : J ∈ Sq }. The computing time of V takes about 28, 100, 65, and 300 seconds for each instance, respectively if we use the MATLAB function quadprog for the norm minimization (2). (By a specialized algorithm in Section 3.3, the time can be reduced to 0.3, 1, 1, and 6 seconds for each instance, respectively.) And the total times in solving 9 subset sum problems to find {μJ : J ∈ Sq }’s for q = 0.1, 0.2, . . . , 0.9 from V are at most 0.2, 0.5, 0.4, and 1.5 seconds, respectively. Furthermore, the cardinalities of {μJ : J ∈ Sq } are much smaller than the theoretical upper bound. These observations imply that we can implement the IS efficiently. Table 1. The average number of minimum norm points in Rd . nq denotes the average of |{μJ : J ∈ Sq }|. (This table is taken from [3]). Types 20 20 25 25
3.3
d 4 5 4 5
Bound 6195 21699 15275 68405
|V| 574.6 932.2 1224.9 2036.5
n0.1 16.9 25.0 33.5 39.7
n0.2 36.1 57.0 65.7 96.3
n0.3 48.5 78.8 90.5 138.4
n0.4 52.2 84.4 91.7 157.1
n0.5 44.5 69.0 74.6 137.7
n0.6 29.6 44.2 44.1 79.8
n0.7 14.3 19.5 16.0 28.2
n0.8 3.9 4.9 2.4 3.1
n0.9 0.2 0.4 0.2 0.0
Quadratic Optimizations
To find V, we need to solve (2). We can apply general quadratic programming (QP) algorithms to these problems. However, we can exploit the hierarchy of QP problems further: we characterize V by solving a QP for each J ⊂ {1, . . . , t}, |J | ≤ d. This strategy of the search allows us to do it by
Optimization Problems in the Simulation
783
solving ν J = argmin{z : a j z = dj for all j ∈ J } instead of the original QP contrained by inequalities. This equality constrained problem can be solved by simple Gaussian eliminations. Because of the change of constraints, we have ν J ≥ μJ instead of equality. So we have to detect the case ν J > μJ . For this, we adopt the following procedure: Set L = ∅ for i = 1 to d for all J ⊂ {1, . . . , t} of |J | = i • find ν J • check the existence of J ∈ L so that J ⊂ J and ν J ≤ ν J • if no such J then add J to L. end end Note that there always exists a J ⊂ J such that ν J = μJ if ν J > μJ . Furthermore, ν J = μJ . Since the enumeration is done in increasing order of |J |, ν J exists in the list L (because |J | < |J |). Hence the J is discarded before we solve (2) for J . By this implementation, we can reduce substantial amount of time spent to identify V.
4
Concluding Remarks
We considered an optimization problem arising in the simulation of portfolio credit risk. Our re-formulation has the same worst case computational complexity as the original problem, but it allows tractability in practice. The shifting of sampling distribution based on these points enhances the efficiency of simulation quite impressively.
Acknowledgments The first author thanks Paul Glasserman and the late Perwez Shahabuddin, the coauthors of [3], on which this proceeding is based.
References 1. A. Avranitis and J. Gregory. Credit: The Complete Guide to Pricing, Hedging and Risk Management. Risk Books, London, 2001. 2. P. Glasserman. Monte Carlo Methods in Financial Engineering. Springer-Verlag, New York, 2004. 3. P. Glasserman, W. Kang, and P. Shahabuddin. Fast simulation of multifactor portfolio credit risk. Technical report, Graduate School of Business and IEOR Department, Columbia University, February 2005. 4. P. Glasserman and J. Li. Importance sampling for portfolio credit risk. Management Science, 2005.
784
W. Kang and K. Lee
5. G. Gupton, C. Finger, and M. Bhatia. CreditMetrics Technical Document. J.P. Morgan & Co., New York, NY, 1997. 6. M. Kalkbrener, H. Lotter, and L. Overbeck. Sensible and efficient capital allocation for credit portfolios. RISK, January:S19–S24, 2004. 7. H. Kellerer, U. Pferschy, and D. Pisinger. Knapsack Problems. Springer-Verlag, Berlin · Heidelberg, Germany, 2004. 8. S. Merino and M. A. Nyfeler. Applying importance sampling for estimating coherent credit risk contributions. Quantitative Finance, 4:199–207, 2004. 9. W. Morokoff. An importance sampling method for portfolios of credit risky assets. In R. Ingalls, M. Rossetti, J. Smith, and B. Peters, editors, Proceedings of the 2004 Winter Simulation Conference, pages 1668–1676, 2004. 10. J. Nocedal and S. J. Wright. Numerical Optimization. Springer-Verlag, New York, 1999.
Two-Server Network Disconnection Problem Byung-Cheon Choi and Sung-Pil Hong Department of Industrial Engineering, College of Engineering, Seoul National University, San 56-1, Shillim-9-Dong, Seoul, 151-742, Korea [email protected]
Abstract. Consider a set of users and servers connected by a network. Each server provides a unique service which is of certain benefit to each user. Now comes an attacker, who wishes to destroy a set of edges of the network in the fashion that maximizes his net gain, namely, the total disconnected benefit of users minus the total edge-destruction cost. We first discuss that the problem is polynomially solvable in the single-server case. In the multiple-server case, we will show, the problem is, however, N P -hard. In particular, when there are only two servers, the network disconnection problem becomes intractable. Then a 32 -approximation algorithm is developed for the two-server case.
1
Introduction
Consider a network of servers and their users in which each server provides a unique service to the users. (Each server also can be the user of a service other than her own.) Each user takes a benefit through the connection provided by the network to each server. Now comes an attacker who wishes to destroy a set of edges in the manner that optimizes his own objective, although defined to various circumstances, that explicitly accounts for the disconnected benefits of users resulted from destruction. Such a model, to the author’s best knowledge, was proposed first by Martel et al. (8). They considered the single-server problem in which the total disconnected benefit is maximized under an edge-destruction budget constraint. The problem, as they showed, is N P -hard. They also proposed an exact method enumerating maximum disconnected node sets among minimum cost cuts separating node pairs using cut submodularity. In this paper, we consider the problem of maximizing net gain, namely, the total disconnected benefit of users minus the edge-destroying cost. The problem is polynomially solvable when there is a single server. It is, however, N P -hard in general. We will provide the proofs. In particular, if there are two servers, the problem becomes intractable. Also, we will present a 32 -approximation algorithm for the two-server case.
The research was partially supported by KOSEF research fund R01-2005-00010271-0. Corresponding author.
M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 785–792, 2006. c Springer-Verlag Berlin Heidelberg 2006
786
B.-C. Choi and S.-P. Hong
In Section 2, we review the previous models on separating nodes of a graph. Section 3 provides a mathematical model of the problem. It also discusses the polynomiality of the single-server case and N P -hardness for the k-server case with k ≥ 2. A 32 -approximation algorithm is presented in Section 4. Finally, we summarize the results and point out an open problem in Section 5.
2
Node Separation Problems: A Literature Review
In this section we review combinatorial optimization models on the separation of nodes of an undirected graph. There are various such models. (See, e.g. (1; 5).) Among them, the following three models probably have been studied most intensively. Problem 1. k-cut problem: Given an undirected G = (N, E) with nonnegative edge weights, find a minimum weight set of edges E such that the removal of E from E separates graph into exactly k nonempty components. This problem is N P -hard for general k ≥ 3 and approximable within factor 2− k2 within the optimum. Goldscmidt and Hochbaum (7) showed that the problem is polynomially solvable for fixed k. Problem 2. k-terminal cut problem, Multiway cut problem: Given an undirected G = (N, E) with nonnegative edge weights and a set of k specified nodes, or terminals, find a minimum weight set of edges E such that the removal of E from E disconnects each terminal from all the others. k-terminal cut problem is Max-SN P -hard even for k = 3 (4). Therefore, there is some > 0 such that (1 + )-approximation is N P -hard. Naor and Zosin(9) considered the directed version of k multi-terminal cut problem, and presented two 2-approximation algorithms. The current best approximation guarantee is 3 1 alinescu et al. (3). 2 − k by C˘ k-cut problem k-terminal cut problem multicut problem k=2 Pa Pb P (12) k≥3 P (2; 7) Max-SN P -hardcd (4) Max-SN P -harde arbitrary N P -hardf (7) Max-SN P -hardgh (4; 3) Max-SN P -hardi a b c d e f g h i
P means “polynomially solvable” M ax − SN P -hard if there is a constraint on the size of the separated component polynomially solvable if the graph is planar N P -hard even if all edge weights are equal to 1 N P -hard by the reduction from 3-terminal cut problem There is a (2 − k2 )-approximation algorithm There is a ( 32 − k1 )-approximation algorithm N P -hard even if the graph is planar and all edge weights are equal to 1 approximable within O(log k).
Two-Server Network Disconnection Problem
787
Problem 3. Multicut problem: Given an undirected G = (N, E) with nonnegative edge weights and k pairs of nodes, find a minimum weight set of edges E such that the removal of E from E separates all pairs of nodes. This problem is a generalization of the k-terminal cut problem. Hence, it also becomes Max-SN P -hard when k = 3. Currently, this problem is approximable within the factor of O(log k) (6). These works can be summarized as above text table. In the N P -hardness proof of the k-server network disconnection problem, we reduce the (k+1)-terminal problem to the k-server network disconnection problem.
3 3.1
The k-Server Network Disconnection Problem Problem Formulation
Given an undirected graph G = (N, E) with N = {1, 2, . . . , n}, the server set S = {s1 , · · · , sk } ⊆ N , the costs cij ≥ 0, (i, j) ∈ E, the nonnegative vectors di = (d1i , · · · , dki )T , i ∈ N , find a set F ⊆ E that maximizes the total disconnected benefits of nodes minus the edge-destruction cost, (i,j)∈F cij . Let Nl be the set of nodes remaining connected from server l after the destruction of F . Then, it is easy to see that for any i = j, the two sets, Ni and Nj are either identical or mutually exclusive: Ni = Nj or Ni ∩ Nj = ∅. Hence, if we denote by N0 the nodes disconnected from all the servers, then the set N = {N0 , N1 , . . . , Nk } is a partition of N . If N has p distinct sets, we will call N a p-partition. Our problem can be restated as follows: Problem 1. k-server network disconnection problem: Find a partition N = {N0 , N1 , . . . , Nk } with sj ∈ Nj , j = 1, 2, . . . , n that maximizes j z(N ) = di − cij . (1) l:Nl ∈N i∈Nl j ∈N / l
3.2
(i,j)∈E:i∈Np , j∈Nq such that Np =Nq
Polynomiality of the Single-Server Case
We need to find N1 ⊆ N with s1 ∈ N1 so that N = {N \ N1 , N1 } maximizes (1). Define a binary variable x as follows: 1, if j ∈ N1 , xj = 0, otherwise. For notational convenience we assume s1 = 1, cij = cji for all i, j ∈ N and cij = 0 for (i, j) ∈ / E. Then we can formulate the single-server problem as a 0-1 quadratic program as follows. max z(x) = nj=1 d1j (1 − xj ) − 12 ni=1 nj=1 cij (xi − xj )2 (2) sub. to x1 = 1, xj ∈ {0, 1}, j ∈ N \ {1}. x2j ,
Using xj = it is easy to rewrite the objective of (2) as a quadratic function n of x ˆ = (x2 , x3 , . . . , xn ): z(ˆ x) = x ˆT Qˆ x + j=2 (d1j − c1j ), where Q = (qij ) i=2,...,n j=2,...,n is given as follows:
788
B.-C. Choi and S.-P. Hong
qij =
−d1i + ci1 − cij ,
n
k=2 cik ,
if i = j, otherwise.
(3)
Lemma 1. An unconstrained 0-1 quadratic maximization with nonnegative offdiagonal elements is polynomially solvable.
Proof. See Picard and Ratliff (10).
Theorem 1. The single-server network disconnection problem is polynomially solvable. Proof. Since cij ≥ 0, (i, j) ∈ E, the polynomiality of the quadratic program formulation (2), follows from (3) and Lemma 1. 3.3
N P -Hardness of the k-Server Case
Unlike the single-server case, our problem is N P -hard in general. When k = 2, in particular, it becomes N P -hard. To see this, consider the decision version of the 3-terminal cut problem, Problem 2. Problem 2. Decision version of 3-terminal cut problem(3DMC) Given a terminal set T = {t1 , t2 , t3 }, a weight wij ≥ 0, (i, j) ∈ E and a constant W , is there a partition N = (N1 , N2 , N3 ) such that tj ∈ Nj , j = 1, 2, 3, and w(N ) = wij ≤ W ? (i,j)∈E|i∈Np , j∈Nq such that Np =Nq
Theorem 2. The 2-server network disconnection problem is N P -hard. Proof. Given any instance of 3DMC, we can construct an instance of the 2server problem as follows : On the same graph G = (N, E), designate the first two terminals of 3DMC as the two server nodes, s1 = t1 and s2 = t2 while t3 is a nonserver node in the 2-server instance. Now, define the benefit vector for the node of the 2-server instance: For a constant M > W , set ds1 = (d1s1 , d2s1 ) = (0, M ), ds2 = (M, 0), and dt3 = (M, M ). The other nodes are assigned to the benefit vector, (0, 0). Also set Z = 4M −W . Finally, the edge cost is defined as cij = wij , (i, j) ∈ E. We claim that the answer to 3DMC is ‘yes’ if and only if the 2-server instance has a partition whose value is no less than Z. ˆ1, N ˆ2 , N ˆ3 } of 3DMC satisfying w(Nˆ ) ≤ Suppose there is a partition Nˆ = {N ˆ3 , N ˆ1 , N ˆ2 } is a solution of the 2-server W . Then, it is easy to see that N = {N ˆj , j = 1, 2, and instance: sj ∈ N ˆ) = z(N
3 l=1 i∈N ˆl j ∈ ˆl /N
dji −
wij ≥ 4M − W = Z.
ˆp , j∈N ˆq (i,j)∈E:i∈N ˆ p =N ˆq such that N
˜ = {N ˜0, N ˜1 , N ˜2 } is a solution of the 2-server On the other hand, assume N ˜ ˜ instance such that z(N ) ≥ Z. To show that N is a 3-terminal cut, it suffices to
Two-Server Network Disconnection Problem
789
˜j , j = 0, 1, 2 are all pair-wise distinct. But, if any two of the sets are see that N identical, then z(˜ x) ≤ 3M , a contradiction to the assumption. Furthermore, this also implies ˜ ) = 4M − w(˜ z(N x) ≥ Z = 4M − W. Since 4M − w(N˜ ) ≥ Z = 4M − W , we get w(N˜ ) ≤ W .
It is not hard to see that the proof can be extended to show that the (k + 1)terminal cut problem is a special case of the k-server network disconnection problem.
4
A 32 -Approximation of the 2-Server Case
Let N = {N0 , N1 , N2 } be a solution of the 2-server case such that s1 ∈ N1 and s2 ∈ N2 . Define E0 = {(i, j) ∈ E | i ∈ N1 and j ∈ N2 } E1 = {(i, j) ∈ E | i ∈ N1 and j ∈ N0 }, and E2 = {(i, j) ∈ E | i ∈ N2 and j ∈ N0 }. Recall that it may be either N1 = N2 or N0 = ∅ and thus an optimal solution
Fig. 1. 2-server solution
may be 1-, 2-, or 3-partition. If it is trivially a 1-partition, then the optimal objective value is 0. The idea is to approximate the optimum with an optimal 2-partition which is, as we will show, polynomially computable. Algorithm H : Find an optimal 2-partition as an approximation of the optimum. Lemma 1. An optimal 2-partition can be computed in polynomial time. Proof. There are two cases of an optimal 2-partition: N1 = N2 or N1 = N2 .
790
B.-C. Choi and S.-P. Hong
Case 1: N1 and N2 are distinct Then we can formulate the 2-server problem as a quadratic program similarly to the single-server case. Define a binary variable x as follows: 1, if j ∈ N1 , xj = 0, if j ∈ N2 . As before, we adopt the notation, s1 = 1, s2 = 2, cij = cji for all i, j ∈ N and cij = 0, (i, j) ∈ / E. Then, it is easy to see that the 2-server case is equivalent to n n n max z(x) = j=1 d2j xj + d1j (1 − xj ) − 12 i=1 j=1 cij (xi − xj )2 (4) sub. to x1 = 1, x2 = 0, xj ∈ {0, 1}, j ∈ N \ {1, 2}. Then can rewrite (4) in terms of x ˆ = (x3 , x4 , . . . , xn )T : z(ˆ x) = xˆT Qˆ x+ n we 1 2 1 j=1 dj + d1 + d2 − 2c12 , where Q = (qij ) i=3,...,n is given as follows: j=3,...,n
qij =
d1i − d2i + ci1 − ci2 − cij ,
n
k=1 cik ,
if i = j, otherwise.
(5)
Since cij , (i, j) ∈ E are nonnegative, z(ˆ x) can be solved in polynomial time from Lemma 1. Case 2: N1 and N2 are identical In this case, the problem is essentially a single-server problem. Merge s1 and s2 into a single node and replace benefit vector (d1i , d2i ) by a single benefit d1i + d2i . Then, it is easy to see that the 2-server case is equivalent to the single-server case defined on the modified network, implying that the 2-server case can be solved in polynomial time. Case 1 and 2 complete proof. Theorem 1. An optimal 2-partition is factor
3 2
within the optimum.
Proof. Let N ∗ = {N0∗ , N1∗ , N2∗ } and N H = {N0H , N1H , N2H } be an optimal solution and the solution of Algorithm H, respectively. z(N ∗ ) =
2 2 ( dji − dli ) − l=0 i∈Nl∗ j=1
=
1 (( 2
d1i −
d2i + i∈N0∗ ∪N1∗
i∈N2∗
i∈N0∗
=
(d1i + d2i ) −
d2i −
d1i +
cij ) + (
cij
(i,j)∈E0∗ ∪E1∗ ∪E2∗
∗ ∪E ∗ (i,j)∈E0 2
+(
i∈N0∗ ∪N2∗
i∈N1∗
cij ) ∗ ∪E ∗ (i,j)∈E0 1
cij ))
(i,j)∈E1∗ ∪E2∗
1 (z(N0∗ , N1∗ ∪ N2∗ ) + z(N1∗ , N0∗ ∪ N2∗ ) + z(N2∗ , N0∗ ∪ N1∗ )). 2
Two-Server Network Disconnection Problem
791
Since z(N H ) ≥ max{z(N0∗ ; N1∗ ∪ N2∗ ), z(N1∗ ; N0∗ ∪ N2∗ ), z(N2∗ ; N0∗ ∪ N1∗ )}, z(N ∗ ) ≤
3 z(N H ). 2
This completes the proof.
Theorem 2. Algorithm H is a 32 -approximation algorithm for the 2-server case. Proof. From Lemma 1 and Theorem 1.
5
Summary and Further Research
We consider the k-server network disconnection problem. We show that the problem can be solved in polynomial time for the single-server case while the problem is NP-hard in general. The problem is NP-hard even for the two-server case. Also, we propose a 32 -approximation algorithm for two-server case. The approximability of the k-server network disconnection problem for general k ≥ 0 remains open. The quadratic program based approximation algorithm for the 2-server case does not seem to straightforwardly extend to the general case.
Acknowledgment The authors thank Prof. Young-Soo Myung for the fruitful discussion which leads to a tighter analysis of the algorithm.
References [1] G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, and M. Protasi. Complexity and Approximation, Springer-Verlag, Berlin, 1999. [2] M. Burlet, and O. Goldschmidt. A new and improved algorithm for the 3-cut problem. Operations Research Letters 21(1997) pp. 225-227. [3] G. C˘ alinescu, H. Karloff, and Y. Rabani. An improved approximation algorithm for muliway cut, Proc. 30th ACM Symposium on Theory of Computing, ACM, (1998) pp. 48-52. [4] E. Dahlhaus, D.S. Johnson, C.H. Papadimitriou, P.D. Seymour, and M. Yannakakis. The complexity of multiterminal cuts, SIAM Journal on Computing 23(1994) pp. 864-894. [5] M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-completeness, W.H. Freeman, New York, 1979. [6] N. Garg, V.V. Vazirani, and M. Yanakakis. Approximating max-flow min(multi)cut theorems and their applications, SIAM Journal on Computing 25(1996) pp. 235-251. [7] O. Goldschmidt, and D.S. Hochbaum. A polynomial time algorithm for the k-cut problem for k fixed, Mathematics of Operations Research 19(1994) pp. 24-37.
792
B.-C. Choi and S.-P. Hong
[8] C. Martel, G. Nuckolls, and D. Sniegowski. Computing the disconnectivity of a graph, manuscript, UC Davis (2001). [9] J. Maor, and L. Zosin. A 2-approximation algorithm for the directed multiway cut problem, SIAM Journal on Computing 31(2001) pp. 477-482. [10] J.C. Picard, and H.D. Ratliff. Minimum cuts and related problems, Networks 5(1974) pp. 357-370. [11] H. Saran, and V.V. Vazirani. Finding k-cuts within twice the optimal, Proc. 32nd Annual IEEE Symposium on Foundations of Computer Science, IEEE Computer Society, (1991) pp. 743-751. [12] M. Yannakakis, P.C. Kanellakis, S.C. Cosmadakis, and C.H. Papadmitriou. Cutting and partitioning a graph after a fixed pattern, in Automata, Language and Programming, Lecture Notes and Computer Science 154(1983) pp. 712-722.
One-Sided Monge TSP Is NP-Hard Vladimir Deineko1 and Alexander Tiskin2 1
2
Warwick Business School, The University of Warwick, Coventry CV47AL, UK [email protected] Dept. of Computer Science, The University of Warwick, Coventry CV47AL, UK [email protected]
Abstract. The Travelling Salesman Problem (TSP) is a classical NPhard optimisation problem. There exist, however, special cases of the TSP that can be solved in polynomial time. Many of the well-known TSP special cases have been characterized by imposing special four-point conditions on the underlying distance matrix. Probably the most famous of these special cases is the TSP on a Monge matrix, which is known to be polynomially solvable (as are some other generally NP-hard problems restricted to this class of matrices). By relaxing the four-point conditions corresponding to Monge matrices in different ways, one can define other interesting special cases of the TSP, some of which turn out to be polynomially solvable, and some NP-hard. However, the complexity status of one such relaxation, which we call one-sided Monge TSP (also known as the TSP on a relaxed Supnick matrix ), has remained unresolved. In this note, we show that this version of the TSP problem is NP-hard. This completes the full classification of all possible four-point conditions for symmetric TSP.
1
Introduction
The travelling salesman problem (TSP) is a well-known problem of combinatorial optimisation. In the symmetric TSP, given a symmetric n × n distance matrix C = (cij ), one looks for a cyclic n permutation τ of the set {1, 2, . . . , n} that minimises the function c(τ ) = i=1 ci,τ (i) . The value c(τ ) is called the length of the permutation τ . We will in the following refer to the items in τ as points. The TSP is an NP-hard problem [10]. There exist, however, special cases of the TSP that can be solved in polynomial time. For a survey of efficiently solvable cases of the TSP, see [3, 11, 14]. Many of the well-known TSP special cases result from imposing special conditions on the underlying distance matrix. Probably the most famous of these special cases is the TSP on a Monge matrix. For a number of well-known NP-hard problems, including TSP, restriction to Monge matrices reduces the complexity to polynomial (see survey [5]). We give below two equivalent definitions of a Monge matrix. In what follows, we will always assume that the matrices under considerations are symmetric. Definition 1. An n × n matrix C = (cij ) is a Monge matrix, if it satisfies the Monge conditions: cij + ci j ≤ cij + ci j
1 ≤ i < i ≤ n
1 ≤ j < j ≤ n
M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 793–801, 2006. c Springer-Verlag Berlin Heidelberg 2006
(1)
794
V. Deineko and A. Tiskin
Definition 2. An n × n matrix C = (cij ) is a Monge matrix, if cij + ci+1,j+1 ≤ ci,j+1 + ci+1,j
1≤i≤n−1
1≤j ≤n−1
It can be easily seen that the above two definitions define the same set of matrices. The difference arises when we begin to generalize the problem by relaxing the conditions imposed on the matrix. For example, the diagonal entries are not involved in the calculation of the TSP objective function, so one can define cii (i = 1, . . . , n) arbitrarily without affecting the solution of the TSP. If we relax Definition 1 by excluding the inequalities containing the diagonal entries, then the structural properties of the matrix remain essentially unchanged. In fact, given a matrix (cij ) with indeterminate diagonal elements, which satisfies inequalities (1) for i = j, i = j , i = j, i = j , it is always possible to define diagonal elements cii so that the resulting matrix is a Monge matrix (see Proposition 2.13 in [3]). This relaxation of symmetric Monge matrices is known as Supnick matrices. Alternatively, Supnick matrices are defined as matrices satisfying conditions cij + cj+1,l ≤ ci,j+1 + cjl ≤ cil + cj,j+1
1≤i<j <j+1
Supnick [21] has shown that an optimal TSP tour on such matrices is given by σSmin = 1, 3, 5, 7, 9, . . . , 8, 6, 4, 2, 1. The well-known classes of Demidenko [9] and Van der Veen [23] matrices, as well as a class of matrices investigated in [4], can also be viewed as relaxations of Definition 1. A symmetric n × n matrix C = (cij ) is a Demidenko matrix, if cij + ci j ≤ cij + ci j
1 ≤ j < i < i < j ≤ n
and a Van der Veen matrix, if cij + ci j ≤ cij + ci j
1 ≤ i < j < i < j ≤ n
The TSP on these classes of matrices is polynomially solvable: an optimal tour can be found among the so-called pyramidal tours in O(n2 ) time (see e.g. [3]). Now, instead of Definition 1, we consider Definition 2, and relax it by excluding inequalities involving the diagonal elements. Definition 3. An n × n matrix C = (cij ) is a relaxed Supnick matrix, if cij + ci+1,j+1 ≤ ci,j+1 + ci+1,j
1≤i<j−1≤n−2
In contrast with the Supnick relaxation of Definition 1, the relaxation from Definition 2 to Definition 3 may have a very significant effect on the matrix structure. While the Supnick conditions constrain all pairs of matrix elements excluding the main diagonal, the relaxed Supnick conditions only constrain pairs of matrix elements that are on the same side of the main diagonal (i.e. are either both above, or, by symmetry, both below the main diagonal). In calling such matrices “relaxed Supnick”, we follow the terminology of [6]; otherwise, they could naturally be called one-sided Monge matrices.
One-Sided Monge TSP Is NP-Hard
795
Table 1. Classification of four-point conditions (adapted from [6]) A≤B A≤B
2
A≥B
A≤C 2
A≥C
B≤C
B≥C
O(n )
O(1)
O(n )
O(1)
O(1)
O(1)
[9]
[11, 24]
[9, 23]
[15]
[21]
[15]
NP-hard [8]
O(1)
NP-hard [8, 22]
O(n)
O(n)
[6]
[15, 19, 20]
[21]
O(n2 )
O(1)
O(1)
O(1)
[23]
[11, 24]
[6]
[6]
NP-hard [22]
O(1)
O(1)
[6]
[6]
NP-hard (new)
O(n2 )
A≥B
A≤C
A≥C
B≤C
[16, 6]
O(n4 )
B≥C
[6]
Given a relaxed Supnick matrix with indeterminate diagonal elements, it is not always possible to define the diagonal elements so that the resulting matrix is a Monge matrix, as the following example shows: ⎛ ⎞ × 1 0 1 5 ⎜1 × 0 0 3⎟ ⎜ ⎟ ⎜0 0 × 1 1⎟ ⎜ ⎟ ⎝1 0 1 × 0⎠ 5 3 1 0 × Computational complexity of the TSP on the described matrix classes, and other classes of similar type, can be studied systematically by considering fourpoint conditions. Let i, j, k, l be four points with 1 ≤ i < j < k < l ≤ n. A symmetric distance matrix for these points contains six entries above the main diagonal, which correspond to the six edges connecting these points. It is possible to form three pairs of non-incident edges: {(i, j), (k, l)}, {(i, k), (j, l)}, {(i, l), (j, k)}. We denote the combined lengths of these pairs by A, B, C, respectively: A = cij + ckl
B = cik + cjl
C = cil + cjk
A four-point condition is an inequality relation among the values A, B, C, which has to be satisfied for all possible choices of indices i, j, k, l with 1 ≤ i < j < k < l ≤ n. Using this notation, Supnick matrices are defined as matrices satisfying conditions A ≤ B, B ≤ C, Demidenko matrices as matrices satisfying condition A ≤ B, Van der Veen matrices as matrices satisfying conditions A ≤ C, and relaxed Supnick matrices as matrices satisfying condition B ≤ C. Nearly all possible four-point conditions and their pairwise combinations have been
796
V. Deineko and A. Tiskin
classified in [6] (see Table 1) according to the polynomial solvability or NPhardness of the arising TSP. The only gap that has remained in this classification is the TSP on relaxed Supnick matrices. In the following section we show that this problem is NP-hard, thus making a final point in the classification of fourpoint conditions for symmetric TSP.
2
The TSP on Relaxed Supnick Matrices
As before, we assume that all the considered matrices are symmetric. Theorem 1. The TSP on a relaxed Supnick matrix is NP-hard. Proof. The proof is by reduction from the Hamiltonian cycle problem in grid graphs [13]. We follow the definitions from [13]. Let G∞ be the infinite graph, whose vertex set consists of all the integer points in the plane, and in which two vertices are connected, if and only if the Euclidean distance between them is equal to one. A grid graph is a finite node-induced subgraph of G∞ . It is shown in [13] that the Hamiltonian cycle problem in a grid graph is NP-hard. Given a grid graph on n nodes, we will construct an n × n relaxed Supnick matrix with non-negative entries, such that there exists a Hamiltonian cycle in the graph, if and only if the optimal TSP tour on the corresponding relaxed Supnick matrix has zero length. Consider an arbitrary grid graph G on n nodes. We may assume that G is connected (otherwise, it is guaranteed not to contain a Hamiltonian cycle). First, we embed G in a square grid of size m × m, where m is sufficiently large. Due to the connectivity of G, we may assume m ≤ n. We then extend the square grid to a parallelogram grid, and number the nodes as shown in Figure 1. The upperleft point in the square grid will be numbered m, the two neighbouring points 2m − 1 and 2m, etc. Let N = m2 + m(m − 1) be the number of points in the parallelogram grid. We then construct an N × N relaxed Supnick matrix C (N ) as follows. For two neighbouring nodes i, j, i < j, in the parallelogram grid, we (N ) define cij = 0, if both i and j belong to G. Notice that the numbering of points in the parallelogram grid is chosen so that all these entries are placed on two m m−1 i
i+m
i+m−1 2 1
m+2
m+1
Fig. 1. Embedding a square grid in a parallelogram
One-Sided Monge TSP Is NP-Hard
797
adjacent diagonals: j − i ∈ {m − 1, m}. For all the remaining entries in these two (N ) (N ) diagonals, we define cij = 1. Further, we define c1j = 1 for j = 1, . . . , m − 1, (N )
and ciN = 1 for i = N − m + 2, . . . , N . Notice that so far there exist no four defined entries in the upper triangular part of C (N ) that would form an inequality from Definition 2, therefore none of the one-sided Monge conditions are violated. We keep filling in the upper triangular part of the matrix, respecting the Monge conditions. We define (N )
cij
(N ) cij
(N )
(N ) (N ) = max ci−1,j + ci,j+1 − ci−1,j+1 , 1 (N )
(N ) (N ) = max ci,j−1 + ci+1,j − ci+1,j−1 , 1
j − i = 1, . . . , m − 2 j − i = m + 1, . . . , N − 1
for each element, as soon as the right-hand side of this element’s definition has (N ) itself been defined. We always have cij ≥ 1 unless i, j are neighbouring nodes in G, and the one-sided Monge condition is always preserved. The lower triangular part of C (N ) is defined by symmetry. The construction of the relaxed Supnick matrix C (N ) is now completed. It can be easily checked that, given a relaxed Supnick matrix C, the symmetric matrix obtained by deleting a row and a corresponding column from C is still relaxed Supnick. By induction, the same is true after deleting an arbitrary subset of rows and corresponding columns. By deleting all rows and columns from C (N ) , except those indexed by points in the original grid graph G, we obtain an n × n relaxed Supnick matrix C (n) . Clearly, there exists a Hamiltonian cycle in graph G, if and only if the optimal TSP tour on the matrix C (n) has zero length. In order to prove that the above problem reduction is polynomial, it only remains to show that the growth of individual matrix elements is appropriately bounded. Observe that each new element created in C (N ) cannot exceed the largest existing element by more than a factor of 2. Consequently, the largest 2 2 2 4 4 element in C (N ) has value1 at most 2N ≤ 2(2m ) = 24m ≤ 24n , and can therefore be represented by a number of bits that is polynomial in n. Hence, even if our computational model does not support unit-cost arithmetic on arbitrarily long integers, arithmetic operations on matrix elements can be emulated bitwise in polynomial time. The reduction is completed.
3
Relaxed Supnick Matrices and Exponential Neighbourhoods
The research into polynomially solvable cases for intractable problems is partly motivated by the hope to identify new approaches to constructing generalpurpose heuristics. One of the by-products of this research are exponential neighbourhoods (see surveys [1, 7]), which are being intensively studied in relation to 1
Much tighter estimates are possible. However, this crude bound is sufficient for our proof.
798
V. Deineko and A. Tiskin
local search algorithms. Among new families of exponential neighbourhoods presented in [6], there is a neighbourhood of strongly balanced permutations, which are related to relaxed Supnick matrices. To justify the introduction of this neighbourhood, the authors of [6] considered a special subclass of relaxed Supnick matrices. Definition 4. A relaxed Supnick matrix C = (cij ) is strong, if ctp − ctz − ckp ≤ csy − cky − csz
t
It can be shown that the inequalities in Definition 4 are equivalent to the system of inequalities cij − ci,j+1 − ci+1,j ≤ cj−1,N − ci+1,N − cj−1,j
i<j
and can therefore be checked in time O(n2 ). As shown in [6], an optimal tour for the TSP on a strong relaxed Supnick matrix can be found in the set of so-called strongly balanced tours. We first give some preliminary definitions. An index i ∈ {1, . . . , n} is a peak of a permutation τ , if i > max{τ −1 (i), τ (i)}, and a valley, if i < min{τ −1 (i), τ (i)}. An index which is neither peak nor valley is called intermediate. Informally, in a strongly balanced tour, all intermediate nodes are “evenly spread” on the slopes between peaks and valleys. We now give a formal definition. We will consider partially constructed tours on the sets of indices {1, 2, 3, . . . , m − 1, m} with m = 1, 2, . . . , n. For a fixed m, a partially constructed tour will consist of a set of finite index sequences; indices from each sequence are placed in the tour in consecutive positions. We refer to each of these sequences i1 , . . . , j1 as fragment [i1 , j1 ], stressing that i1 is the initial, and j1 is the final element in the corresponding sequence. Notice that it is not necessary for a fragment [i1 , j1 ] with i1 < j1 to contain, for example, i1 + 1. For a one-element fragment we use the notation [i, i]. For example, if we start with a tour where 1 and 2 are two valleys, we can represent this initial tour by two fragments [1, 1] and [2, 2]. Mutual placement of fragments is not fixed, i.e. they can be permuted. The fragments can also be reversed, i.e. fragment [i, j] can be replaced by fragment [j, i]. Definition 5 ([6]). A tour is strongly balanced, if it can be constructed as follows. Start with an initial tour [1, 1] and repeat the folowing step for m = 2, . . . , n − 1: Given a partial tour on the set of indices {1, . . . , m − 1}, the tour is represented by fragments [i1 , j2 ], [i2 , j2 ], . . ., [ip , jp ]. Let imin1 = min{i1 , j1 , i2 , j2 , . . . , ip , jp } imin2 = min{i1 , j1 , i2 , j2 , . . . , ip , jp } \ {imin1 } imin3 = min{i1 , j1 , i2 , j2 , . . . , ip , jp } \ {imin1 , imin2 }
One-Sided Monge TSP Is NP-Hard
799
Add index m to the partial tour by choosing one of the options below: – m is placed as a new valley; this creates a new fragment [m, m]; – m is placed as an intermediate index adjacent to imin1 ; fragment [imin1 , s] in the partially constructed tour is replaced by the new fragment [m, s]; – m is placed as a new peak merging two fragments; m is adjacent to imin1 and to either imin2 or to imin3 . In the first case, the fragments [imin1 , j] and [imin2 , s] are merged into [j, s]; in the second case, the fragments [imin1 , imin2 ] and [imin3 , s] are merged into [imin2 , s]. The final node n can only be added to a partial tour consisting of one fragment. As an example, the reader can check that the tour 1, 4, 10, 6, 2, 7, 12, 9, 3, 8, 11, 5, 1 with the intermediate nodes 4, 5, 6, 7, 8, 9 evenly spread on the slopes, is a strongly balanced tour. Proposition 1 ([6]). An optimal tour for the TSP with a strong reduced Supnick matrix can be found in the set of strongly balanced tours. Despite the relatively simple structure of strongly balanced tours, the problem of finding an optimal tour in this set appears to be difficult. To avoid the difficulties, the authors of [6] have further restricted the special class of matrices, and identified a subset of strongly balanced tours, where an optimal tour can be found in polynomial time. Proposition 2 ([6]). Consider strongly balanced tours for which the maximum number of fragments in the construction of Definition 5 is bounded by a constant. An optimal tour in this special subset of strongly balanced tours can be found in polynomial time. We can now explain why the problem of finding an optimal strongly balanced tour is difficult for an unbounded number of fragments. Theorem 2. The TSP on a strong relaxed Supnick matrix is NP-hard. Proof. The proof is similar to the proof of Theorem 1. We define the initial (N ) values cij in two adjacent diagonals: j − i ∈ {m − 1, m} exactly as before. The only difference is in the way we define the remaining entries. To ensure that the matrix is a strong relaxed Supnick matrix, it is sufficient to define (N )
cij
(N )
(N )
(N )
= max{cj,i−1 + cj+1,i−1 − cj+1,i−1 , (N )
(N )
(N )
(N )
(N )
j <j+m
(N )
i < p < i+m
cj,i−1 + ci−1,i + cj+1,N − ci−1,N − cj+1,i−1 , 1} and (N )
(N )
(N )
(N )
(N )
(N )
cip = max{ci−1,p + ci,i+1 − ci−1,p+1 , (N )
(N )
ci−1,p + cp,p+1 + ciN − cpN − ci−1,p+1 , 1}
800
V. Deineko and A. Tiskin
The rest of the proof is identical to the proof of Theorem 1, with the only 2 4 difference that the largest element in C (N ) has now value at most 3N ≤ 34n . Corollary 1. The problem of finding an optimal strongly balanced tour is NPhard. The two statements above justify the introduction of a special subset of strongly balanced tours (Proposition 2), where the optimal permutation can be found in polynomial time.
4
Conclusion
We have shown that the TSP on a relaxed Supnick matrix, which can also be viewed as a one-sided Monge matrix, is NP-hard. This completes the classification of all possible four-point conditions for symmetric TSP [6]. Our results justify imposing further restrictions on relaxed Supnick matrices in order to identify new polynomially solvable cases of the TSP. We hope that the constructions in this note, which reduce the TSP on a grid graph to the TSP with a special matrix, in combination with the special polynomially solvable case described in Proposition 2 (see [6] for further details), may lead to identifying special types of grid graphs, where the optimal tour can be found in polynomial time. Some polynomially solvable cases of the TSP on grid graphs have already been identified in [2, 18].
References ¨ Orlin, J.B., Punnen, A.: A survey of very large-scale 1. Ahuja, R.K., Ergun, O., neighborhood search techniques. Discrete Applied Mathematics, 123 (2002) 75–102 2. Arkin, E.M., Bemder, M.A., Demaine, E., Fekete, S.P., Mitchell, J.S., Sthia, S.: Optimal covering tours with turn costs. In Proc. 13th ACM–SIAM Symposium on Discrete Algorithms, SODA01 (2001) 138–147 3. Burkard, R.E., Deineko, V.G., van Dal, R., van der Veen, J.A.A., Woeginger, G.J.: Well-solvable special cases of the TSP: A survey. SIAM Review, 40, 3 (1998) 496–546 4. Burkard, R.E., Deineko, V.G.: On the traveling salesman problem with a relaxed Monge matrix. Information Processing Letters, 67 (1998) 231–237 5. Burkard, R.E., Klinz, B.,Rudolf, R.: Perspectives of Monge Properties in Optimization. Discrete Applied Mathematics, 70 (1996) 95–161 6. Deineko, V.G., Klinz, B., Woeginger, G.J.: Four point conditions for symmetric TSP and exponential neighbourhoods. In Proc. 17th ACM–SIAM Symposium on Discrete Algorithms, SODA 06 (2006) 544–553 7. Deineko, V.G., Woeginger, G.J.: A study of exponential neighborhoods for the travelling salesman problem and for the quadratic assignment problem. Mathematical Programming 87 (2000) 519–542 8. Deineko, V.G., Woeginger, G.J.: The maximum travelling salesman problem on symmetric Demidenko matrices. Discrete Applied Mathematics, 99 (2000) 413–425
One-Sided Monge TSP Is NP-Hard
801
9. Demidenko, V.M.: A special case of traveling salesman problems. Izv. Akad. Nauk. BSSR, Ser. Fiz.-mat. Nauk, 5 (1976) 28–32 (in Russian) 10. Garey, M.R., Johnson, D.S.: Computers and intractability. Freeman and Co., San Francisco (1979) 11. Gilmore, P.C., Lawler, E.L., Shmoys, D.B.: Well-solved special cases. Chapter 7 in [17] (1985) 207–249 12. Gutin, G., Punnen, A.P. (eds.): The travelling salesman problem and its variations. Kluwer Academic Publishers (2002) 13. Itai, A., Papadimitiou, C.H., Szwarcfiter, J.L.: Hamiltonian paths in grid graphs. SIAM Journal on Computing, 11, 4 (1982) 676–686 14. Kabadi, S.N.: Polynomially solvable cases of the TSP. Chapter 11 in [12] (2002) 489–583 15. Kalmanson, K.: Edgeconvex circuits and the traveling salesman problem. Canadian Journal of Mathematics, 27 (1975) 1000–1010 16. Lawler, E.L.: A solvable case of the traveling salesman problem. Mathematical Programming, 1 (1971) 267–269 17. Lawler, E.L., Lenstra, J.K., Rinnooy Kan, A.H.G., Shmoys, D.B.: The Traveling Salesman Problem. Wiley, Chichester (1985) 18. Lenhart, W., Umans, U.: Hamiltonian cycles in solid grid graphs. In Proceedings of 38 Annual Symposium on Foundations of Computer Science, FOCS ’97 (1997) 496–596 19. Quintas, L.V., Supnick, F.: Extreme Hamiltonian Circuits: Resolution of the convex-odd case. Proc. Amer. Math. Soc., 15 (1964) 454–459 20. Quintas, L.V., Supnick, F.: Extreme Hamiltonian Circuits: Resolution of the convex-even case. Proc. Amer. Math. Soc., 16 (1965) 1058–1061 21. Supnick, F.: Extreme Hamiltonian lines. Annals of Math., 66 (1957) 179–201 22. Steiner, G., Xue, X.: The maximum traveling salesman problem on van der Veen matrices. Discrete Applied Mathematics, 146 (2005) 1–2 23. van der Veen, J.A.A.: A new class of pyramidally solvable symmetric traveling salesman problems. SIAM J. Discr. Math., 7 (1994) 585–592 24. Volodarskiy, Y.M., Gabovich, E.Y., Zacharin, A.Y.: A solvable case in discrete programming. Izv. Akad. Nauk. SSSR, Ser. Techn. Kibernetika, 1 (1976) 34–44 (in Russian). English translation in Engineering Cybernetics, 14 (1977) 23–32
On Direct Methods for Lexicographic Min-Max Optimization ´ Wlodzimierz Ogryczak and Tomasz Sliwi´ nski Warsaw University of Technology, Institute of Control & Computation Engineering, 00-665 Warsaw, Poland {wogrycza, tsliwins}@ia.pw.edu.pl
Abstract. The approach called the Lexicographic Min-Max (LMM) optimization depends on searching for solutions minimal according to the lex-max order on a multidimensional outcome space. LMM is a refinement of the standard Min-Max optimization, but in the former, in addition to the largest outcome, we minimize also the second largest outcome (provided that the largest one remains as small as possible), minimize the third largest (provided that the two largest remain as small as possible), and so on. The necessity of point-wise ordering of outcomes within the lexicographic optimization scheme causes that the LMM problem is hard to implement. For convex problems it is possible to use iterative algorithms solving a sequence of properly defined Min-Max problems by eliminating some blocked outcomes. In general, it may not exist any blocked outcome thus disabling possibility of iterative Min-Max processing. In this paper we analyze two alternative optimization models allowing to form lexicographic sequential procedures for various nonconvex (possibly discrete) LMM problems. Both the approaches are based on sequential optimization of directly defined artificial criteria. The criteria can be introduced into the original model with some auxiliary variables and linear inequalities thus the methods are easily implementable.
1
Lexicographic Min-Max
There are several multiple criteria decision problems where the Pareto-optimal solution concept is not powerful enough to resolve the problem since the equity or fairness among uniform individual outcomes is an important issue [10, 11, 17]. Uniform and equitable outcomes arise in many dynamic programs where individual objective functions represent the same outcome for various periods [9]. In the stochastic problems uniform objectives may represent various possible values of the same nondeterministic outcome ([15] and references therein). Moreover, many modeling techniques for decision problems first introduce some uniform objectives and next consider their impartial aggregations. The most direct models with uniform equitable criteria are related to the optimization of systems
The research was supported by the Ministry of Science and Information Society Technologies under grant 3T11C 005 27 “Models and Algorithms for Efficient and Fair Resource Allocation in Complex Systems.”
M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 802–811, 2006. c Springer-Verlag Berlin Heidelberg 2006
On Direct Methods for Lexicographic Min-Max Optimization
803
which serve many users. For instance, efficient and fair way of distribution of network resources among competing demands becomes a key issue in computer networks [5] and the telecommunication networks design, in general [20]. The generic decision problem we consider may be stated as follows. There is given a set I of m clients (users, services). There is also given a set Q of feasible decisions. For each service j ∈ I a function fj (x) of the decision x is defined. This function, called the individual objective function, measures the outcome (effect) yj = fj (x) of the decision for client j. An outcome can be measured (modeled) as service time, service costs, service delays as well as in a more subjective way as individual utility (or disutility) level. In typical formulations a smaller value of the outcome means a better effect (higher service quality or client satisfaction). Therefore, without loss of generality, we can assume that each individual outcome yi is to be minimized. The Min-Max solution concept depends on optimization of the worst outcome min{ max fj (x) : x ∈ Q } x
j=1,...,m
and it is regarded as maintaining equity. Indeed, for a simplified resource allocation problem min{maxj yj : j yj ≤ b } the Min-Max solution takes the form y¯j = b/m for all j ∈ I thus meeting the perfect equity. In the general case with possibly more complex feasible set structure this property is not fulfilled. Actually, the distribution of outcomes may make the Min-Max criterion partially passive when one specific outcome is relatively large for all the solutions. For instance, while allocating clients to service facilities, such a situation may be caused by existence of an isolated client located at a considerable distance from all facilities. Minimization of the maximum distance is then reduced to that single isolated client leaving other allocation decisions unoptimized. This is a clear case of inefficient solution where one may still improve other outcomes while maintaining fairness (equitability) by leaving at its best possible value the worst outcome. The Min-Max solution may be lexicographically regularized according to the Rawlsian principle of justice [22]. Applying the Rawlsian approach, any two states should be ranked according to the accessibility levels of the least well–off individuals in those states; if the comparison yields a tie, the accessibility levels of the next–least well–off individuals should be considered, and so on. Formalization of this concept leads us to the lexicographic Min-Max optimization. Let a = (a1 , a2 , . . . , am ) denote the vector obtained from a by rearranging its components in the non-increasing order. That means a1 ≥ a2 ≥ . . . ≥ am and there exists a permutation π of set I such that ai = aπ(i) for i ∈ I. Comparing lexicographically such ordered vectors y one gets the so-called lex-max order. The general problem we consider depends on searching for the solutions that are minimal according to the lex-max order: lex min {(θ1 (f (x)), . . . , θm (f (x))) : x ∈ Q} x
where θj (y) = yj
(1)
The lexicographic Min-Max under consideration is related to the problems with outcomes being minimized. Similar consideration of the maximization problems
804
´ W. Ogryczak and T. Sliwi´ nski
leads to the lexicographic Max-Min solution concept. Obviously, all the results presented further for the lexicographic Min-Max can be adjusted to the lexicographic Max-Min while preserving assumption that the outcomes are ordered from the worst one to the best one. The lexicographic Min-Max solution is known in game theory as the nucleolus of a matrix game. It originates from an idea [6] to select from the optimal strategy set those which allow one to exploit mistakes of the opponent optimally. It has been later refined to the formal nucleolus definition [21]. The concept was early considered in the Tschebyscheff approximation [23] as a refinement taking into account the second largest deviation, the third one and further to be hierarchically minimized. Similar refinement of the fuzzy set operations has been recently analyzed [7]. Within the telecommunications or network applications the lexicographic Max-Min approach has appeared already in [2] and now under the name Max-Min Fairness (MMF) is treated as one of the standard fairness concepts [16, 20]. The LMM approach has been used for general linear programming multiple criteria problems [1, 12], as well as for specialized problems related to (multiperiod) resource allocation [9, 11]. Note that the lexicographic minimization in the LMM is not applied to any specific order of the original criteria. Nevertheless, in the case of linear programming (LP) problems (or generally convex optimization), there exists a dominating objective function which is constant (blocked) on the entire optimal set of the Min-Max problem [12]. Hence, having solved the Min-Max problem, one may try to identify the blocked objective and eliminate it to formulate a new restricted Min-Max problem on the former optimal set. Therefore, the LMM solution to LP problems can be found by the sequential Min-Max optimization with elimination of the blocked outcomes. The LMM approach has been considered also for various discrete optimization problems [3, 4, 8] including the location-allocation ones [14]. In discrete models, due to the lack of convexity there may not exist any blocked outcome [13] thus disabling possibility of the sequential Min-Max algorithm. In this paper we analyze capabilities of an effective use of earlier developed ordered cumulated outcomes methodology [17, 18, 19] to solve the LMM problem by sequential optimization of directly defined criteria. We develop and analyze two alternative approaches allowing to form lexicographic sequential procedures for various nonconvex (possibly discrete) LMM problems. Both the approaches are based on criteria directly introduced with some LP expansion of the original model.
2 2.1
Direct Models Ordered Outcomes
The ordered outcomes yk used in definition of the LMM solution concept can be expressed with a direct formula, although requiring the use of integer variables [24]. Namely, for any k = 1, 2, . . . , m the following formula is valid: yk = min {tk : tk − yj ≥ −M zkj , zkj ∈ {0, 1} ∀j, tk ,zkj
m j=1
zkj ≤ k − 1}
On Direct Methods for Lexicographic Min-Max Optimization
805
where M is a sufficiently large constant (larger than the span of individual outcomes yj ) which allows one to enforce inequality tk ≥ yj for zkj = 0 while ignoring it for zkj = 1. Note that for k = 1 all binary variables z1j are forced to 0 thus reducing the optimization in this case to the standard LP model. However, for any other k > 1 all m binary variables zkj are an important part of the model. Nevertheless, with the use of auxiliary integer variables, any LMM problem (either convex or non-convex) can be formulated as the standard lexicographic minimization with directly given objective functions lex min (t1 , t2 , . . . , tm ) s.t. x ∈ Q, x,tk ,zkj
m
zkj ≤ k − 1 ∀ k
j=1
(2)
tk − fj (x) ≥ −M zkj , zkj ∈ {0, 1} ∀ j, k Let us consider cumulated criteria θ¯k (y) = ki=1 yi expressing, respectively: the worst (largest) outcome, the total of the two worst outcomes, the total of the three worst outcomes, etc. When normalized by k the quantities μk (y) = θ¯k (y)/k can be interpreted as the worst conditional means [17]. Within the lexicographic optimization a cumulation of criteria does not affect the optimal solution. Hence, the LMM problem can be formulated as the standard lexicographic minimization with cumulated ordered outcomes as objective functions lex min {(θ¯1 (f (x)), . . . , θ¯m (f (x))) : x ∈ Q} x
k
where θ¯k (y) = i=1 yi . This simplifies dramatically the optimization problem since quantities θ¯k (y) can be optimized without use of any integer variables. First, let us notice that for any given vector y, the cumulated ordered value θ¯k (y) can be found as the optimal value of the following LP problem: θ¯k (y) = max { ukj
m
yj ukj :
j=1
m
ukj = k,
0 ≤ ukj ≤ 1 ∀ j }
(3)
j=1
The above problem is an LP for a given outcome vector y while it becomes nonlinear for y being a vector of variables. This difficulty can be overcome by taking advantage of the LP dual to (3). Introducing dual variable tk corresponding to the equation m j=1 ukj = k and variables dkj corresponding to upper bounds on ukj one gets the following LP dual of problem (3): θ¯k (y) = min {ktk + tk ,dkj
m
dkj : tk + dkj ≥ yj , dkj ≥ 0 ∀ j}
(4)
j=1
Due to the duality theory, for any given vector y the cumulated ordered coefficient θ¯k (y) can be found as the optimal value of the above LP problem. It follows from (4) that θ¯k (f (x)) = min {ktk + m j=1 (fj (x) − tk )+ : x ∈ Q }, where (.)+ denotes the nonnegative part of a number and tk is an auxiliary (unbounded) variable. This is equivalent to the computational formulation of the
806
´ W. Ogryczak and T. Sliwi´ nski
k–centrum model introduced in [19]. Hence, formula (4) provides an alternative proof of that formulation. Following formula (4), the following assertion is valid for any LMM problem. Theorem 1. Every optimal solution to the LMM problem (1) can be found as an optimal solution to a standard lexicographic optimization problem with predefined linear criteria: lex min [t1 + x,tk ,dkj
m j=1
s.t. x ∈ Q,
d1j , 2t2 +
m j=1
d2j , . . . , mtm +
tk + dkj ≥ fj (x), dkj ≥ 0
m j=1
dmj ]
(5)
∀ j, k
This direct lexicographic formulation remains valid for nonconvex (e.g. discrete) feasible sets Q, where the standard sequential approaches [11, 12] are not applicable [14]. Note that model (5) does not use integer variables and it can be considered as an LP expansion of the original Min-Max problem. Thus, this model preserves the problem’s convexity if the original problem is defined with a convex feasible set Q and a linear objective functions fj . The size of the problem is quadratic with respect to the number of outcomes (m2 + m auxiliary variables and m2 constraints). 2.2
Ordered Values
For some specific classes of discrete, or rather combinatorial, optimization problems, one may take advantage of the finiteness of the set of all possible values of functions fj on the finite set of feasible solutions. The ordered outcome vectors may be treated as describing a distribution of outcomes generated by a given decision x. In the case when there exists a finite set of all possible outcomes of the individual objective functions (or the set of outcome values can be restricted to its finite approximation, i.e. with fuzzy modeling), one can directly describe the distribution of outcomes with frequencies of outcomes. Let V = {v1 , v2 , . . . , vr } (where v1 > v2 > . . . > vr ) denote the set of all attainable outcomes (all possible values of the individual objective functions fj for x ∈ Q). We introduce integer functions hk (y) (k = 1, 2, . . . , r) expressing the number of values vk in the outcome vector y. Having defined functions hk we can introduce cumulative ¯ k (y) = k hl (y) where h ¯ r (y) = m for any outcome distribution functions h l=1 ¯ k expresses the number of outcomes larger or equal to vk . Since vector. Function h we want to minimize all the outcomes, we are interested in the minimization of ¯ k for k = 1, 2, . . . , r − 1. Indeed, the LMM solution concept can all functions h be expressed in terms of the standard lexicographic minimization problem with ¯ k (f (x)) [13]. objectives h Theorem 2. In the case of finite outcome set f (Q) = V m , the LMM problem (1) is equivalent to the standard lexicographic optimization problem with r − 1 criteria: ¯ 1 (f (x)), . . . , h ¯ r−1 (f (x))) : x ∈ Q} lex min {(h (6) x
On Direct Methods for Lexicographic Min-Max Optimization
807
¯ k there is no simple analytical formula allowing to Unfortunately, for functions h minimize them without use of some auxiliary integer variables. This difficulty can be overcome by taking advantages of possible weighting and cumulating criteria in lexicographic optimization. Namely, for any positive weights wi the lexicographic optimization ¯ 1 (f (x)), w1 h ¯ 1 (f (x)) + w2 h ¯ 2 (f (x)), . . . , lex min {(w1 h x
r−1
¯ i (f (x))) : x ∈ Q} wl h
i=1
¯ is equivalent to (6). Let us cumulate vector h(y) weights wi = vi − vi+1 to get ˆ k (y) = h
k−1
¯ i (y) = (vi − vi+1 )h
i=1
k−1
(vi − vk )hi (y) =
i=1
m
(yj − vk )+
j=1
ˆ k (f (y)) for k = 2, 3, . . . , r represent the total exceed of outwhere quantities h comes over the corresponding values vk . Due to the use of positive weights wi > 0, the lexicographic problem (6) is equivalent to the lexicographic minimization ˆ 2 (f (x)), . . . , h ˆ r (f (x))) : x ∈ Q} lex min {(h x
Moreover, criteria defined this way are piecewise linear convex functions [13] which allow to compute them directly by the minimization: ˆ k (y) = min { h vk ,hkj
m
hkj : hkj ≥ yj − vk , hkj ≥ 0 ∀ j}
j=1
Therefore, the following assertion is valid for any LMM problem. Theorem 3. In the case of finite outcome set f (Q) = V m , every optimal solution to the LMM problem (1) can be found as an optimal solution to a standard lexicographic optimization problem with predefined linear criteria: m m m lex min [ h2j , h3j , . . . , hrj ] x,vk ,hkj
j=1
s.t. x ∈ Q,
j=1
j=1
hkj ≥ fj (x) − vk , hkj ≥ 0
(7) ∀ j, k
Formulation (7) does not use integer variables and can be considered as an LP expansion of the original problem. Thus, this model preserves the problem’s convexity if the original problem is defined with a convex feasible set Q and objective functions fj . The size of the problem depends on the number of different outcome values. For many models with not too large number of outcome values, the problem can easily be solved directly and even for convex problems such an approach may be more efficient than the sequential algorithms. Note that in many problems of systems optimization the objective functions express the quality of service and one can easily consider a limited finite scale (grid) of the corresponding outcome values (possibly fuzzy values).
808
3
´ W. Ogryczak and T. Sliwi´ nski
Computational Experiments
We have run initial tests to analyze the computational performances of both direct models for the LMM problem. For this purpose we have solved randomly generated (discrete) location problems defined as follows. There is given a set of m clients. There is also given a set of n potential locations for the facilities, in particular, we considered all points representing the clients as valid potential locations (n = m). Further, the number (or the maximal number) p of facilities to be located is given (p ≤ n). The main decisions to be made in the location problem can be described with the binary variables: xi equal to 1 if location i is to be used and equal to 0 otherwise (i = 1, 2, . . . , n). The allocation decisions are modeled with the additional allocation variables: xij equal to 1 if location i is used to service client j and equal to 0 otherwise (i = 1, 2, . . . , n; j = 1, 2, . . . , m). n
n
xi = p,
i=1 xij ≤
i=1
xj ,
xj ,
xij = 1
xij
j = 1, 2, . . . , m
∈ {0, 1}
(8)
i = 1, 2, . . . , n; j = 1, 2, . . . , m
For each client j a function fj (x) of the location pattern x has been defined to measure the outcome (effect) of the location pattern for client j. Individual objective functions fj depend on effects of several allocations decisions. It means they depend on allocation effect coefficients dij > 0 (i = 1, . . . , m; j = 1, . . . , n), called hereafter simply distances as they usually express the distance (or travel time) between location i and client j. For the standard uncapacitated location problem it is assumed that all the potential facilities provide the same type of service and each client is serviced by the nearest located facility. With the explicit use of the allocation variables and the corresponding constraints the individual n objective functions fj can be written in the linear form: fj (x) = i=1 dij xij . There should be found the location pattern x lexicographically minimaximizing the vector of individual objective functions (fj (x))j=1,...,m . For the tests we used two-dimensional discrete location problems. The locations of the clients were defined by coordinates being the multiple of 5 generated as random numbers uniformly distributed in the interval [0, 100]. In the computations we used rectilinear distances. We tested solution times for different size Table 1. Computation times (in seconds) for the ordered outcomes approach
number of clients (m) 2 5 10 15 20 25
number of facilities (p) 1 2 3 5 7 10 15 0.1 0.0 0.0 0.1 0.7 1.5 1.4 0.9 0.4 4.7 15.2 14.6 10.8 6.5 4.4 13.2 54.0 118.6 84.7 60.9 26.9 11.0 36.6 − − − − − 67.4
On Direct Methods for Lexicographic Min-Max Optimization
809
parameters m and p. All the experiments were performed on the PC computer with Pentium 4, 1.7 Ghz processor employing the CPLEX 9.1 package. Tables 1 and 2 present solution times for two approaches being analyzed. The times are the averages of 10 randomly generated problems. The empty cell (minus sign) shows that the timeout of 120 seconds occurred. One can see fast growing times for the ordered outcomes approach with increasing number of clients (criteria). The growth is faster than the corresponding growth of the Table 2. Computation times (in seconds) for the ordered values approach
number of clients (m) 2 5 10 15 20 25
number of facilities (p) 1 0.0 0.1 0.2 0.7 1.7 3.4
2
3
5
7
10
15
0.0 0.1 0.7 2.5 8.8
0.0 0.1 0.3 2.6 2.7
0.0 0.1 1.2 0.8
0.0 0.1 0.3 1.7
0.0 0.1 0.4
0.1 0.2
Table 3. Computation times (in seconds) of algorithms steps (m = 20)
ordered outcomes ordered values step approach approach number p = 1 p = 3 p = 5 p = 1 p = 3 p = 5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 − 30 31 − 34
0.8 2.0 1.2 1.1 0.8 0.9 0.8 0.7 0.8 0.7 0.9 0.7 0.4 0.3 0.3 0.2 0.2 0.2 0.1 0.1
3.5 5.2 4.4 4.5 4.3 4.7 5.1 5.9 5.9 12.5 7.7 6.0 6.8 7.8 8.1 6.8 6.6 3.7 6.3 2.8
3.2 4.4 4.1 4.4 4.3 4.7 4.9 5.6 5.0 5.5 6.5 6.2 5.7 5.0 4.4 3.3 2.8 1.8 1.5 1.4
0.0 0.1 0.1 0.1 0.0 0.1 0.1 0.0 0.1 0.0 0.1 0.0 0.1 0.0 0.0 0.1 0.0 0.0 0.1 0.0 0.6 0.0
0.0 0.1 0.1 0.2 0.2 0.2 0.4 0.3 0.3 0.3 0.3 0.1 0.1 0.0
0.1 0.0 0.2 0.1 0.3 0.2 0.2 0.1 0.0
810
´ W. Ogryczak and T. Sliwi´ nski
problem sizes. Actually, it turns out that the solution times for the ordered outcomes model (5) are not significantly better (and in some instances even worse) that those for model (2), despite the latter uses auxiliary integer variables. On the other hand, the ordered values approach performs very well particularly with the number of the clients increasing. In fact, in this approach the number of steps depends not on the number of clients but on the number of different values of distances and this is constant. Moreover, the ordered values approach requires less steps for bigger number of facilities. This is due to the fact that the largest distance in the experiments does not exceed 200/p. Table 3 shows how introduction of the auxiliary constraints affects the performance in consecutive steps of the algorithms. One can notice that the ordered values technique generates MIP problems that are solved below 0.3 sec. (below 0.1 for p = 1) while the ordered outcomes problems require much longer computations. Despite a similar structure of auxiliary constraints in both approaches, the ordered values problems are much easier to solve. This property additionally contributes to the overall outperformance of the former and supports the attractiveness of the ordered values algorithm even for problems with the large number of different outcome values.
4
Concluding Remarks
The point-wise ordering of outcomes causes that the Lexicographic Min-Max optimization problem is, in general, hard to implement. We have analyzed optimization models allowing to form lexicographic sequential procedures for various nonconvex (possibly discrete) LMM optimization problems. Two approaches based on some LP expansion of the original model remain relatively simple for implementation independently of the problem structure. However, the ordered outcomes model performs similarly to the classical model with integer variables used to implement ordering and it is clearly outperfomed by the ordered values approach. Further work on specialized algorithms (including heuristics) for the ordered values approach to various classes of discrete optimization problems seems to be a very promising research direction.
References 1. Behringer, F.A.: A simplex based algorithm for the lexicographically extended linear maxmin problem. Eur. J. Opnl. Res. 7 (1981) 274–283. 2. Bertsekas, D., Gallager, R.: Data Networks. Prentice-Hall, Englewood Cliffs (1987). 3. Burkard, R.E., Rendl, F.: Lexicographic bottleneck problems. Oper. Res. Let. 10 (1991) 303–308. 4. Della Croce, F., Paschos, V.T., Tsoukias, A.: An improved general procedure for lexicographic bottleneck problem. Oper. Res. Let. 24 (1999) 187–194. 5. Denda, R., Banchs, A., Effelsberg, W.: The fairness challenge in computer networks. Lect. Notes Comp. Sci. 1922 (2000) 208–220. 6. Dresher M.: Games of Strategy. Prentice-Hall, Englewood Cliffs (1961).
On Direct Methods for Lexicographic Min-Max Optimization
811
7. Dubois, D., Fortemps, Ph., Pirlot, M., Prade, H.: Leximin optimality and fuzzy set-theoretic operations. Eur. J. Opnl. Res. 130 (2001) 20–28. 8. Ehrgott, M.: Discrete decision problems, multiple criteria optimization classes and lexicographic max-ordering. Trends in Multicriteria Decision Making, T.J. Stewart, R.C. van den Honert (eds.), Springer, Berlin (1998), 31–44. 9. Klein, R.S., Luss, H., Smith, D.R.: A lexicographic minimax algorithm for multiperiod resource allocation. Math. Progr. 55 (1992) 213–234. 10. Kostreva M.M., Ogryczak W. (1999) Linear optimization with multiple equitable criteria. RAIRO Oper. Res., 33, 275–297. 11. Luss, H.: On equitable resource allocation problems: A lexicographic minimax approach. Oper. Res. 47 (1999) 361–378. 12. Marchi, E., Oviedo, J.A.: Lexicographic optimality in the multiple objective linear programming: the nucleolar solution. Eur. J. Opnl. Res. 57 (1992) 355–359. 13. Ogryczak, W.: Linear and Discrete Optimization with Multiple Criteria: Preference Models and Applications to Decision Support (in Polish). Warsaw Univ. Press, Warsaw (1997). 14. Ogryczak, W.: On the lexicographic minimax approach to location problems. Eur. J. Opnl. Res. 100 (1997) 566–585. 15. Ogryczak, W.: Multiple criteria optimization and decisions under risk. Control & Cyber. 31 (2002) 975–1003. 16. Ogryczak, W., Pi´ oro, M., Tomaszewski, A.: Telecommunication network design and max-min optimization problem. J. Telecom. Info. Tech. 3/05 (2005) 43–56. ´ 17. Ogryczak, W., Sliwi´ nski, T.: On equitable approaches to resource allocation problems: the conditional minimax solution, J. Telecom. Info. Tech. 3/02 (2002) 40–48. ´ 18. Ogryczak W., Sliwi´ nski, T. (2003) On solving linear programs with the ordered weighted averaging objective. Eur. J. Opnl. Res., 148, 80–91. 19. Ogryczak, W., Tamir, A.: Minimizing the sum of the k largest functions in linear time. Inform. Proc. Let. 85 (2003) 117–122. 20. Pi´ oro, M., Medhi, D.: Routing, Flow and Capacity Design in Communication and Computer Networks. Morgan Kaufmann, San Francisco (2004). 21. Potters, J.A.M., Tijs, S.H.: The nucleolus of a matrix game and other nucleoli. Math. of Oper. Res. 17 (1992) 164–174. 22. Rawls, J.: The Theory of Justice. Harvard University Press, Cambridge (1971) . 23. Rice, J.R.: Tschebyscheff approximation in a compact metric space. Bull. Amer. Math. Soc. 68 (1962) 405–410. 24. Yager, R.R.: On the analytic representation of the Leximin ordering and its application to flexible constraint propagation. Eur. J. Opnl. Res. 102 (1997) 176–192.
Multivariate Convex Approximation and Least-Norm Convex Data-Smoothing Alex Y.D. Siem1 , Dick den Hertog1 , and Aswin L. Hoffmann2 1
2
Department of Econometrics and Operations Research/Center for Economic Research (CentER), Tilburg University, P.O. Box 90153, 5000 LE Tilburg, The Netherlands {a.y.d.siem, d.denhertog}@uvt.nl Department of Radiation Oncology, Radboud University Nijmegen Medical Centre, Geert Grooteplein 32, 6525 GA Nijmegen, The Netherlands [email protected] Abstract. The main contents of this paper is two-fold. First, we present a method to approximate multivariate convex functions by piecewise linear upper and lower bounds. We consider a method that is based on function evaluations only. However, to use this method, the data have to be convex. Unfortunately, even if the underlying function is convex, this is not always the case due to (numerical) errors. Therefore, secondly, we present a multivariate data-smoothing method that smooths nonconvex data. We consider both the case that we have only function evaluations and the case that we also have derivative information. Furthermore, we show that our methods are polynomial time methods. We illustrate this methodology by applying it to some examples.
1
Introduction
In the field of discrete approximation, we are interested in approximating a function y : Rq → R, given a discrete dataset {(xi , y i ) : 1 ≤ i ≤ n}, where xi ∈ Rq and y i = y(xi ) ∈ R, and n is the number of data points. It may happen that we know beforehand that the function y(x) is convex. However, many approximation methods do not make use of the information that y(x) is convex and construct approximations that do not preserve the convexity. For the univariate case there is some literature on convexity preserving functions; see e.g. [1] and [2]. In [1], Splines are used, and in [2], polynomial approximation is considered. For the multivariate case, in [3], convex quadratic polynomials are used to approximate convex functions. Furthermore, there is a lot of literature on so-called Sandwich algorithms; see e.g. [4], [5], [6], [7], and [8]. In these papers, upper and lower bounds for the function y(x) are constructed, based on the discrete dataset, and based on the knowledge that y(x) is convex. A problem that may occur in practice is that one may have a dataset that is subject to noise, i.e., instead of the data y i we have y˜i = y(xi ) + εiy , where εiy is (numerical) noise. There may also be noise in the input data, i.e., x˜i = xi + εix , and if derivative information is available, it could also be subject to noise, i.e., i = ∇y i + εig , where ∇y i = ∇y(xi ). Note that we assume y(x) to be convex. ∇y However, due to the noise, the perturbed data might loose the convexity of M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 812–821, 2006. c Springer-Verlag Berlin Heidelberg 2006
Multivariate Convex Approximation and Least-Norm Convex Data-Smoothing
813
y(x), i.e., the noise could be such that it is not possible to fit a convex function through the perturbed data. Therefore, we are interested in data-smoothing, i.e., in shifting the data points, such that they obtain convexity, and such that the amount of movement of the data is minimized. This problem has already been tackled in literature for the univariate case; see e.g. [9], [10], and [11]. Also in isotonic regression, this problem is dealt with for the univariate case; see [12]. In this paper, we will consider two problems. First, we consider how to construct piecewise linear upper and lower bounds to approximate the output for the multivariate case. This extends the method in [7] to the multivariate case. If derivative information is available it is easy to construct upper and lower bounds. However, derivative information is not always available, e.g., in the case of black-box functions. In this paper, it turns out that these upper and lower bounds can be found by solving linear programs (LPs). Second, we will consider the multivariate data-smoothing problem. We consider both the case that we have only function evaluations and the case that we also have derivative information. We will show that, if we only consider errors in the output data, the first problem can be solved by using techniques, which are from linear robust optimization; see [13]. It turns out that this problem can be tackled by solving an LP. If we also have derivative information, we can also consider errors in the gradients and in the input variables. We then obtain a nonlinear optimization problem. However, if we assume that there are only errors in the gradients and in the output data, we obtain an LP. Also, if we assume that there are only errors in the input data and in the output data, we also obtain an LP. The remainder of this paper is organized as follows. In Sect. 2, we consider the problem of constructing upper and lower bounds. In Sect. 3, we consider multivariate data-smoothing, and in Sect. 4, we give some examples of the application of the data-smoothing techniques, considered in Sect. 3. Finally, in Sect. 5, we present possible directions for further research.
2
Bounds Preserving Convexity
In this section we assume that y(x) is convex and that the data (xi , y(xi )) for i = 1, . . . , n are convex as well, i.e., there are no (numerical) errors, and there exists a convex function that fits through the data points. 2.1
Upper Bounds
We are interested in finding the smallest upper bound n for y(x), given nconvexity, and the data (xi , y(xi )), for i = 1, . . . , n. Let x = i=1 αi xi , where i=1 αi = 1, and 0 ≤ αi ≤ 1, i.e., x is a convex combination of the input data xi . Then, it is well-known that convexity gives us the following inequality: n n i y(x) = y αi x ≤ αi y(xi ) . n
i=1 i
i=1
This means that i=1 αi y(x ) is an upper bound for y(x). To find the smallest upper bound we should therefore solve
814
A.Y.D. Siem, D. den Hertog, and A.L. Hoffmann
u(x) :=
min
α1 ,...,αn
n
αi y(xi )
i=1
s.t. x =
n
αi xi
i=1
(1)
0 ≤ αi ≤ 1 n αi = 1, i=1
where we put the decision variables underneath ’min’. 2.2
Lower Bounds
If we have derivative information, it is easy to construct a lower bound. It is well-known that if y(x) is convex, we have that y(x) ≥ y(xi ) + ∇y(xi )T (x − xi ), ∀x ∈ Rq , ∀i = 1, . . . , n . i Therefore, (x) = max y(x ) + ∇y(xi )T (x − xi ) is a lower bound. i=1,...,n
If we do not have derivative information, we have to do something else. We are interested in finding the largest lower bound for y(x), given convexity k i k and the data (xi , y(xi )), for i = 1, . . . , n. Let xk = i =k αi x + α x, where k k k k i =k αi + α = 1, with 0 ≤ αi ≤ 1, and 0 < α ≤ 1, for all k = 1, . . . , n, i.e., xk is a convex combination of xi , i = k, and x. Then the following holds for all k ∈ {1, . . . , n}: ⎛ ⎞ y(xk ) = y ⎝ αki xi + αk x⎠ ≤ αki y(xi ) + αk y(x) . (2) i =k
i =k
Without loss of generality we may assume that αk > 0. Then we can rewrite (2) as k i y(xk ) − i =k αi y(x ) y(x) ≥ , for k = 1, . . . , n . αk This inequality gives us a lower bound for y(x). To obtain the largest lower bound we should solve the following problem: ⎧ ⎫ i y(xk )− i=k αk i y(x ) ⎪ ⎪ ⎪ max ⎪ ⎪ ⎪ αk ⎪ ⎪ αk ,αk ⎪ ⎪ i ⎪ ⎪ ⎪ ⎪ k i k k ⎪ ⎪ ⎪ ⎪ s.t. x = α x + α x i ⎪ ⎪ ⎨ ⎬ i = k max . (3) k=1,...,n ⎪ ⎪ αki + αk = 1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ i =k ⎪ ⎪ ⎪ ⎪ k ⎪ ⎪ ⎪ ⎪ 0 ≤ α ≤ 1 i ⎪ ⎪ ⎩ ⎭ k 0<α ≤1
This comes down to solving n nonlinear optimization problems, and taking the value of the largest solution. Note that the nonlinear optimization problems have
Multivariate Convex Approximation and Least-Norm Convex Data-Smoothing
815
linear constraints and a fractional objective with linear numerator and denominator. These kinds of optimization problems can be rewritten into an LP; see [14]. This can be done as follows. Define tk := 1/αk . We can now rewrite the inner optimization problem in (3) as k k i max tk y(xk ) − i =k αi t y(x ) k k k α ,αi ,t s.t. xk tk = αki tk xi + αk tk x
i =k
αki tk + αk tk = tk
i =k
αki tk ≥ 0 αk tk = 1, where we multiplied all constraints by tk . Now we define zik := αki tk and z k := αk tk . We then get k i max tk y(xk ) − i =k zi y(x ) z k ,zik ,tk s.t. xk tk = zik xi + z k x
i =k
zik + z k = tk
(4)
i =k
zik ≥ 0 zk = 1 . Note that (4) is an LP. Therefore, for the lower bound (x) we obtain the following: ⎧ k i ⎫ max tk y(xk ) − i ⎪ =k zi y(x ) ⎪ ⎪ k ⎪ k k z ,zi ,t ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ k i k k k ⎪ ⎪ s.t. x t = z x + z x ⎪ ⎪ i ⎪ ⎪ ⎨ ⎬ i = k (x) := max . (5) k k k zi + z = t ⎪ k=1,...,n ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ i =k ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ k ⎪ ⎪ z ≥ 0 ⎪ ⎪ i ⎩ ⎭ k z = 1. Note that the number of constraints in (5) is q + 1. The number of variables in (5) is also q + 1. Therefore it takes polynomial time to find the lower bound.
3
Convex Data-Smoothing
If the dataset is not convex, we first have to smooth the data such that it becomes convex. We distinguish between the case that we only have function evaluations and the case that we also have derivative information. 3.1
Function Value Information
We only consider movement of the output data y˜i . So, we want to minimally shift the perturbed output data y˜i such that they become convex. In the following
816
A.Y.D. Siem, D. den Hertog, and A.L. Hoffmann
optimization problem, we minimize the upward shifts (δy+ )i and the downward shifts (δy− )i such that the new shifted output data points ysi become convex: min
n + i (δy ) + (δy− )i
δy+ ,δy− ,ys i=1 s.t. ysi = ysi ≤
y ˜i + (δy+ )i − (δy− )i ∀i = 1, . . . , n λik ysk (∀λik ∈ [0, 1]|xi = λik xk , λik = 1), k =i
k =i
δy+ ∈ Rn+ , δy− ∈ Rn+ .
(6)
k =i
∀i = 1, . . . , n
We minimize the 1 -norm. It is easy to see that in the optimum either (δy+ )i = 0 or (δy− )i = 0. The second constraint forces the shifted output data points ysi to become convex. Note that (6) is an LP with infinitely many constraints, i.e., it is a semi-infinite LP, which can also be seen as a robust linear programming problem. We can solve this problem from [13]. Since the ”uncer with methods i k i i i tainty region”: {∀λk ∈ [0, 1]|x = λk x , λk = 1} of the second constraint k =i
k =i
in (6) is a polytope, we can rewrite this semi-infinite programming constraint as a collection of linear constraints without an uncertainty region. We follow the reasoning of the proof of Theorem 1 in [13] to show this. Let us consider the second constraint for a certain value of i. We can write this constraint as λik ysk − ysi ≥ 0 ∀(λik ∈ [0, 1]|xi = λik xk , λik = 1) . (7) k =i
k =i
k =i
Note that this constraint is satisfied if and only if the solution of the minimization problem min λik ysk − ysi i λk
k =i
s.t. xi =
k =i λik =
k =i λik ≥
λik xk
(8)
1
0
is nonnegative. The dual of this LP is given by: max (xi )T ri + v i − ysi r i ,v i
s.t. (xk )T ri + v i ≤ ysk ∀k = i ri ∈ Rq , v i ∈ R,
(9)
where en−1 is the (n − 1)-dimensional all-one vector. Since (9) is the dual of (8), both LP’s have the same optimal solution. Note that the optimal value of (9) is nonnegative if and only if there exists a feasible solution for (9) such that the objective function of (9) is nonnegative. We can now conclude that (7) is satisfied if and only if there exist ri and v i , which are feasible for (9) and have a nonnegative objective, i.e. if the following inequalities are satisfied:
Multivariate Convex Approximation and Least-Norm Convex Data-Smoothing
⎧ i T i ⎨ (x ) r + v i ≥ ysi (xk )T ri + v i ≤ ysk ∀k = i ⎩ i r ∈ Rq , v i ∈ R .
817
(10)
We can now finally rewrite the second constraint in (6) as (10) for every i = 1, . . . , n. This means that we can rewrite (6) as min −
δy+ ,δy ,ys ,r i ,v i
n + i (δy ) + (δy− )i i=1
s.t. ysi = y˜i + (δy+ )i − (δy− )i (xi )T ri + v i ≥ ysi (xk )T ri + v i ≤ ysk δy+ ∈ Rn+ , δy− ∈ Rn+ , r i ∈ Rq , v i ∈ R
∀i = 1, . . . , n ∀i = 1, . . . , n ∀k = i, ∀i = 1, . . . , n
(11)
∀i = 1, . . . , n,
which is an LP. Note that, after substituting the equality constraints for ysi , the number of constraints in (11) is n(n − 1) + n = n2 . The number of variables in (11) is (q + 3)n. Above, we minimized the sum of the absolute values of the shifts, i.e. the 1 -norm. However, we can also choose to minimize other norms, such as e.g., the ∞ -norm or the 2 -norm. Using the ∞ -norm, we also obtain an LP, which is similar to (11). 3.2
Derivative Information
Next, we consider the case in which we also have gradient information. Suppose that the underlying function is convex, but the data are not convex, due to (numerical) errors. Again, we are interested in shifting the data such that they become convex. We consider perturbed output values y˜i , perturbed gradients i ∇y(x ), and perturbed input values x˜i . Therefore in this case we want to minimize the shifts in the output values, in the gradients, and in the inputs. So, in the following optimization problem, we minimize the sum of upward and downward shifts (δy+ )i and (δy− )i of the output values, the upward and downward shifts (δg+ )i and (δg− )i of the gradient, and the upward and downward shifts (δx+ )i and (δx− )i of the input values such that the data become convex: min − i
n + i (δy ) + (δy− )i + eTq (δg+ )i + eTq (δg− )i + eTq (δx+ )i + eTq (δx− )i
(δy+ )i ,(δy ) ,(δg+ )i , i=1 + i − i (δg− )i ,(δx ) ,(δx ) , xis ,ysi ,(∇y i )s
s.t.
i + (δ + )i − (δ − )i (∇y i )s = ∇y g g i i xs = x˜ + (δx+ )i − (δx− )i ysi = y˜i + (δy+ )i − (δy− )i (∇y i )Ts (xjs − xis ) + ysi ≤ ysj (δy+ )i ∈ R+ , (δy− )i ∈ R+ , (δg+ )i ∈ Rq+ (δg− )i ∈ Rq+ , (δx+ )i ∈ Rq+ , (δx− )i ∈ Rq+
∀i = 1, . . . , n ∀i = 1, . . . , n ∀i = 1, . . . , n ∀i, j = 1, . . . , n, i = j ∀i = 1, . . . , n ∀i = 1, . . . , n, (12)
818
A.Y.D. Siem, D. den Hertog, and A.L. Hoffmann
where ∇y i = ∇y(xi ), and eq is the q-dimensional all-one vector. The 4-th constraint in (12) is a necessary and sufficient condition for convexity of the data; see page 338 in [15]. However, (12) is a nonconvex optimization problem, and therefore not tractable. However, if there is no uncertainty in the input values x1 , . . . , xn , we can omit the variables (δx+ )i and (δx− )i in (12), and we obtain an LP. Similarly, if there is no uncertainty in the values of the gradients, we can omit (δg+ )i and (δg− )i in (12), and we also obtain an LP. An example of a problem, where the gradient information is exact, and we only have errors in the input variables and output variables is in the field of multiobjective optimization. In the so-called weighted sum method, to determine a point on the Pareto curve/surface the weights determine the exact value of the gradient, whereas due to numerical errors of the solver, the input value and the output value might be subject to noise. Note that in the formulation of (12) we have minimized the shifts (δy+ )i , (δy− )i , + i (δg ) , (δg− )i , (δx+ )i , (δx− )i , and have given them all equal importance. However, we might want to give one type of the error more weight than the other type.
4
Numerical Examples
In this section we will consider some examples of the theory discussed in Sect. 3. Example 1 (artificial, no derivative information). In this example we apply the theory that we developed in Sect. 3.1. We consider the function y : R2 → R, given by y(x1 , x2 ) = x21 +x22 . We take a sample of 10 input data points x1 , . . . , x10 from [−2, 2] × [−2, 2], and calculate the output values y(x1 ), . . . , y(x10 ). Furthermore, we add some noise to it, i.e., we add a noise εiy , where the εiy ’s are independent and uniformly distributed on [−2.5, 2.5], such that the data become nonconvex. We obtain values y˜i = y i + εiy . The values are given in Table 1. We solved (11) for this problem, and the shifted data ysi are also given in Table 1. The values that are really shifted, are shown in italics.
Table 1. Data and results of smoothing in Example 1 number 1 2 3 4 5 6 7 8 9 10
x1 -0.0199 0.0925 1.4427 -1.8056 -0.4435 -1.2952 1.7826 0.8074 -0.8714 0.5779
x2 -1.9768 1.3411 0.3253 -1.1961 -0.3444 0.8811 1.6795 -1.3585 0.5089 -0.7205
y 3.9081 1.8071 2.1872 4.6908 0.3153 2.4539 5.9984 2.4974 1.0183 0.8531
y˜ 6.1588 0.4628 2.7214 4.6208 2.2718 3.7644 5.7807 0.0899 2.6254 0.5766
ys 6.1588 0.4628 2.7214 4.6208 2.0578 3.7644 5.7807 0.4842 2.6254 0.5766
Multivariate Convex Approximation and Least-Norm Convex Data-Smoothing
819
Example 2 (radio therapy, no derivative information). In radiotherapy the main goal is to give the tumour enough radiation dose, such that the surrounding organs do not receive too much radiation dose. This problem can be formulated mathematically by a multiobjective optimization problem. With the tumour and each healthy surrounding organ, an objective function is associated. One of the problems is that the calculation of a point on the Pareto surface can be very timeconsuming. Therefore, we are interested in approximating the Pareto surface; see e.g. [16]. Under certain conditions, we may assume that this Pareto surface is convex. However, due to numerical errors, the Pareto points may not be convex. Therefore we should first smooth them to make them convex. We have data from a patient of the Radboud University Nijmegen Medical Centre, in Nijmegen, the Netherlands. This data is from a multiobjective optimization problem with 3 objectives, and has 69 data points. The data are shown in Fig. 1. The Pareto surface is a convex and decreasing function. However, it turned out that the data is not convex. By solving (11), the data is smoothed such that the data becomes convex. The smoothed data points are also shown in Fig. 1.
perturbed smoothed 14
12
y
10
8
6
4
2 2 4 6 8 10 12 x1
10
15
25
20
30
35
40
x2
Fig. 1. The the perturbed data and the smoothed data of Example 2
5
Further Research
As interesting topics for further research we mention several possible applications of the methods developed in this paper. Possible applications for the construction of the bounds in Sect. 2 are: – Extend the Sandwich algorithms as exist for the approximation of univariate convex functions to the multivariate case by using the lower and upper bounds. More specifically, this may be useful for approximating convex Pareto surfaces and black-box functions (e.g. deterministic computer simulation).
820
A.Y.D. Siem, D. den Hertog, and A.L. Hoffmann
– Use the lower bounds in convex optimization. For each new candidate proposed by the nonlinear programming solver, we can calculate the lower bound, and if this lower bound is larger than the best known objective value up to now, we reject the candidate before evaluating its function value. This may reduce computation time, especially when the function evaluation is time-consuming. In [17] promising results are shown for the univariate case. Possible applications for the data-smoothing methods of Sect. 3 are: – Apply data-smoothing before applying Sandwich type algorithms. This may be necessary because of (numerical) noise. This noise occurs e.g., when we want to estimate a Pareto surface in the field of multiobjective optimization. – Apply data-smoothing to multivariate isotonic regression problems. – Apply the data-smoothing techniques to sampling methods to assess the convexity/concavity of multivariate nonlinear functions; see [18].
References 1. Kuijt, F.: Convexity preserving interpolation – Stationary nonlinear subdivision and splines. PhD thesis, University of Twente, Enschede, The Netherlands (1998) 2. Siem, A.Y.D., de Klerk, E., den Hertog, D.: Discrete least-norm approximation by nonnegative (trigonometric) polynomials and rational functions. CentER Discussion Paper 2005-73, Tilburg University, Tilburg (2005) 3. den Hertog, D., de Klerk, E., Roos, K.: On convex quadratic approximation. Statistica Neerlandica 563 (2002) 376–385 4. Burkard, R.E., Hamacher, H.W., Rote, G.: Sandwich approximation of univariate convex functions with an application to separable convex programming. Naval Research Logistics 38 (1991) 911–924 5. Fruhwirth, B., Burkard, R.E., Rote, G.: Approximation of convex curves with application to the bi-criteria minimum cost flow problem. European Journal of Operational Research 42 (1989) 326–338 6. Rote, G.: The convergence rate of the Sandwich algorithm for approximating convex functions. Computing 48 (1992) 337–361 7. Siem, A.Y.D., den Hertog, D., Hoffmann, A.L.: A method for approximating univariate convex functions using only function evaluations. Working paper, Tilburg University, Tilburg (2005) 8. Yang, X.Q., Goh, C.J.: A method for convex curve approximation. European Journal of Operational Research 97 (1997) 205–212 9. Cullinan, M.P.: Data smoothing using non-negative divided differences and 2 approximation. IMA Journal of Numerical Analysis 10 (1990) 583–608 10. Demetriou, I.C., Powell, M.J.D.: Least squares smoothing of univariate data to achieve piecewise monotonicity. IMA Journal of Numerical Analysis 11 (1991) 411–432 11. Demetriou, I.C., Powell, M.J.D.: The minimum sum of squares change to univariate data that gives convexity. IMA Journal of Numerical Analysis 11 (1991) 433–448 12. Barlow, R.E., Bartholomew, R.J., Bremner, J.M., Brunk, H.D.: Statistical inference under order restrictions. Wiley, Chichester (1972) 13. Ben-Tal, A., Nemirovski, A.: Robust optimization - methodology and applications. Mathematical Programming, Series B 92 (2002) 453–480
Multivariate Convex Approximation and Least-Norm Convex Data-Smoothing
821
14. Charnes, A., Cooper, W.W.: Programming with linear fractional functional. Naval Research Logistics Quarterly 9 (1962) 181–186 15. Boyd, S., Vandenberghe, L.: Convex optimization. Cambridge University Press, Cambridge (2004) 16. Hoffmann, A.L., Siem, A.Y.D., den Hertog, D., Kaanders, J.H.A.M., Huizenga, H.: Dynamic generation and interpolation of pareto optimal IMRT treatment plans for convex objective functions. Working paper, Radboud University Nijmegen Medical Centre, Nijmegen (2005) 17. den Boef, E., den Hertog, D.: Efficient line searching for convex functions. CentER Discussion Paper 2004-52, Tilburg University, Tilburg (2004) 18. Chinneck, J.W.: Discovering the characteristics of mathematical programs via sampling. Optimization Methods and Software 17(2) (2002) 319–352
Linear Convergence of Tatˆ onnement in a Bertrand Oligopoly Guillermo Gallego1 , Woonghee Tim Huh1 , Wanmo Kang1, , and Robert Phillips2 1
2
Columbia University, New York, USA Stanford University and Nomis Solutions, USA
Abstract. We show the linear convergence of the tatˆ onnement scheme in a Bertrand oligopoly price competition game using a possibly asymmetric attraction demand model with convex costs. To demonstrate this, we also show the existence of the equilibrium.
1
Introduction and Model
In the Bertrand oligopoly price competition model for differentiated products, a variety of demand models and cost models have been used. The choice of these models affects the profit of each firm. We let n be the number of firms, which are indexed by i = 1, . . . , n. The demand for each firm is specified as a function of prices. Let pi denote the price of firm i, and define the price vector of competing firms by p−i , which is a shorthand for (p1 , . . . , pi−1 , pi+1 , . . . , pn ). Also denote the vector of all prices by p = (p1 , . . . , pn ) = (pi , p−i ). The demand for each firm i is given by di = di (p). We assume that firm i’s demand is strictly decreasing in its price (i.e., ∂di /∂pi < 0), and that products are gross substitutes (i.e., ∂di /∂pj ≥ 0 whenever j = i). In this paper we consider a generalization of the logit demand model called the attraction demand model: di (p) :=
ai (pi ) a j j (pj ) + κ
(1)
where κ is either 0 or strictly positive. As discussed in [2] and [14], the attraction model has successfully been used in estimating demand in econometric studies, and is increasingly accepted in marketing, e.g., [5]. For its applications in the operations management community, see [21], [4] and the references therein. Without any loss of generality, we normalize the total demand of the market to 1. The attraction function ai (·) of firm i is a positive and strictly decreasing function of its price. For the cost model, we assume that cost is not a function of price, but of demand alone. We denote firm i’s cost function by Ci (di ) defined on [0, 1] and
For the full version of this paper, see [11]. Corresponding author.
M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 822–831, 2006. c Springer-Verlag Berlin Heidelberg 2006
Linear Convergence of Tatˆ onnement in a Bertrand Oligopoly
823
assume Ci is increasing and convex. The profit of firm i is the difference between its revenue and cost, given by πi := πi (p) := pi · di (p) − Ci (di (p)).
(2)
Each firm’s objective is to maximize πi . We impose technical conditions on the attraction demand and cost models as outlined in Section 2. The study of oligopolistic interaction is a classical problem in economics. In the model proposed by Cournot, firms compete on production output quantities, which in turn determine the market price. In Bertrand’s model, however, competition is based on prices instead of production quantities. In the price competition models by Edgeworth, each firm decides how much of its demand is satisfied, in which case an equilibrium solution may or may not exist. Price competition with product differentiation has also been studied by [12], [20] and [8]. An extensive treatment of the subject is found in [24]. We provide a brief summary of results regarding the existence, and stability of equilibrium, followed by their application in the operations management literature. Existence. There are two common methods to show existence of an equilibrium in price competition games. The first method is to obtain existence through the quasi-concavity of πi in pi . See [7]. The second method shows existence through supermodular games. See [22] for the existence of an equilibrium in supermodular games, and [16], for monotone transformation of supermodular games. Thus, if the price competition game is supermodular, it has at least one equilibrium. Similarly, [17] shows the existence of a Nash equilibrium for a generalization of supermodular games, called games with strategic complementarities. Such games include instances of price competition. These however are not applicable to our model. Stability. By definition, a set of actions at equilibrium is a fixed point of the best response mapping. A simultaneous discrete tatˆ onnement is a sequence of actions in which the current action of each firm is the best response to the previous actions of other firms. An equilibrium is globally stable if the tatˆ onnement converges to this equilibrium regardless of the initial set of actions. [23] shows that if a supermodular game with continuous payoffs has a unique equilibrium, it is globally stable. To our knowledge, there is no known result regarding the provable convergence rate of the tatˆ onnement in the price competition game. Operations Management Literature. There has been a growing interest in oligopolistic price competition in the operations literature. To predict and study market outcomes, the existence and the uniqueness of equilibrium are often required. Stability and convergence rate indicate both the robustness of equilibrium and the efficiency of computational algorithms. For example, see [4], [3], [1], and [6].
2
Assumptions
This section lists our assumptions on the attraction function ai (·) in (1), and the profit function πi and the cost function Ci (·) in (2).
824
G. Gallego et al.
We let ρi := inf{p : ai (p) = 0} be the upper bound on price pi . Firm i’s action space for price is an open interval (0, ρi ). Let P := (0, ρ1 ) × · · · × (0, ρn ). Let ηi (p) := −p·ai (p)/ai (p) be the elasticity of firm i’s attraction function. We adopt the following simplifying notation: f (x+) := limh↓x f (h), f (x−) := limh↑x f (h), y inf ∅ = ∞, and y+k = 1 if y = ∞ and k is finite. Condition A: (A1) ai (·) is positive, strictly decreasing and continuously differentiable, i.e., ai (p) > 0 and ai (p) < 0 for all p ∈ (0, ρi ). (A2) The elasticity ηi (·) is nondecreasing. (A3) If ai (0+) < ∞, then ai (0+) > −∞. Condition B: (B1) Ci (·) is strictly increasing, continuously differentiable, and convex on [0, 1], i.e., ci (·) := Ci (·) is positive and increasing, and satisfies ci (0+) > 0. Condition C: (C1) ci (0) < ρi · (1 − 1/ηi (ρi )), i.e., the Lerner index [pi − ci (di )]/pi at price pi = ρi and demand di = 0 is strictly bounded below by 1/ηi (ρi ). (C2) If κ = 0, then ci (1) < ρi . (C3) If κ = 0, the following technical condition holds: n 1 1− >1. ηi (ρi ) · (1 − ci (1)/ρi ) i=1 Examples of Ci (·) include the linear function and exponential function. More examples are provided in Section 3. We remark that attraction functions do not need to be identical. Furthermore, even the form of the attraction function may not be same among firms. Analogously, the cost functions need not have the same form. For the rest of this paper, we assume Conditions A, B and C hold. In Section 5, we introduce an additional assumption that both Ci (·) and ai (·) are twice continuously differentiable.
3
Examples
In this section, we list price competition models in which the convex cost model is applicable. We present some examples from an inventory-capacity systems and a service system based on queues. Inventory-Capacity System: In the first example, consider the pricing problem in the stochastic inventory system with exogenously determined stocking levels. We denote stochastic demand of firm i by Di (p), and its expected demand by di (p). Demand is a function of the price vector p = (p1 , . . . , pn ). We represent firm i’s stochastic demand by Di (p1 , . . . , pn ) = ϕ(di (p), εi ), where εi is a random variable. (We can allow ϕ to be dependent on i). We suppose the continuous density function fi (·) for εi exists, and let Fi (·) denote its cumulative density function. Let yi be the exogenously fixed stocking level of firm i. For the first yi units, the per-unit materials cost is wi . If realized demand is at most yi , the per-unit
Linear Convergence of Tatˆ onnement in a Bertrand Oligopoly
825
salvage value of wi − hi > 0 is obtained. Otherwise, the excess demand is met through an emergency supply at the cost of wi + bi per unit, where bi ≥ 0. The profit of firm i is the difference between its revenue and costs, and the expected profit is πi (p|yi ) = pi · di (p) − Ci (di (p), yi ), where Ci (di , yi ) = wi di + hi E[yi − ϕ(di , εi )]+ + bi E[ϕ(di , εi ) − yi ]+ , and hi and bi are the per-unit inventory overage and underage costs, respectively. Our goal is to show that for fixed yi , this function satisfies condition (B1). We achieve this goal with two common demand uncertainty models. – Additive Demand Uncertainty Model: ϕ(di , εi ) = di + εi where E[εi ] = 0. Then, ∂Ci (di , yi ) = wi − hi P [yi ≥ di + εi ] + bi P [yi ≤ di + εi ] ∂di = wi − hi Fi (yi − di ) + bi (1 − Fi (yi − di )). – Multiplicative Demand Uncertainty Model: ϕ(di , εi ) = di · εi where εi is positive and E[εi ] = 1. Then, yi /di ∞ ∂Ci (di , yi ) = wi − hi εdFi (ε) + bi εdFi (ε) ∂di 0 yi /di ∞ = wi − hi + (hi + bi ) εdFi (ε). yi /di
In both cases, ∂Ci (di , yi )/∂di is positive since wi > hi and nondecreasing in di . We conclude that for fixed yi , Ci (di , yi ) is strictly increasing, twice continuously differentiable, and convex in di . Furthermore, ∂Ci (di , yi )/∂di > 0 at di = 0. Service System: In the second example, we model each firm as a single server queue with finite buffer, where the firms’ buffer sizes are given exogenously. Let κi denote the size of firm i’s buffer; no more than κi customers are allowed to the system. We assume exponential service times and the Poisson arrival process. The rate μi of service times are exogenously determined, and the rate di of Poisson arrival is an output of the price competition. In the queueing theory notation, each firm i is a M/M/1/κi system. We assume that the materials cost is wi > 0 per served customer, and the diverted customers’ demand due to buffer overflow is met by an emergence supply at the cost of wi + bi unit per customer, where bi > 0. The demand arrival rate di = di (p) is determined as a function of the price vector p. It follows that firm i’s time-average revenue is pi · di − Ci (di ), where Ci (di ) is the sum of wi · di and the time-average number of customers diverted from the system is multiplied by bi . Thus, according to elementary queueing theory (see, for example, [15]), di · (1 − di /μi )(di /μi )κi , 1 − (di /μi )κi +1 di = wi · di + bi · , κi + 1
Ci (di ) = wi · di + bi ·
if di = μi if di = μi .
826
G. Gallego et al.
Algebraic manipulation shows that Ci (·) is convex and continuously twice differentiable, satisfying ci (0) = wi > 0.
4
Existence of Equilibrium
In this section, we show that the oligopoly price competition has a unique equilibrium. Given the price vector, each firm’s profit function is given by expression (2), where its demand is determined by (1). We first show that the first order condition ∂πi /∂pi = 0 is sufficient for the Nash equilibrium (Proposition 1). For each value of a suitably defined aggregate attraction δ, we show that there is at most one candidate for the solution of the first order condition (Proposition 2). Then, we demonstrate that there exists a unique value δ of the aggregator such that this candidate indeed solves the first order condition (Propositions 3 and 4). We proceed by assuming both Conditions A, B and C. Let ςi (p) := ηi (pi )·(1−di (p)). Proposition 1. Firm i’s profit function πi is strictly quasi-concave in pi ∈ (0, ρi ). If p∗ = (p∗1 , . . . , p∗n ) ∈ P satisfies ∂πi (p∗ )/∂pi = 0 for all i, p∗ is the Nash equilibrium in P, and p∗i > ci (0) for each i. Furthermore, the condition ∂πi /∂pi = 0 is equivalent to ci (di (p))/pi = 1 − 1/ςi (p) . (3) n Given a price vector, let δ := j=1aj (pj ) be the aggregate attraction. The n support of δ is Δ := 0, j=1 aj (0+) . From (A1), it follows that δ ∈ Δ. Then, −1 di = ai (pi )/(δ+κ). Since a−1 ((δ+κ)di ). i is well-defined by (A1), we get pi = ai Thus, (3) is equivalent to
ci (di ) 1 =1− . ai −1 ((δ + κ)di ) ηi ◦ ai −1 ((δ + κ)di ) · (1 − di )
(4)
Observe that there is one-to-one correspondence between p = (p1 , . . . , pn ) and d = (d1 , . . . , dn ), given δ (and of course, κ). Let Di (δ) be the solution to (4) given δ (and κ). The existence and uniqueness of Di (δ) are guaranteed by Proposition 2 below. The Di (δ)’s may not sum up to the “correct” value of δ/(δ + κ) unless a set of conditions is satisfied (Propositions 3). Proposition 4 shows the existence of a unique δ such that the D i (δ)’s sum up to δ/(δ + κ). (0+) Let di (δ) := min aiδ+κ , 1 be an upper bound on the market share of firm i. For each fixed δ ∈ Δ, we define the following real-valued functions on 0, di (δ) : ci (xi ) 1 and Ri (xi |δ) := 1 − . ai −1 ((δ + κ)xi ) ηi ◦ ai −1 ((δ + κ)xi ) (1 − xi ) We remark that both Li (xi |δ) and Ri (xi |δ) are continuous in xi in 0, di (δ) . Li (xi |δ) :=
Proposition 2. For each i and each δ ∈ Δ, Li (·|δ) is positive and strictly increasing, and Ri (·|δ) is strictly decreasing. Furthermore, Li (xi |δ) = Ri (xi |δ) has a unique solution in 0, di (δ) , i.e., Di (δ) is a well-defined function of δ.
Linear Convergence of Tatˆ onnement in a Bertrand Oligopoly
827
For any aggregate attraction δ ∈ Δ, Proposition 2 shows that there is a unique solution xi satisfying Li (xi |δ) = Ri (xi |δ), and this solution is Di (δ). It represents demand that maximizes firm i’s profit provided that the aggregate attraction remains at δ. Also, define D(δ) := D1 (δ) + · · · + Dn (δ). Note that Di (δ) is a strictly decreasing function since any increase in δ lifts the graph of Li (xi |δ) and drops that of Ri (xi |δ). Therefore D(δ) is also a strictly decreasing function. Furthermore, Di (δ) is a continuous function of δ. Thus, D(δ) is continuous. δ Proposition 3. For fixed δ ∈ Δ, D(δ) = δ+κ holds if and only if there exist p = (p1 , . . . , pn ) and d = (d1 , . . . , dn ) such that the following set of conditions n hold: (i) δ = j=1 aj (pj ), (ii) di = ai (pi )/(δ + κ) for each i, and (iii) Li (di |δ) = Ri (di |δ) for each i. In such case, furthermore, the price vector corresponding to δ any δ satisfying D(δ) = δ+κ is unique. δ If there is δ ∈ Δ satisfying D(δ) = δ+κ , then by Proposition 3, the corresponding price vector satisfies ∂πi /∂pi = 0 for all i. By Proposition 1, this price vector is a Nash equilibrium. For the unique existence of the equilibrium, it suffices to show the result of the following proposition.
Proposition 4. There exists a unique δ ∈ Δ such that D(δ) = δ/(δ + κ). Theorem 1. There exists a unique positive pure strategy Nash equilibrium price vector p∗ ∈ P. Furthermore, p∗ satisfies p∗i > ci (0) for all i = 1, · · · , n.
5
Convergence of Tatˆ onnement Scheme
In this section, we show that the unique equilibrium is globally stable under the tatˆonnement scheme. Suppose each firm i chooses a best-response pricing strategy: choose pi maximizing his profit πi (p1 , . . . , pn ) while pj ’s are fixed for all j = i. By Theorem 1, there exists a unique equilibrium vector, which is denoted by p∗ = (p∗1 , · · · , p∗n ) ∈ P. Define Q := (0, a1 (0+)) × · · · × (0, an (0+)). Let q∗ = (q1∗ , · · · , qn∗ ) ∈ Q be the corresponding attraction vector where qi∗ := ai (p∗i ). Let qˆi := j =i qj be the sum of attraction quantities of firms other than i. Set θi∗ := qi∗ /(ˆ qi∗ + κ) and d∗i := qi∗ /(qi∗ + qˆi∗ + κ), which are both positive. Suppose we fix the price pj for all j = i, and let qi := ai (pi ) be the corresponding attraction. Since ai is one-to-one and δ = qi + qˆi , condition (4) is equivalent to qi ci qi +ˆ qi +κ 1 qi = 1 − 1 + . (5) ai −1 (qi ) ηi ◦ ai −1 (qi ) qˆi + κ Using an argument similar to Proposition 2 and ensuing discussion, it can be shown that there is a unique solution qi to (5) for each qˆi given by any pos itive number less than j a (0+). We call this solution qi the best response i =i function ψ (ˆ q ) for firm i. The unique equilibrium satisfies ψi (ˆ qi∗ ) = qi∗ where i i ∗ ∗ qˆi = j =i qj . Furthermore, it is easy to show that ψi (·) is strictly increasing.
828
G. Gallego et al.
Proposition 5. ψi (·) is a strictly increasing function. From the definition of θi∗ and ψi (ˆ qi∗ ) = qi∗ , we know ψi (ˆ qi )/(ˆ qi +κ) = θi∗ at qˆi = qˆi∗ . The following proposition characterizes the relationship between ψi (ˆ qi )/(ˆ qi +κ) and θi∗ . Proposition 6. ψi (ˆ qi )/(ˆ qi + κ) is strictly decreasing in qˆi , and satisfies ψi (ˆ qi∗ )/ ∗ ∗ (ˆ qi + κ) = θi . Thus, ⎧ > θi∗ , for qˆi < qˆi∗ ψi (ˆ qi ) ⎨ = θi∗ , for qˆi = qˆi∗ qˆi + κ ⎩ < θi∗ , for qˆi > qˆi∗ . Furthermore, ψi (ˆ qi ) is continuous, and satisfies 0 < ψi (ˆ qi∗ ) < θi∗ . Let q = (q1 , . . . , qn ). We denote the vector by Ψ (q) = of best response functions ∗ ∗ ∗ (ψ1 (ˆ q1 ), · · · , ψn (ˆ qn )) ∈ Q, where qˆi = q . Note that q = (q j 1 , · · · , qn ) j =i ∗ ∗ 1 is a fixed point of Ψ , i.e., Ψ (q ) = q . By Proposition 5, we have Ψ (q ) < Ψ (q2 ) whenever two vectors q1 and q2 satisfy q1 < q2 . (The inequalities are component-wise.) We now show that best-response pricing converges to the unique equilibrium. We define the sequence {q(0) , q(1) , q(2) , · · · }⊂ Q by q(k+1) := Ψ (q(k) ) for k ≥ 0. Theorem 2. If each firm employs the best response strategy based on the prices of other firms in the previous iteration, the sequence of price vectors converges to the unique equilibrium price vector. Proof. Let q(0) ∈ Q denote the attraction vector associated with the initial price vector. Choose q(0) , q(0) ∈ Q such that q(0) < ˆb(0) < q(0) and q(0) < ˆb∗ < q(0) . Such q(0) and q(0) exist since Q is a box-shaped open set. For each k ≥ 0, we define q(k+1) := Ψ (q(k) ) and q(k+1) := Ψ (q(k) ). From the monotonicity of Ψ (·) (Proposition 5) and Ψ (ˆb∗ ) = ˆb∗ , we get q(k) < ˆb(k) < q(k)
and q(k) < ˆb∗ < q(k) .
(6)
(k)
Let u(k) := maxi ˆq i / qˆi∗ . Clearly, u(k) > 1 for all k by (6). We show that ∞ the sequence u(k) k=0 is strictly decreasing. For each i, (k+1) qi
(k) (k) (k) = ψi (ˆq i ) < ˆq i + κ · θi∗ = ˆ qi + κ ·
ˆq (k) qi∗ ≤ i ∗ · qi∗ ≤ u(k) qi∗ , qˆi∗ + κ qˆi
where the first inequality comes from Proposition 6, the second one from (6), and the last one from the definition of u(k) . Thus (k+1) ˆq (k+1) = qj < u(k) qj∗ = u(k) qˆi∗ , (7) i j =i
j =i
Linear Convergence of Tatˆ onnement in a Bertrand Oligopoly
829
∞ (k+1) ∗ and u(k+1) = maxi {ˆqi /ˆ qi } < u(k) . Since u(k) k=0 is a monotone and bounded sequence, it converges. Let u∞ := limk→∞ u(k) . We claim u∞ = 1. Suppose, by way of contradiction, u∞ > 1. By Proposition 6, ψi (ˆ qi )/(ˆ qi + κ) is strictly decreasing in qˆi . Thus, for any qˆi ≥ 12 (1 + u∞ ) · qˆi∗ , there exists ∈ (0, 1) such that for each i, we have ψi (ˆ qi ) ψi (ˆ q∗ ) q∗ ≤ (1 − ) · ∗ i = (1 − ) · ∗ i . (ˆ qi + κ) qˆi + κ qˆi + κ (k) For any k, if qˆi ≥ 12 (1 + u∞ ) · qˆi∗ , then (k) (k) (k+1) qi = ψi ˆq i ≤ ˆq i + κ · (1 − ) ·
≤
(k)
(k+1)
qi∗ +κ
ˆq (k) i · qi∗ · (1 − ) ≤ (1 − ) · u(k) · qi∗ . qˆi∗
Otherwise, we have qˆi∗ < ˆq i qi
qˆi∗
< 12 (1 + u∞ ) · qˆi∗ . By Proposition 6,
(k) (k) = ψi qˆi < ˆq i + κ ·
ˆq (k) qi∗ 1 i ≤ · qi∗ ≤ (1 + u∞ ) · qi∗ . ∗ ∗ qˆi + κ qˆi 2
Therefore, we conclude, using an argument similar to (7), u(k+1) ≤ max{(1−) · u(k) , (1+u∞ )/2}. From u(k+1) > u∞ > (1+u∞ )/2, we obtain u(k+1) ≤ (1−)·u(k) , implying u(k) → 1 as k → ∞. This is a contradiction. Similarly, we can show that l(k) := mini ˆq (k) / qˆi∗ is a strictly increasing sequence converging to 1. i The following proposition shows the linear convergence of tatˆ onnement in the space of attraction values. Proposition 7. The sequence {q(k) }k≥0 converges linearly.
∞ ∞ Proof. Consider q(k) k=0 and q(k) in the proof of Theorem 2. Recall k=0 q(k) < ˆb(k) < q(k) and q(k) < ˆb∗ < q(k) . We will show that q(k) and q(k) converges to ˆb∗ linearly. Since Q is a box-shaped open set, there exists
∞a convex (k) ∞ (k) compact set B ⊂ Q containing all elements of q and q . From k=0 k=0 the proof of Proposition 6, there exists δ > 0 such that for any ˆb ∈ B, we have d ψ(ˆ qi ) ≤ −δ. dˆ qi qˆi + κ (k) From integrating both sides of the above expression from qˆi∗ to ˆq i , (k) ψi (ˆq i )
ˆq (k) i
+κ
−
(k) ψ(ˆ qi∗ ) ≤ −δ ˆq i − qˆi∗ ∗ qˆi + κ
since the line segment connecting ˆb∗ and q(k) lies within B.
830
G. Gallego et al.
Define δ1 := δ · mini minˆb∈B {(ˆ qi + κ) · qˆi∗ /qi∗ } > 0. We choose δ > 0 is (k) (k+1) sufficiently small such that δ1 < 1. Recall q i = ψi (ˆq i ) and qi∗ = ψ(ˆ qi∗ ). (k) Rearranging the above inequality and multiplying it by (ˆq i + κ)/qi∗ , (k+1)
qi
(k)
/qi∗ ≤ (ˆq i
(k)
+ κ)/(ˆ qi∗ + κ) − δ · (ˆq i
(k)
− qˆi∗ ) · (ˆq i
+ κ)/qi∗
(k) ∗ (k) ∗ (k) ∗ ≤ˆ q i /ˆ qi − δ1 · (ˆq i /ˆ qi − 1) = (1 − δ1 ) · ˆq i /ˆ qi + δ1 . (k) where the second inequality comes from ˆq i > qˆi∗ and the definition of δ1 . (k) (k) Let ρ(k) := maxi {q i /qi∗ }. Thus, q j ≤ ρ(k) · qj∗ holds for all j, and summing (k) (k+1) ∗ this inequality for all j = i, we get qˆi ≤ ρ(k) · qˆi∗ . Thus, q i /qi is bounded above by (1−δ1 )·ρ(k)+δ1 for each i, and we obtain ρ(k +1) ≤ (1−δ1 )·ρ(k)+δ1 . Using induction, it is easy to show
ρ(k) ≤ (1 − δ1 )k · (ρ(0) − 1) + 1 Therefore, we obtain
(k) max (q i − qi∗ )/qi∗ = ρ(k) − 1 ≤ (1 − δ1 )k · (ρ(0) − 1) i
(0) = (1 − δ1 )k · max (q i − qi∗ )/qi∗ and i
(k) (0) max q i − qi∗ ≤ (1 − δ1 )k · max{qi∗ } · max (q i − qi∗ )/qi∗ , i
i
i
showing the linear convergence of the upper bound sequence q(k) . A similar argument shows the linear convergence of q(k) . The linear convergence in the above theorem is not with respect to the price vector but with respect to the attraction vector, i.e. not in p but in q. Yet, the following theorem also shows the linear convergence with respect to the price (k) (k) vector. Let {p(k) }k≥0 be the sequence defined by pi := a−1 i (qi ). Theorem 3. The sequence {p(k) }k≥0 converges linearly.
∞ ∞ Proof. Consider q(k) k=0 and q(k) in the proof of Proposition 7. Let k=0
∞ (k) ∞ p and p(k) be the corresponding sequences of price vectors. k=0 k=0
(0)
Within the compact interval [pi , p(0) ], the derivative of ai is continuous and i its infimum is strictly negative. By the Inverse Function Theorem, the derivative −1 ∗ (0) (0) ∗ of a−1 i (·) is continuous in the compact domain of [q i , q i ]. Recall pi = ai (qi ). There exists some bound M > 0 such that −1 ∗ ∗ |pi − p∗i | = |a−1 i (qi ) − ai (qi )| ≤ M · |qi − qi | (0)
whenever ai (pi ) = qi ∈ [q (0) , q i ] for all i. From the proof of Proposition 7, i (k)
(k)
(0)
we obtain qi ∈ (q (k) , q i ) ⊂ [q (0) , q i ]. Therefore, the linear convergence of i i (k) {q }k≥0 implies the linear convergence of {p(k) }k≥0 .
Linear Convergence of Tatˆ onnement in a Bertrand Oligopoly
831
References 1. G. Allon and A. Federgruen. Service competition with general queueing facilities. Technical report, Working Paper, 2004. 2. S. P. Anderson, A. De Palma, and J.-F. Thisse. Discrete Choice Theory of Product Differentiation. The MIT Press, 1996. 3. F. Bernstein and A. Federgruen. Comparative statics, strategic complements and substitutes in oligopolies. Journal of Mathematical Economics, 40(6):713–746, 2004. 4. F. Bernstein and A. Federgruen. A general equilibrium model for industries with price and service competition. Operations Research, 52(6):868–886, 2004. 5. D. Besanko, S. Gupta, and D. Jain. Logit demand estimation under competitive pricing behavior: An equilibrium framework. Management Science, 44:1533–1547, 1998. 6. G. P. Cachon and P. T. Harker. Competition and outsourcing with scale economies. Management Science, 48(10):1314–1333, 2002. 7. A. Caplin and B. Nalebuff. Aggregation and imperfection competition: On the existence of equilibrium. Econometrica, 59:25–59, 1991. 8. E. H. Chamberlin. The Theory of Monopolistic Competition. Harvard University Press, 1933. 9. P. Dubey, O. Haimanko, and A. Zapechelnyuk. Strategic complements and substitutes, and potential games. Technical report, Working Paper, 2003. 10. J. Friedman. Oligopoly Theory. Cambridge University Press, 1983. 11. G. Gallego, W. T. Huh, W. Kang, and R. Philips. Price Competition with Product Substitution: Existence of Unique Equilibrium and Its Stability. Technical report, Working Paper, 2005. 12. H. Hotelling. Stability in competition. Economic Journal, 39:41–57, 1929. 13. R. D. Luce. Individual Choice Behavior: A Theoretical Analysis. Wiley, 1959. 14. S. Mahajan and G. Van Ryzin. Supply chain contracting and coordination with stochastic demand. In S. Tayur, M. Magazine, and R. Ganeshan, editors, Quantitive Models for Supply Chain Management. Kluwer, 1998. 15. J. Medhi. Stochastic Models in Queueing Theory. Academic Press, second edition, 2003. 16. P. Milgrom and J. Roberts. Rationalizability, learning, and equilibrium in games with strategic complementarities. Econometrica, 58:1255–1277, 1990. 17. P. Milgrom and C. Shannon. Monotone comparative statics. Econometrica, 62:157– 180, 1994. 18. T. Mizuno. On the existence of a unique price equilibrium for models of product differentiation. International Journal of Industrial Organization, 62:761–793, 2003. 19. J. Roberts and H. Sonnenschein. On the foundations of the theory of monopolistic competition. Econometrica, 45:101–113, 1977. 20. J. Robinson. The Economics of Imperfect Competition. Macmillan, 1933. 21. K. C. So. Price and time competition for service delivery. Manufacturing & Service Operations Management, 2:392–407, 2000. 22. D. M. Topkis. Equilibrium points in nonzero-sum n-person submodular games. SIAM Journal on Control and Optimization, 17:773–787, 1979. 23. X. Vives. Nash equilibrium with strategic complementarities. Journal of Mathematical Economics, pages 305–321, 1990. 24. X. Vives. Oligopoly Pricing: Old Ideas and New Tools. The MIT Press, 1999.
Design for Using Purpose of Assembly-Group Hak-Soo Mok1, Chang-Hyo Han1, Chan-Hyoung Lim2, John-Hee Hong3, and・Jong-Rae Cho3 1 Dept.
of Industrial Engineering, Pusan National University, 30 Changjeon-Dong, Kumjeong-Ku, Busan, 609-735, Korea Tel.: (051)510-1435; Fax: (051)512-7603 [email protected] 2 Nemo Solutions Consulting, 7th floor, Korea Electric Power Corporation Building, 21, Yeoido-dong, Yeongdeungpo-gu, Seoul, 150-875, Korea Tel.: (02)3454-0340; Fax: (02)786-0349 [email protected] 3 Hyundai Kia Motors co, Corporate Research & Development Division 104, Mabuk-Ri, Guseong-Eup, Yongin-Si, Gyenonggi-Do, 449-912, Korea Tel.: (031) 899-3056; Fax: (031)368-6786 [email protected]
Abstract. In this paper, the disassemblability is determined by the detail weighting factors according to the using purpose of assembly-group. Based on the disassembly mechanism and the characteristics of parts and assemblygroups, the disassembly functions are classified into three categories; accessibility, transmission of disassembly power and disassembly structure. To determine the influencing parameters, some assembly-groups of an automobile are disassembled. The weighting values for the influencing factors are calculated by using of AHP (Analytic Hierarchy Process). Using these weighting values, the point tables for the using purpose are determined. Finally, an optimal design guideline for the using purpose of an assembly-group can be decided.
1 Introduction The shortage of landfill and waste burning facilities constantly reminds us that our products do not simply disappear after disposal. It is currently acknowledged that the most ecologically sound way to treat a worn out product is recycling. Because disassembly is related to recycling and is a necessary and critical process for the endof life (EOL) of a product, the design methodology should be developed in terms of environmentally conscious designs (Eco-design). Disassembly can be defined as a process of systematic removal of desirable parts from an assembly while ensuring that parts are not impaired during the process.[1] The goal of disassembly for recycling is to separate different materials with less effort. There should be a countermeasure for companies to reduce the recycling expenses. There is also increased demand for products that can be easily maintained. In other words, by the reducing the disassembly time we can decrease the man-hours. Now the environmental problems are seriously discussed among many companies. Fig. 1 shows the life cycle of a worn M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 832 – 840, 2006. © Springer-Verlag Berlin Heidelberg 2006
Design for Using Purpose of Assembly-Group
833
out product. In order to design a product, which is environmentally benign, the life cycle of the product should be well understood. The disassembly technology should be systematized to reduce the recycling cost and time, because the worn out products transported from the logistics center can be reused or recycled after disassembly processes.[2],[3].
Fig. 1. Location of disassembly in life cycle of a product
The EU policy is calling for mandatory environmental policies for EU-member countries to regulate scrapped vehicles recycling. From early 2006, 85% of the weight of a scrapped vehicle should be recycled and 95% including 10% energy of a vehicle should be recycled after 9 years. [4],[5] In order to recycle an assembled good, there must be disassembly and sorting processes. To conduct these processes easily, the structure of a product and assemblygroups should be oriented to disassembly. Furthermore, the product and assemblygroups have to be designed with consideration of their using purposes. In this paper, the using purposes are divided into (1) user aspect, (2) A/S aspect, (3) reuse aspect and (4) recycling aspect. According to these four categories, a new product and assembly-groups should be designed. Then, the main activities for this research are showed in the followings: ∗ Analysis of the mechanism of disassembly and understanding of the weak disassembly processes ∗ Determination of the influencing parameters on the disassembly ∗ Determination of the detail weighting factors for detail disassembly functions ∗ Determination of the weighting values for using purposes of assemblygroups ∗ Evaluation of the disassemblability points ∗ Establishment of the point tables of the disassemblability ∗ Selection of an optimal alternative among several design guidelines
2 Detail Functions of Disassembly Definition of Disassemblability In this paper, we determined that five detail functions are normally needed for disassembly: fixing of the object, accessing the disassembly point, transmitting the
834
H.-S. Mok et al.
disassembly power, grasping and handling of the object. Fig. 2 shows the definitions of these detail functions. In order to improve the disassemblability of a product, the previous five detail functions should be simplified and be conducted more easily. D is a s s e m b ly m e c h a n is m
G G G G G
G G
F ix in g w G G G G G GG G G G G G G G G
G ra s p in g p G G G G SG G G G SG G G G G G G G
Access w G G G G G G G G G G G G G
3 4
2
Tra n s m is s io n o f d is a s s e m b ly p o w e r
H a n d lin g w G G G G G G G G G G G 1 G
5
t G G G G GOk G G G G G G G PG
Fig. 2. Definition of detail functions of disassembly
G
iG G
In Fig. 3, the detail functions of disassembly are classified according to the required disassembly time and the object of the assembly-group [6]. G
k GGG tT G
z z z
G
nG hGG {GG GG G
k GGG z z z z
k GGG
k GGG
hG G G
z
tT G
z z
z
zG G
z
hGOGPG G {GG GG
mG G hG G oG G {GG GG
z
hG G OGSGPG {GG GG G OGPG
wGG G OGGGTSGGGGPG h GG G OGG GSGGGGSGGGG G PG G
Fig. 3. Classification of the disassembly functions
The object of disassembly-groups is divided into the disassembly of product and the disassembly of part. In the disassembly of product, it is not needed the fixing processes, because the product is fixed and stable by the weight of the product. In this case, only three detail functions would be used: (1) access into the object, (2) transmission of the disassembly power and (3) structure of the product (Table. 1).[7]
Design for Using Purpose of Assembly-Group
835
Table 1. Definition of detait disassembly function
Category
Definition Degree of ease of access to joining point or
accessibility
Process
specification position for disassembly.
Transmission of disassembly power Disassembly
Structure
Degree of ease of transmit the disassembly power at disassembly position Structural properties of objects that influence on
Structure
diassemblability
For the analysis of these disassembly functions, the geometrical properties of parts, the applied fastening methods, the connecting parts and the number of parts are checked.
3 Determination of the Detail Weighting Factors In order to determine the weighting factors of the detail disassembly function for the product and the assembly-group, several disassembly experiments were performed in the lab. Using these experimental data, the criteria for the evaluation of the disassemblability and the levels for each criterion should be determined. The difficulty of disassembly function, the needed personal motion, the average disassembly time, the frequency of disassembly process and the frequency of weak process are considered as the criteria. The evaluation level is determined by the analysis of the disassembly experiment of assembly-groups. Using these criteria and the levels, we can find the weighting factors for the detail disassembly process. Table 2 shows the procedure to find the detail weighting factors. The detail weighting factor for the accessing process is the value in 0.342 and for the transmission of disassembly power is 0.658. If the weighting factor for the structural property will be 0.5, the weighting factor for accessibility will be 0.171 and the weighting factor for the transmission of disassembly power is 0.329. Because the characteristics of product structure play an important role in the disassembly process. Table 2. Detail weighting factors of detail disassemblability Evaluation criteria Frequency Difficulty Grade of Average Frequency of of necessary in work time of process bottleneck process Body part process Accessibility
3
3
1
3
3
5
5
5
5
5
Transmission of disassembly power
Total
13/38 =0.342 25/38 =0.658
z The weight of accessibility:
0.342 * 0.5 = 0.171 z The weight of transmission of
disassembly power : 0.658 * 0.5 = 0.329 z The weight of disassembly structure : 0.500
836
H.-S. Mok et al.
4 Determination of an Optimal Alternative of Design Guideline Using the Point Table for Disassembly The following steps are used to determine the point of disassemblability: • Determination of the detail weighting factors ; (in 3chapter) • Determination of the weighting values(wi) for the influencing parameters • Multiplication of the detail weighting factor(wj) for the disassemblability by the weighting value for the influencing parameter ; (This equals total weighting factor) • Multiplication of the score for the level of the detail influencing parameter for the detail influencing parameter by the total weighting factor The disassembly experiments are performed to determine the detail weighting factors, which are used to identify the weak disassembly process and more complex parts that cannot be disassembled easily. Then using these results, the detail weighting factors for the detail influencing parameters can be calculated by the AHP (Analytic Hierarchy Process). The total weighting factor is a main contribution of the higher quantitative evaluation of the disassemblability. The score for the level of the detail influencing parameter can be decided by the given disassembly conditions and the level number. The following step is the procedure to estimate the disassemblability points for the disassembly purpose according to the user of a product. Step 1. Definition the weighting factors(wi) for the detail disassemblability (ex. the weighting for accessibility : 0.171 ) Step 2. Definition the weighting value(wj) for each influencing parameter
z
Size of access space : 0.281
z
Visuality of access route : 0.260
z
Self-location : 0.299
z
Number of access direction : 0.160
Step 3. Multiplication < wi * wj *100> (The weight of access space: 0.171 * 0.281 *100) Step 4. Multiplication the detail disassemblability for
each influencing
parameter and the score value of Step 3. ( here, the detail disaaaemblability with 3 grade : 1 , 3, 5 point and 2 grade : 1, 5 point ) Step 5. As the disassemblability point is determined the sum of Step 4 for each influencing parameter
Using the point of the disassemblability, the point table of the disassemblability for each using purpose of the product can be established. In the point table of the disassemblability we can find the position according to the given disassembly condition: the property of access, the visual ability, the existence of self-location and the number of access directions. From this position we can have four possible alternatives to improve the disassemblability of a product. Here we assume that it is possible that an alternative can be obtained by the changing only one parameter at a time. This assumption could give us the simple solution.
Design for Using Purpose of Assembly-Group
837
This suggests that the possible direction of the improvement can not be found in a diagonal line. Now in this step, an optimal alternative can be determined by choosing the alternative that has the highest score of the disassemblability. Fig. 4 shows the step used to determine the possible alternative of the design guideline. In the first step, the disassemblability point is determined according to the disassembly condition. Then, we can find the position in the point table. For example, the disassemblability point is 41 in the given disassembly condition: when the access is direction-restricted, a visuality is medium, the selflocation is zero, and the change of disassembly direction is one. In the second step, the alternative is determined using our algorithm to get a design guideline. In this paper, there are four alternatives. In the third step, an optimal design alternative is determined from the comparison of the point difference among the occurred alternatives. In this case, the fourth alternative is selected as the best solution from among four alternatives, because the rising gap of the score of the disassemblability is the highest point (here, 21 point). In other words, the most valuable design guideline is: a self-location should be established in the product.
Point table of disassemblability for a user in aspect of accessibility Step 1
Accessibility Score table
Existence Self-location Nonexistence self-location
free
0 1 2 0 1 2
restriction
difficult
high
middle
low
high
middle
low
high
middle
low
86 80 75 65 60 54
77 71 66 56 51 45
68 62 57 47 42 36
76 70 65 55 50 45
67 62 56 47 41 36
58 53 47 38 32 27
66 61 55 46 40 35
57 52 46 37 31 26
49 43 38 28 23 17
Step 2 Accessibility Score table Existence Self-location Nonexistence self-location
free 0 1 2 0 1 2
restriction
z Point 41 position :
-
access space : restriction access visuality : middle self-location : Nonexistence access direction : 1
z Point 41 position to detail design principle :
difficult
high
middle
low
high
middle
low
high
middle
low
86 80 75 65 60 54
77 71 66 56 51 45
68 62 57 47 42 36
76 70 65 55 50 45
67 62 56 47 41 36
58 53 47 38 32 27
66 61 55 46 40 35
57 52 46 37 31 26
49 43 38 28 23 17
low
high
low
high
-
alternative 1 : Nonexistence self-location Æ Existence
-
alternative 2 : Number of access direction ; 1 Æ 0
-
alternative 3 : visuality low Æ high
-
alternative 4 : size of access space restriction Æ free
Step 3 Accessibility Score table Existence Self-location Nonexistence self-location
free high
0 1 2 0 1 2
middle
restriction
86 77 68 80 71 62 75 66 57 65 Alternative 56 447 60 51 42 54 45 36
76 70 65 55 50 45
middle
difficult
67 58 66 Alternative 1 62 53 61 56 47 55 47 Alternative 38 246 41 32 40 Alternative 3 36 27 35
z Alternative 1 : 41 point Æ 62 point
middle
low
57 52 46 37 31 26
49 43 38 28 23 17
21 point improvement z Alternative 2 : 41 point Æ 47 point 6 point improvement z Alternative 3 : 41 point Æ 50 point 9 point improvement z Alternative 4 : 41 point Æ 51 point 10 point improvement
Fig. 4. The point table of disassemblability for the user according to the accessibility
838
H.-S. Mok et al.
5 Case Study As a case study we considered the air cleaner of an automobile. By disassembling air cleaner, we found several weak disassembly processes. The condition of the disassembly process and the characteristics of three assembly-groups are analyzed according to their using purpose. Fig. 5 shows the disassembly condition and the position in the point table. And we can have four possible design guidelines in the aspects of a user of product.
΅ΣΒΟΤΞΚΤΤΚΠΟ͑ΠΗ͑
·ΚΤΦΒΝΚΥΪ͑ΙΚΘΙ͑
ΕΚΤΒΤΤΖΞΓΝΪ͑ ΡΠΨΖΣ͑ΡΠΚΟΥ͑ΥΒΓΝΖ͑
ͷΣΖΖ͑ΤΡΒΔΖ͑
ΞΚΕΕΝΖ͑ ΙΚΘΙ͑ ΝΠΨ͑ ΞΚΕΕΝΖ͑ ΙΚΘΙ
ͷΣΖΖ͑ΤΡΒΔΖ͑
ΖΤΥΣΚΔΥΚΠΟ͑ΤΡΒΔΖ͑
ΝΠΨ
ΞΚΕΕΝΖ
ΙΚΘΙ
ΝΠΨ͑
ΞΚΕΕΝΖ
ΙΚΘΙ
ͧͦ͢
ͥͤ͑͢
ͣͣ͑͢ ͤͤ͑͢
ͣ͑͢͢
ͪ͡
ͥͣ͢
ͣ͑͢͢
͢͡͡
͑͢͢͢
ͩͪ͑
ͧͩ
͑ ΓΠΝΥ͑ ΞΚΕΕΝΖ͑ ͥͨ͢
ͣͦ͑͢
ͥ͑͢͡ ͦ͑͢͢
ͪͥ͑
ͨͤ
ͣͥ͢
ͤ͑͢͡
ͩͣ
ͪͤ͑
ͨͣ͑
ͦ͢
ͦͦ
ͨ͢͡
ͩͦ͑
ͧͥ
ͨͦ͑
ͦͥ͑
ͤͤ
ΙΚΘΙ͑
ΝΠΨ
G vGGyG
·ΚΤΦΒΝΚΥΪ͑ΝΠΨ͑
ΖΤΥΣΚΔΥΚΠΟ͑ΤΡΒΔΖ͑
ΝΠΨ͑
ͣͪ͢
ͩ͑͢͡
ͩͨ͑
ͪͨ͑
ͨͧ͑
ΙΚΘΙ͑
ͧͦ͢
ͥͣ͑͢
ͣ͑͢͡ ͥͣ͑͢
ͪ͑͢͢
ͪͨ
ͤͣ͢
͑͢͢͡
ͩͩ
͑͢͢͡
ͩͨ͑
ͧͦ
ΚΟΤΖΣΥ͑ ΞΚΕΕΝΖ͑ ͥͩ͢
ͣͧ͑͢
ͥ͑͢͡ ͣͧ͑͢
ͤ͑͢͡
ͩ͢
ͨ͢͢
ͪͥ͑
ͨͣ
ͪͥ͑
ͨ͑͢
ͥͪ ͤͤ
΄ΟΒΡ͑ ͷΚΥ͑
ΝΠΨ͑
ͤͣ͢
͑͢͢͡
ͩͩ͑
͑͢͢͡
ͩͨ͑
ͧͦ
͢͢͡
ͨͩ͑
ͦͧ
ͨͩ͑
ͦͦ͑
ΙΚΘΙ͑
ͧͦ͢
ͤͪ͑͢
ͥ͑͢͢ ͤͤ͑͢
ͩ͑͢͡
ͩͣ
ͥͣ͢
ͨ͑͢͢
ͪ͢
͑͢͢͢
ͩͦ͑
ͧ͡
ΞΚΕΕΝΖ͑ ͦ͢͢
ͣͧ͑͢
͑͢͡͡ ͣ͑͢͡
ͪͥ͑
ͧͪ
ͣͩ͢
ͤ͑͢͡
ͨͩ
ͪͨ͑
ͨͣ͑
ͥͨ
jGaGk GGGhGi G
ΝΠΨ͑
ͤͨ͢
ͣ͑͢͢
ͩͨ͑
ͧ͑͢͡
ͩ͑͢
ͦͦ
ͦ͢͢
ͪ͑͡
ͧͥ
ͩͥ͑
ͦͩ͑
ͤͤ
T
GaG TG
ΙΚΘΙ͑
ͧͦ͢
ͥͩ͑͢
ͤͣ͑͢ ͣͨ͑͢
͑͢͢͡
ͪͥ
ͥ͢͢
ͣͦ͑͢
ͪ͢͡
ͤ͑͢͡
ͩͨ͑
ͨ͢
T
GaGG
ʹΝΒΞΡ͑ ΞΚΕΕΝΖ͑ ͥͧ͢
ͣͪ͑͢
ͤ͑͢͢ ͩ͑͢͡
ͪ͑͢
ͨͦ
ͣͣ͢
ͧ͑͢͡
ͪ͡
ͩͥ͑
ͧͩ͑
ͦͣ
T
GGaGG
͑͢͢͡
ͪͥ͑
ͨͣ͑
ͦͧ
ͤ͢͡
ͩͨ͑
ͨ͢
ͧͦ͑
ͥͪ͑
ͤͤ
T
aGG
ΝΠΨ͑
ͣͨ͢
ͩͪ͑
Fig. 5. The current status of the transmission of disassembly power in the aspects of User and the design principle Table 3. The design principle in the purpose of usage from disassemblability Detail disassemblibility Accessibility
Purpose of using User
Design
A/S
of principle
Reuse
Recycling
z
Existence
self-
location
z
Existence
self-
Transmission of
Disassembly
disassembly power
structure
z
Valuality high
z
Disassembly
location
z
Existence
self-
z
Existence location
Grasp-ability
z
Disassembly power down
Connect part add
z
high self-
Connect part add
z
power down
location
z
z
Assembly point down
z
Assembly point down
In this case study, the disassembly conditions are: the assembly method is bolting, the gripping ability is low, the visual ability is low, and the working area is not enough (limited), the disassembly force is high needed. Based on these experimented data, we could determine the position in the point table of the disassemblability. The disassemblability point is 33 for the user-oriented purpose.
Design for Using Purpose of Assembly-Group
839
In this position, we could find four possible design guidelines. From among these four alternatives we choose one with the largest value (here, 75). It is the optimal design guideline in this case. In order to prepare an alternative for a design guideline of subassembly (air cleaner), some design guidelines according to the purpose of usage are shown in Table 3. In this case, its visuality is low at the point of bolting to the body, transmission of disassembly power is in low condition, also the visuality of the parts of duct is low and assembly point is hard to confirm. Since it was difficult to access the disassembly point with tools, the transmission of disassembly power was low. Fig. 6 shows an alternative of the design guideline from the aspects of the user. User
Principle: Reduce the disassembly power
Disassembly process : Air Body
Assembly factor Re-bolt
Disassembly time 19.68 sec
Alternative
Assembly factor Re-bolt
Disassembly time 12.27 sec
Problem • Assembly point: Not visible • Disassembly time: Much • Interference: Exist • Handling: Bad Alternative • Remove the interference • Move the assembly point • Reduce the number of assembly factors
Fig. 6. Design guidelines in the aspects of User for Duct and Air Body
6 Conclusion In this paper, the using purpose of an assembly-group is divided into four categories: aspect of user, A/S, reuse and recycling. According to these categories, the detail disassembly function and the influencing parameters were considered. The total weighting factors were used to evaluate the disassemblability. To choose an optimal alternative for a disassembly-oriented design we established the point table of disassemblability. Using the point table we found the position. This position shows the quantitative disassemblability for a given disassembly condition. The suggested algorithm was used to find an optimal design guideline systematically.
840
H.-S. Mok et al.
Acknowledgment This work was supported by Cleaner Production Technology Development of Ministry of Commerce, Industry and Energy (20040521) and Pusan National Research Grant.
References 1. Mok, H. S, Moon, K. S, Kim, S. H, Moon, D, S.: The Complexity Evaluation System of Automobile Subassembly for Recycling, Journal of the Korean society of Precision Engineering 16(5), (1999) 132-144 2. Mok, H. S, Kim, S. H, Yang, T. I.: Evaluation of the Product Complexity Considering the Disassemblability, Journal of the Korean society of Precision Engineering 16(5), (1999) 14-24 3. Mok, H. S, Cho, J. R.: Development of Product Design Methodology for Assemblability and Disassemblability Considering Recycling, Journal of the Korean society of Precision Engineering 18(7), (2001) 72-84 4. Autorecycling Wohin fuehrt der Weg, RECYCLING magazin, 4, (2003) 5. European Union, Speeding up the vehicle recycling, EU Kommission, 3 (2003) 6. Moon, K. S.: Analysis and Evaluation System for Disassembly, Ph.D thesis, Pusan… National University, Busan, South Korea, (2003) 7. Mok H. S, Hwang Hoon, Yang Tae-Il.: Module design of a product considering disassemblability, Korean Institute of Industrial Engineers/The Korean Operations and Management Science Society collaborated art and science contest in, Spring. A collection of a treaties. (2000)
A Conditional Gaussian Martingale Algorithm for Global Optimization Manuel L. Esqu´ıvel Departamento de Matem´ atica and CMA, FCT/UNL, Quinta da Torre, 2829-516 Caparica, Portugal [email protected] http://ferrari.dmat.fct.unl.pt/personal/mle/
Abstract. A new stochastic algorithm for determination of a global minimum of a real valued continuous function defined on K, a compact set of n , having an unique global minimizer in K is introduced and studied, a context discussion is presented and implementations are used to compare the performance of the algorithm with other algorithms. The algorithm may be thought to belong to the random search class but although we use Gaussian distributions, the mean is changed at each step to be the intermediate minimum found at the preceding step and the standard deviations, on the diagonal of the covariance matrix, are halved from one step to the next. The convergence proof is simple relying on the fact that the sequence of intermediate random minima is an uniformly integrable conditional Gaussian martingale.
Ê
1
Introduction
Quite some attention has been recently devoted to stochastic algorithms, as more than 300 bibliographic entries in the reference textbook [9] testifies. Highly schematized global optimization methods using randomized search strategies are object of a thorough synthetic theoretical study in [12] which also presents applications of these methods to engineering problems. Negative results as those in [10] show that overconfidence on the effectiveness of stochastic methods is not desirable but, nevertheless, it is natural to speculate that an adequate randomized algorithm may perform better than a deterministic one in global optimization, at least in most of the situations. Theoretical results such as those in [2], [11] and [7], indicate that stochastic algorithms may be thought to be as reliable as deterministic ones and efforts in order to find better performing algorithms continue to be pursued as in [1] and [6]. The main feature of the new algorithm presented here, allows to recover some interesting properties of other stochastic algorithms such as the clustering and adaptiveness properties simultaneously with the property of continuing to search the whole domain at each step, which is a characteristic feature of simulated annealing.
This work was done with partial support of FCT (Funda¸ca ˜o para a Ciˆencia e Tecnologia) program POCTI (Portugal/FEDER-EU). I hereby express my gratitude to Ana Lu´ısa Cust´ odio for enlightening discussions and suggestions and to the Mathematics Department for allowing an intensive use of one of its computer laboratories.
M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 841–851, 2006. c Springer-Verlag Berlin Heidelberg 2006
842
2
M.L. Esqu´ıvel
The Solis and Wets Approach to Random Search
We recall next the powerful meta-approach of Solis and Wets (see [8]) in order to generalize its formulation to the case of adaptive random search and almost sure convergence. The original formulation of the authors solves the problem for non adaptive random search and convergence in probability, as noted in the remark 1 ahead. According to [2, p. 22], with the exception of [8] there were no synthesis studies of random search algorithms prior to 1986. Consider f : K ⊆ n −→ , K a Borel set, where we suppose that for x ∈ K c we have f (x) = +∞ and, (Ω, A, P) a complete probability space. The following general selection scheme is the nuclear part of the random algorithm. Let ψ : K × n −→ K be such that the following hypothesis [H1] is verified. ∀x, t f (ψ(t, x)) ≤ f (t) (2.1) ∀x ∈ K f (ψ(t, x)) ≤ f (x) . The general conceptual algorithm of Solis and Wets is as follows. S. 0: Take t0 ∈ K and set j = 0. S. 1: Generate a point xj from ( n , B( n ), Pj ). S. 2: Set tj+1 = ψ(tj , xj ) choose Pj+1 , set j = j + 1 and return to step 1 (S.1). Observe that in adaptive methods, xj is a point with distribution Pj which depends on tj−1 , tj−2 , . . . , t0 , thus being a conditional distribution. Let now λ denote the Lebesgue measure over ( n , B( n )) and α be the essential infimum of f over K, that is: α := inf{t ∈ : λ({x ∈ K : f (x) < t}) > 0}. Consideration of the essential infimum is mandatory to deal correctly with non continuous or unbounded functions such as 1I[0,1]\{1/2} defined in [0, 1], or ln(|x|)1IÊ\{0} + (−∞)1I{0} . Let Eα+,M denote the level set of f having level α + defined by: {x ∈ K : f (x) < α + } if α ∈ Eα+,M := (2.2) {x ∈ K : f (x) < M } if α = −∞ . A Solis and Wets’s type convergence theorem may now be formulated and proved. Theorem 1. Suppose that f is bounded from below. Let the sequence of random variables (Tj )j≥0 be defined inductively by using the sequence (Xj )j≥0 which, in turn, depends on the family of probability laws (Pj )j≥0 given by the algorithm above and verifying: T0 = X0 such that X0 P0 , Xj Pj (to mean that Xj has Pj as law) and Tj+1 = ψ(Tj , Xj ). If we have that the following hypothesis [H2()] is verified: for some ≥ 0 and M ∈ lim
inf
k→+∞ 0≤j≤k−1
c Pj [Eα+,M ] = lim
inf
k→+∞ 0≤j≤k−1
c P[Xj ∈ Eα+,M ]=0,
(2.3)
then, lim P[Tj ∈ Eα+,M ] = 1 .
k→+∞
(2.4)
If, for all > 0 [H2()] is true, (f (Tj ))j≥0 converges almost surely to a random variable Y∗ such that:
A Conditional Gaussian Martingale Algorithm for Global Optimization
P[Y∗ ≤ α] = 1 .
843
(2.5)
Proof. Observe first that by hypothesis [H1] in formula 2.1 we have that if Tk ∈ Eα+,M or Xk ∈ Eα+,M then for all n ≥ 0, Tk+n ∈ Eα+,M . As a consequence, c c c {Tk ∈ Eα+,M } ⊆ {T1 , T2 , . . . , Tk−1 ∈ Eα+,M }∩{X1 , X2 , . . . , Xk−1 ∈ Eα+,M }.
So, for all j ∈ {0, 1, . . . , k − 1}: ⎡ c P[Tk ∈ Eα+,M ] ≤ P⎣
⎤ c c {Tl ∈ Eα+,M } ∩ {Xl ∈ Eα+,M }⎦ ≤
0≤l≤k−1 c c ≤ P[Xj ∈ Eα+,M ] = Pj [Eα+,M ], c c which implies P[Tk ∈ Eα+,M ] ≤ inf 0≤j≤k−1 Pj [Eα+,M ]. We may now conclude that: c 1 ≥ P[Tk ∈ Eα+,M ] = 1 − P[Tk ∈ Eα+,M ]≥1−
inf
0≤j≤k−1
c Pj [Eα+,M ],
which, as a consequence of formula 2.3 implies the conclusion in formula 2.4. Define now the filtration G = (Gj )j≥0 by Gj = σ(T0 , T1 , . . . , Tj ). It is then clear, from hypothesis [H1] in formula 2.1, that: E[f (Tj+1 ) | Gj ] = E[f (ψ(Tj , Xj ) | Gj ] ≤ E[f (Tj ) | Gj ] = f (Tj ) , thus showing that (f (Tj )j≥0 is a supermartingale bounded from below which we know to be almost surely convergent to some random variable which we will denote by Y∗ . This conclusion together with formula 2.4 already proved shows that formula 2.5 yields, as a consequence of lemma 1. Remark 1. Hypothesis [H2] given by formula 2.3 mean that the more the algorithm progresses in its steps, the more mass of the distributions Pj should be concentrated in the set Eα+,M where the interesting points are. Our formulation of hypothesis [H2] differs from the one presented in [8] which reads: ∀A ∈ B(
n
) λ(A) = 0 ⇒
+∞
(1 − Pj [A]) = 0 .
(2.6)
j=0
+∞ Formula 2.3 implies that, for > 0, we have j=0 (1 − Pj [Eα+,M ]) = 0 and so, our hypothesis is stronger then the one in [8]. The hypothesis given by formula 2.6 is more appealing as it does not use a condition on the set Eα+,M which, in general, is not explicitly known and, in almost every case, will be difficult to use computationally. On the other hand, hypothesis given by formula 2.6 does not allow the conclusion of the Convergence Theorem (Global Search) in [8, p. 20] to hold in full generality. The theorem is true, with the proof presented there, if the sequence (Xj )j≥0 is a sequence of independent random variables. The authors do not mention this caveat and the phrase . . . Nearly all random search methods
844
M.L. Esqu´ıvel
are adaptive by which we mean that μk 1 depends on the quantities . . . generated by the preceding iterations . . . may induce the reader in the opposite belief. In fact, the inequality on the right in the third line in the proof of the theorem (see [8, p. 21]) is, in the general case of dependent draws of the (Xj )j≥0 , the reversed one as a consequence of the elementary fact that if A ⊂ B and 0 < P[B] < 1 then: P[A ∩ B] = P[A] = P[B] · P[A]/P[B] ≥ P[A] · P[B]. If f is continuous over K, a compact subset of n , then f attains its minimum and it is verified that α = minx∈K f (x). The conceptual algorithm above furnishes a way of determining this minimum. This is a simple consequence of the following result, for which the proof is easily seen. Corolary 1. Under the same hypothesis, if in addition for all x ∈ K we have f (x) ≥ α, then: P[Y∗ = α] = 1. For the reader’s commodity, we state and prove the lemma used above. Lemma 1. Let (Zj )j≥0 be a sequence of random variables such that almost surely we have limj→+∞ Zj = Z and for some δ > 0 we have limj→+∞ P[Zj < δ] = 1. Then, P[Z ≤ δ] = 1. Proof. It is a consequence of a simple observation following Fatou’s lemma. P[Z > δ] = P[(lim inf Zj ) > δ] = P[lim inf {Zj > δ}] ≤ lim inf P[{Zj > δ}] ≤ j→+∞
j→+∞
j→+∞
≤ lim sup P[{Zj ≥ δ}] = lim P[{Zj ≥ δ}] = 0 . j→+∞
3
j→+∞
The Conditional Gaussian Martingale (CGM) Algorithm
The algorithm presented here may be included, on a first approximation, in the class of random search methods as this class consists of algorithms which generate a sequence of points in the feasible region following some prescribed probability distribution or sequence of probability distributions, according to [3, p. 835]. The main idea of the method studied here is to change, at each new step, the location and dispersion parameters of the probability Gaussian distribution in order to concentrate the points, from which the new intermediate minimum will be selected, in the region where there is a greater chance of finding a global minimum not precluding, however, a new intermediate minimum to be found outside this region. We use a sequence of Gaussian distributions taking at each step the mean equal to the intermediate minimum found in the preceding step and the standard deviation diagonal elements of the covariance matrix equal to half the ones taken in the preceding step. We now briefly describe the algorithm and after we will present the almost sure convergence result. The goal is to find a global minimum of a real function f defined over a compact set K ⊂ n having 1
In our notation, the Pj .
A Conditional Gaussian Martingale Algorithm for Global Optimization
845
diameter c. From now on, U(K) will denote the uniform distribution over K and N (m, σ) denotes the Gaussian distribution with mean m and covariance matrix with equal diagonal elements σ. With the presentation protocol of [5] the algorithm is as follows. S. 0 Set j = 0; S. 1 Generate x01 , x02 , . . . , x0N from the uniform distribution over the domain K. S. 2 Choose t0 = x0i0 such that f (x0i0 ) is equal to min{f (x0i ) : 1 ≤ i ≤ N }. Increment j. S. 3 Generate xj1 , xj2 , . . . , xjN from the normal distribution N (tj−1 , c/2j ) having mean tj−1 and diagonal covariance matrix elements c/2j , c being the diameter of K. S. 4 Choose tj = ti−1 if f (ti−1 ) is strictly inferior to min{f (xji ) : 1 ≤ i ≤ N } and choose tj = xji0 if f (xji0 ) is less or equal to min{f (xji ) : 1 ≤ i ≤ N } ≤ f (ti−1 ). S. 5 Perform a stopping test and then: either stop, or increment j and return to step 3 (S. 3). Observe that steps 1 and 2 are useful in order to choose a starting point for the algorithm in K. The repetition of steps 3, 4 and 5, provide a sort of clustering of the random test points around the intermediate minimizer found at the preceding step while continuing to search the whole space. This algorithm’s core is easily implemented in a programming language allowing symbolic computation (all implementations presented in this text are Jota 400; Uba ; Ruba ; Forj 1, j Jota, j , Tuba 0; Luba ; minimo Module ptminX xm, ptminY ym, ptmaxX xM, ptmaxY yM, cont 850, cont1 5000, alea1, alea, T1, M1, Tes, eMes, NunOr, alea1 TableRandomReal, ptminX, ptmaxX, RandomReal, ptminY, ptmaxY, i, 1, cont1; T1 Tablealea1i, falea1i1, alea1i2, i, 1, cont1; NunOr Tablei, i, 1, cont1; M1 SelectNunOr, Applyf, ColumnT1, 1# MinColumnT1, 2 &; M1 MinM1; eMes FlattenColumnT1, 1M1; PrinteMes; Fori 1, Andi 52, AbsApplyf, x0, y0 Applyf, eMes 10 ^ 10, Normx0, y0 eMes 10 ^ 10, i , alea TableRandomMultinormalDistributioneMes, ptmaxX ptminX 2 ^ i 2, 0, 0, ptmaxY ptminY 2 ^ i 2, j, 1, cont; Tes Tablealeak, faleak1, aleak2, k, 1, cont; M2 SelectTablei, i, 1, cont, Applyf, ColumnTes, 1# MinColumnTes, 2 &; M2 MinM2; eMes IfApplyf, FlattenColumnTes, 1M2 Applyf, eMes, FlattenColumnTes, 1M2, eMes; Tuba MaxAppendTuba, i; Luba AbsApplyf, eMes fx0, y0 Absfx0, y0, NormeMes; ; Uba AppendUba, Tuba1; Ruba AppendRuba, Tuba1, Luba;
Fig. 3.1. The Mathematica implementation of the CGM algorithm
846
M.L. Esqu´ıvel
fully downloadable from the author’s web page). For general purposes, we may extend f to the whole space by defining f (x) = A (A large enough) for x ∈ / K. The algorithm introduced converges to a global minimizer under the hypothesis of continuity of the function f defined on a compact set. For notational purposes given three random variables X, Y , Z and (a, b) ∈ n × n×n we will write X ∈ N (Y, Z) to mean that conditionally on Y = a, Z = b, X ∈ N (a, b), that is, X has Gaussian distribution with mean a and covariance matrix b. Theorem 2. Let f : K −→ be a real valued continuous function defined over K, a compact set in n , and let z be an unique global minimizer of f in K, that is: f (z) = minx∈K f (x) and for all x ∈ K we have f (z) < f (x). For each N ∈ \ {0} fixed, define almost surely and recursively the sequence (TjN )j∈Æ by:
T0N
0 0 0 0 0 = Xi0 : f (Xi0 ) = min f (Xi ) X1 , . . . , XN ∈ U(K) i.i.d. . 1≤i≤N
Next, for all j ≥ 1 ⎧ c ⎪ TjN if f (TjN ) < min f (Xij+1 ) : Xij+1 ∈ N (TjN , j+1 ) i.i.d. ⎪ ⎪ 1≤i≤N 2 ⎪ ⎨ N c j+1 j+1 j+1 j+1 N Tj+1 := X if f (Xi0 ) = min f (Xi ) : Xi ∈ N (Tj , j+1 ) i.i.d. i0 ⎪ 1≤i≤N ⎪ 2 ⎪ ⎪ ⎩ N ≤ f (Tj ) . Then, for all N ≥ 1 fixed, the sequence (TjN )j≥0 is a uniformly integrable martingale which converges almost surely to a random variable T N and the sequence (T N )N ≥1 converges almost surely to z, the unique global minimizer of f . Proof. For all j ≥ 1 define the σ algebras GjN = σ(T0N , . . . TjN ) and the sets: AN j+1 =
f (TjN ) < min f (Xij+1 ) : Xij+1 ∈ N (TjN , 1≤i≤N
) i.i.d. ⊂Ω. j+1 c
2
N N As a first fact, we have obviously that AN j+1 ∈ Gj . Let us remark that (Tj )j≥0 N is a a martingale with respect to the filtration (Gj )j≥0 . This is a consequence of: N [Tj+1 | GjN ] = [TjN 1IA
N j+1
N + Xij+1 1I(AN c | Gj ] = 0 j+1 )
= TjN 1IAN + 1I(AN [Xij+1 | GjN ] = TjN , c 0 j+1 j+1 ) as we have, by the definitions,
[Xij+1 | GjN ] = [Xij+1 | T0N , . . . , TjN ] = [Xij+1 | TjN ] = TjN . 0
0
0
As a third fact, we notice that as TjN ∈ K, which is a compact set, we then have for some constant M > 0 that TjN 1 ≤ M . As a consequence of these
A Conditional Gaussian Martingale Algorithm for Global Optimization
847
three facts (TjN )j≥0 is a uniformly integrable martingale which converges almost surely to an integrable random variable T N . Observe now that, by construction, N f (Tj+1 ) ≤ f (TjN ) almost surely and so we have that (f (TjN ))j≥0 decreases to f (T N ). Let us remark that for all i, j we have f (T N ) ≤ f (Xij ). In fact, by definition: f (TjN0 ) in AN j0 +1 j0 +1 min f (Xi )≥ = c 1≤i≤N f (Xij00 +1 ) in (AN j0 +1 ) N = f (TjN0 +1 )1IAN + f (TjN0 +1 )1I(AN c = f (Tj +1 ) , 0 j +1 j +1 ) 0
0
so, if for some i0 , j0 we had f (T N ) > f (Xij00 +1 ) = min f (Xij0 +1 ) ≥ f (TjN0 +1 ), 1≤i≤N
we would also have f (T N ) > f (TjN0 +1 ), which contradicts the properties of (f (TjN ))j≥0 . We will now show that the sequence (f (T N ))n≥1 converges to f (z) in probability. For that purpose, we recall the definition and some simple properties of Et := {x ∈ K : f (x) < t} the set of points of K having a level, given by f , less then t. First, the monotony: t < s ⇒ Et ⊆ Es ; secondly, Es is open: x0 ∈ Et ⇒ ∃δ > 0 BÊn (x0 , δ) ⊆ Et ; finally, Et = ∅ ⇒ z ∈ Et . Observe now that for all ω ∈ Ω and all η > 0: f (T N (ω)) < f (z) − η which is impossible; N | f (T (ω)) − f (z) |> η ⇔ f (T N (ω)) > f (z) + η ⇒ T N (ω) ∈ / Ef (z)+η . As a consequence, for all i, j we have Xij (ω) ∈ / Ef (z)+η , as otherwise we would j N have f (Xi (ω)) < f (z) + η < f (T (ω)) ≤ f (Xij (ω)), which is impossible. As a result, we finally have:
N +∞ | f (T N ) − f (z) |> η ⊆ Xij ∈ / Ef (z)+η , j=0 i=0
which implies that
inf
0≤j<+∞
| f (T N ) − f (z) |> η is bounded above, for instance, by:
N j Xi ∈ / Ef (z)+η ≤
Xi0 ∈/ Ef (z)+η N
.
Now, Xi0 being uniformly distributed over K we have, with η small enough:
Xi0 ∈/ Ef (z)+η = λ(Efc (z)+η )/λ(K) < 1. So, we have as wanted, for all η > 0 small enough: limn→+∞ | f (T N ) − f (z) |> η = 0.
We now observe that the above convergence is, in fact, almost sure convergence, that is, (f (T N ))N ≥1 converges to f (z) almost surely. This is a consequence of the well known fact that a non increasing sequence of random variables converging in probability, converges almost surely and the facts proved above that show:
848
M.L. Esqu´ıvel
f (TjN +1 ) ≤ f (TjN ) ↓j→+∞ f (T
N +1
↓j→+∞
) ≤ f (T N ) .
Consider Ω ⊂ Ω such that [Ω ] = 1 such that the above convergence takes place over Ω and observe that for every ω ∈ Ω the sequence (T N (ω))N ≥1 is a sequence of points in the compact set K. As a consequence, every convergent subsequence (T Nk (ω))N ≥1 of (T N (ω))N ≥1 converges to z. In fact, if limk→+∞ T Nk (ω) = y ∈ K then limk→+∞ f (T Nk (ω)) = f (y) and by the result we just proved limk→+∞ f (T Nk (ω)) = f (z). This implies that f (y) = f (z) and as z is a unique minimizer we have that y = z. We now conclude the proof of the theorem by noticing that (T N (ω))N ≥1 converges to z because if otherwise we would have: ∃ > 0 ∀N ∃Nm > N | T Nm (ω) − z |> . As (T Nm (ω))m≥1 is a sequence of points in K which is a compact set, by Bolzano Weierstrass theorem it has a convergent subsequence. This subsequence must converge to z which can not occur by the definition of (T Nm (ω))m≥1 .
4
Computational Results
CGM algorithm was compared with other algorithms. With Styblinski-Tang function, we compared algorithms A (simple random search), B (localized random search), C (enhanced random search), from [9, p. 38–48], and ARS (accelerated random search) from [1]. The following notations are used. N is the number of random points drawn at each repetition; M will denote the number of repetitions in the simulation; J will be number of steps in the repetition; J represents the sample mean of the number of steps J necessary to achieve a prescribed result, taken over the whole set of repetitions of the simulation; SD(J) is the sample standard deviation of J. The stopping criterion for the number of steps j Jota 400; Uva ; Ce 2 ^ 0.5; Rol 10 ^ 4; Forj 1, j Jota, j , XisEne RandomReal, xm, xM, RandomReal, xm, xM; ErEne 1; ind 0; For i 1, Andi 25000, AbsApplyf, x0, y0 Applyf, XisEne 10 ^ 10, Normx0, y0 XisEne 10 ^ 10, i , YupEne RandomReal, XisEne1 ErEne xM xm 2, XisEne2 ErEne xM xm 2, RandomReal, XisEne1 ErEne xM xm 2, XisEne1 ErEne xM xm 2; IfApplyf, YupEne Applyf, XisEne, AndXisEne YupEne, ErEne 1, ErEne ErEne Ce; IfErEne Rol, ErEne 1,; ind i; Vaca ind, AbsApplyf, XisEne fx0, y0 Absfx0, y0, NormXisEne; ; Uva AppendUva, Vaca;
Fig. 4.1. The Mathematica implementation of the ARS algorithm used
A Conditional Gaussian Martingale Algorithm for Global Optimization
849
in the simulation will be: j ≤ J0 ∧ | f (zn ) − f (z) |> 10−10 ∧ | zn − z |> 10−10 , J0 being the number of steps we decide to impose as an upper bound and zn being the estimated minimizer at step n of a repetition. A true minimizer of the function is z. The function evaluation accuracy criterion after j steps is: (f (zj ) − f (z))/f (z) if f (z) = 0 Δf (zj ) = f (zj ) if f (z) = 0 . The function argument evaluation accuracy criterion after j steps is: (zj − z)/z if z = 0 Δzj = zj if z = 0 . The averages or sample means over M repetitions with j(k) steps at repetiM M tion k are: Δz = (1/M ) k=1 Δzj(k) , z = (1/M ) k=1 zj(k) , Δf (z) = M (1/M ) M k=1 Δf (zj(k) ) and f ((z) = (1/M ) k=1 f (zj(k) ). Table 1. Styblinski-Tang function; Repetitions: 400; ARS number of evaluations: 25000 Algorithm A B C ARS (381) CGM
f (z) SD(f (z)) -78.2732 2.8 × 10−3 -78.3201 6.1581 × 10−4 -78.3201 5.9301 × 10−4 -78.3323 1.35255 × 10−8 -78.3323 2.72116 × 10−11
E SD(E) 25000 – 25000 – 25000 – 24778 1818 15783 970
Table 2. CGM algorithm; random draws: 500; maximum number of steps: 50 Function Gaussian 1 (371) Gaussian 2 (97) Griewank Himmelblau Jennrich-Sampson Rastrigin (399) Rosenbrock Freudenstein-Roth Styblinski-Tang
J 33.372 37.0103 30.475 32.1675 49.7375 32.3885 30.5325 33.9725 31.5675
SD(J) Δf (z) 1.91627 9.34019 − ×10−12 1.70474 7.38679 × 10−12 1.873 4.65529 × 10−11 1.91012 4.21595 × 10−11 2.13541 4.90973 × 10−11 1.75567 2.29124 × 10−11 1.90868 4.30236 × 10−11 1.83033 4.42997 × 10−11 1.93903 5.13851 × 10−13
Δz 8.96686 × 10−7 2.49709 × 10−7 9.63914 × 10−6 1.09551 8.77548 × 10−7 4.93828 × 10−7 6.17981 × 10−6 5.23501 × 10−7 3.45016 × 10−7
Table 3. ARS algorithm; Maximum number of function evaluations: 25000 Function Gaussian 1 Gaussian 2 (356) Griewank Himmelblau Jennrich-Sampson Rastrigin Rosenbrock (398) Freudenstein-Roth Styblinski-Tang (382)
N 24984.8 25000 25000 25000 24900.3 23991.4 24518.6 25000 24777.9
SD(N ) 303.95 0 0 0 1426.31 3954.14 2543.41 0 1862.48
Δf (z) 5.34489 × 10−9 7.45836 × 10−8 4.75247 × 10−8 5.03447 × 10−1 1.00239 × 10−9 7.65517 × 10−10 2.89627 × 10−9 4.80997 × 10−2 7.08203 × 10−11
Δz 2.03585 × 10−5 2.27671 × 10−5 2.90102 × 10−4 6.76862 × 10−1 1.24142 × 10−5 2.67892 × 10−6 4.59241 × 10−5 1.63671 × 10−2 3.6433 × 10−6
850
M.L. Esqu´ıvel
ARS and the CGM algorithms were compared for nine different test functions from [1] and [4]. The numerical results presented in the tables show that CGM outperforms all other algorithms tried, in precision and with less function evaluations. For one of the test functions (Gaussian 2) a further test, run with 2700 random draws at each step of the repetitions (instead of 500 used for the tables study) gave a result with the prescribed precision for all repetitions, in accordance with theorem 2. In the tables, the number between parenthesis in a given line report correct locations of the global minimum among the repetitions performed.
5
Conclusion
To achieve its goal, any random search algorithm has simultaneously to detect the region where the global minimum is located and to achieve enough precision in the calculation of the minimizer. Practically, and to the limits of machine and software precision used this is obtained, respectively, by an increasing number of random trials and, by concentrating these trials in the favorable region. CGM algorithm, hereby introduced and studied, always attained better precision than ARS; we got also perfect global minimum location for all the test functions tried, provided (in three cases) the number of random draws was sufficiently augmented. The CGM convergence result was proved under mild but fully verifiable hypothesis in sharp contrast with our formulation of a Solis and Wets’s type theorem for adaptive random search with an hypothesis of difficult if not impossible verification even, in very simple cases.
References 1. Appel, M. J., Labarre, R., Radulovi´c, D.: On accelerated random search. SIAM J. Optim., 14, (2003) no. 3, 708–731. 2. Guimier, A.: Mod´elisation d’algorithmes d’optimisation ` a strat´egie al´eatoire. Calcolo. 23 (1986) no. 1, 21–34. 3. Horst, R., Pardalos, P. M. (ed.): Handbook of Global Optimization. Kluwer Academic Publishers 1995. 4. Mor´e, J. J., Garbow, B. S., Hillstrom, K. E.:Testing Unconstrained Optimization Software, ACM Transactions on Mathematical Software 7, No.1, (1981) 17–41. 5. Pint´er, J. D.: Global optimization in action. Continuous and Lipschitz optimization: algorithms, implementations and applications., Nonconvex Optimization and Its Applications. 6. Dordrecht: Kluwer Academic Publishers. xxvii 1996. 6. Raphael, B., Smith, I.F.C.: A direct stochastic algorithm for global search. Appl. Math. Comput. 146, No.2-3, (2003) 729–758. 7. Shi, D., Peng, J.: A new theoretical framework for analyzing stochastic global optimization algorithms. J. Shanghai Univ. 3, No.3, (1999) 175-180. 8. Solis, F. J., Wets, R. J.-B.: Minimization by random search techniques. Math. Oper. Res., 6, (1981) no. 1, 19–30. 9. Spall, J. C.: Introduction to stochastic search and optimization. Estimation, simulation, and control. Wiley-Interscience Series in Discrete Mathematics and Optimization. Hoboken, NJ: Wiley. xx, 2003.
A Conditional Gaussian Martingale Algorithm for Global Optimization
851
10. Stephens, C.P., Baritompa, W.: Global optimization requires global information. J. Optimization Theory Appl. 96, (1998) No.3, 575–588. 11. Yin, G.: Rates of convergence for a class of global stochastic optimization algorithms. SIAM J. Optim. 10, No.10, (1999) 99–120 . 12. Zabinsky, Z. B.: Stochastic adaptive search for global optimization. Nonconvex Optimization and Its Applications 72. Boston, MA: Kluwer Academic Publishers. xviii, 2003.
Finding the Number of Clusters Minimizing Energy Consumption of Wireless Sensor Networks Hyunsoo Kim and Hee Yong Youn School of Information and Communications Engineering, Sungkyunkwan University, 440-746, Suwon, Korea [email protected], [email protected]
Abstract. Wireless sensor network consisting of a large number of small sensors with low-power can be an effective tool for the collection and integration of data in a variety of environments. Here data exchange between the sensors and base station need to be designed to conserve the limited energy resources of the sensors. Grouping the sensors into clusters has been recognized as an efficient approach for saving the energy of the sensor nodes. In this paper we propose an analytical model based on homogenous spatial Poisson process, which allows the number of clusters in a sensor network minimizing the total energy spent for data exchange. Computer simulation on various size sensor networks with the LEACH algorithm reveals that the proposed model is very accurate. We also compare the proposed model with an earlier one, and it turns out that the proposed model is more accurate. Keywords: Energy consumption, LEACH, number of clusters, Poisson process, wireless sensor network.
1
Introduction
Recent developments in wireless sensor network have motivated the growth of extremely small, low-cost sensors that possess the capability of sensing, signal processing, and wireless communication. The wireless sensor network can be expanded at a cost much lower than conventional wired sensor network. Here each sensor is capable of detecting the condition around it such as temperature, sound, and the presence of other objects. Recently, the design of sensor networks has gained increasing importance due to their potential for civil and military applications such as combat field surveillance, security, and disaster management. The smart dust project at the University of California, Berkeley [7, 8, 12] and WINS project at UCLA [9] are attempting to build extremely small sensors of low-cost allowing autonomous sensing and communication in a cubic millimeter
This research was supported in part by the Ubiquitous Autonomic Computing and Network Project, 21st Century Frontier R&D Program in Korea and the Brain Korea 21 Project in 2005. Corresponding author: Hee Yong Youn.
M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 852–861, 2006. c Springer-Verlag Berlin Heidelberg 2006
Finding the Number of Clusters Minimizing Energy Consumption
853
range. The system processes the data gathered from the sensors to monitor the events in an area of interest. Prolonged network lifetime, scalability, and load balancing are important requirements for many wireless sensor network applications. A longer lifetime of sensor networks can be achieved through optimized applications, operating systems, and communication protocols. If the distance between the nodes is small, energy consumption is also small. With the direct communication protocol in a wireless sensor network, each sensor sends its data directly to the base station. When the base station is far away from the sensors, the direct communication protocol will cause each sensor to spend a large amount of power for data transmission [5]. This will quickly drain the battery of the sensors and reduce the network lifetime. With the minimum energy routing protocol, the sensors route data ultimately to the base station through intermediate sensors. Here only the energy of the transmitter is considered while energy consumption of the receivers is neglected in determining the route. The low-energy adaptive clustering hierarchy (LEACH) scheme [9] includes the use of energy-conserving hardware. Here sensors in a cluster detect events and then transmit the collected information to the cluster head. Power consumption required for transmitting data inside a cluster is lower than with other routing protocols such as minimum transmission energy routing protocol and direct communication protocol due to smaller distances to the cluster head than to the base station. There exist various approaches for the evaluation of the routing protocols employed in wireless sensor networks. In this paper our main interest is energy consumption of the sensors in the network. Gupta and Kumar [2,3] have analyzed the capacity of wireless ad-hoc networks and derived the critical power with which a node in a wireless ad-hoc network communicates to form a connected network with probability one. A sensor in a wireless sensor network can directly communicate with only the sensors within its radio range. To enable communication between the sensors not within each other’s communication range, the sensors need to form a cluster with the neighboring sensors. An essential task in sensor clustering is to select a set of cluster heads among the sensors in the network, and cluster the rest of the sensors with the cluster heads. The cluster heads are responsible for coordination among the sensors within their clusters, and communication with the base station. [4] have suggested an approach for cluster formation which ensures the expected number of clusters in LEACH. A model deriving the number of clusters with which the energy required for communication is minimized was developed in [10]. Energy consumption required for communication depends on the distance between the transmitter and receiver. Here, the expected distance between the cluster head and non-cluster head was computed. The distance from a sensor to the base station was not computed but fixed. [11] proposed a method for finding the probability for a sensor to become a cluster head. They directly computed the expected distance from a sensor to the base station with uniform distribution in a bounded region. They assumed that the distance between a cluster head and non-cluster heads in a cluster de-
854
H. Kim and H.Y. Youn
pends only on the density of the sensors distributed with a homogeneous Poisson process in a bounded region. More details are given in Section 2.3. In this paper we develop a model finding the number of cluster heads which allows minimum energy consumption of the entire network using homogeneous spatial Poisson process. [11] modeled the expected distance between a cluster head and other sensors in deciding the number of cluster heads. Here, we additionally include the radio range of the cluster head and the distribution of energy dissipation of the sensors for more accurate prediction of the number. With the proposed model the network can determine, a priori, the number of cluster heads in a region of distributed sensors. The validity of the proposed model is verified by computer simulation, where the clustering is implemented using the LEACH algorithm. It reveals that the number of clusters obtained by the developed model is very close to that of the simulation with which the energy consumption of the network is minimum. We also compare the proposed model with [11], and it consistently produces more accurate value than [11]. The rest of the paper is presented as follows. In Section 2 we review the concept of LEACH, sensor clustering, and previous approaches for modeling the number of clusters. Section 3 presents the proposed model finding the number of clusters which minimizes energy consumption. Section 4 demonstrates the effectiveness of the proposed model by computer simulation and comparison with the earlier one. We provide a conclusion in Section 5.
2 2.1
Preliminaries The LEACH Protocol
LEACH is a self organizing, adaptive clustering protocol that employs a randomization approach to evenly distribute energy consumption among the sensors in the network. In the LEACH protocol, the sensors organize themselves into clusters with one node acting as the local base station or cluster head. If the cluster heads are fixed throughout the network lifetime as in the conventional clustering algorithms, the selected cluster heads would die quickly, ending the useful lifetime of all other nodes belonging to the clusters. In order to circumvent the problem, the LEACH protocol employs randomized rotation of the cluster head such that all sensors have equal probability to be a cluster head in order not to spent the energy of only specific sensors. In addition, it carries out local data fusion to compress the data sent from the cluster heads to the base station. This can reduce energy consumption and increase sensor lifetime. Clusters can be formed based on various properties such as communication range, number and type of sensors, and geographical location. 2.2
Sensor Clustering
Assume that each sensor in the wireless sensor network becomes a cluster head with a probability, p. Each sensor advertises itself as a cluster head to the sensors within its radio range. We call the cluster heads the voluntary cluster heads. For
Finding the Number of Clusters Minimizing Energy Consumption
855
the sensors not within the radio range of the cluster head, the advertisement is forwarded. A sensor receiving an advertisement which does not declare itself as a cluster head joins the cluster of the closest cluster head. A sensor that is neither a cluster head nor has joined any cluster becomes a cluster head; we call the cluster heads the forced cluster heads. Since forwarding of advertisement is limited by the radio range, some sensors may not receive any advertisement within some time duration, t, where t is the time required for data to reach the cluster head from a sensor within the radio range. It then can infer that it is not within radio range of any voluntary cluster head, and hence it becomes a forced cluster head. Moreover, since all the sensors within a cluster are within radio range of the cluster head, the cluster head can transmit the collected information to the base station after every t units of time. This limit on radio range allows the cluster heads to schedule their transmissions. Note that this is a distributed algorithm and does not require clock synchronization between the sensors. The total energy consumption required for the information gathered by the sensors to reach the base station will depend on p and radio range, r. In clustering the sensors, we need to find the value of p that would ensure minimum energy consumption. The basic idea of the derivation of the optimal value of p is to define a function of energy consumption required to transmit data to the base station, and then find the p value minimizing it. The model needs the following assumptions. • The sensors in the wireless sensor network are distributed as per a homogeneous spatial Poisson process of intensity λ in 2-dimensional space. • All sensors transmit data with the same power level, and hence have the same radio range, r. • Data exchanged between two sensors not within each others’ radio range is forwarded by other sensors. • Each sensor uses 1 unit of energy to transmit or receive 1 unit of data. • A routing infrastructure is in place; hence, when a sensor transmits data to another sensor, only the sensors on the routing path forward the data. • The communication environment is contention and error free; hence, sensors do not have to retransmit any data. 2.3
Previous Modeling
Energy consumption required for communication depends on the distance between the transmitter and receiver, i.e., the distance between the sensors. In [11], first, the expected distance from a sensor to the base station is computed. Then the expected distance from the non-cluster head to the cluster head in a Voronoi cell (a cluster) is obtained. Assume that a sensor becomes a cluster head with probability p. Here the cluster-heads and the non-cluster heads are distributed as independent homogeneous spatial Poisson processes of intensity λ1 = pλ and λ0 = (1 − p)λ, respectively. Each non-cluster head joins the cluster of the closest cluster-head to form a Voronoi tessellation [6]. The plane is divided into zones called the Voronoi cells, where each cell corresponds to a Poisson process with
856
H. Kim and H.Y. Youn
intensity λ1 , called nucleus of Voronoi cells. Let Nv be the random variable denoting the number of non-cluster heads in each Voronoi cell, and N be the total number of sensors in a bounded region. If Lv is the total length of all segments connecting the non-cluster heads to the cluster head in a Voronoi cell, then E[Nv |N = n] = λ0 /λ1 ,
3/2
E[Lv |N = n] = λ0 /2λ1 .
(1)
Note that these expectations depend only on the intensities λ0 and λ1 of a Voronoi cell. Using the expectations of the lengths above, an optimal value p minimizing the total energy spent by the network is found.
3 3.1
The Proposed Scheme Motivation
For a wireless sensor network of a large number of energy constrained sensors, it is important to fastly group the sensors into clusters to minimize the energy used for communication. Note that energy consumption is directly proportional to the distance between the cluster heads and non-cluster heads and the distance between the cluster head and base station. We thus focus on the functions of the distances for minimizing the energy spent in the wireless sensor network. Note that the earlier model of Equation (1) depends only on the intensity of the sensor distribution. In this paper, however, the expected distance between the sensors and cluster head is modeled by including p, and the number of sensors, N , and the size of the region for obtaining more accurate estimation on the number of clusters. The number is decided by the probability, p, which minimizes the energy consumed to exchange the data. We next present the proposed model. 3.2
The Proposed Model
In the sensor network the expected distance between the cluster heads to the base station and the expected distance between the sensors to the cluster head in a cluster depend on the number of sensors, the number of clusters, and the size of the region. The expected distance between a sensor and its cluster head decreases while that between a cluster head and the base station increases as the number of clusters increases in a bounded region. An opposite phenomenon is observed when the number of clusters decreases. Therefore, an optimal value of p in terms of energy efficiency needs to be decided by properly taking account the tradeoff between the communication overhead of sensor-to-cluster head and cluster head-to-base station. Figure 1 illustrates this aspect where Figure 1(a) and (b) has 14 clusters and 4 clusters, respectively. Notice that the clusters in Figure 1(a) have relatively sparse links inside the clusters compared to those in Figure 1(b), while there exist more links to the base station. Let S denote a bounded region of a plane and X(S) does the number of sensors contained in S. Then X(S) is a homogeneous spatial Poisson process if it distributes the Poisson postulates, yielding a probability distribution
Finding the Number of Clusters Minimizing Energy Consumption
857
BS
BS
(a) Large number
(b) Small number
Fig. 1. Comparison of the structures having a large and small number of clusters
P {X(S) = n} =
[λA(S)]n e−λA(S) , n!
f or A(S) ≥ 0, n = 0, 1, 2, · · ·
Here λ is a positive constant called the intensity parameter of the process and A(S) represents the area of region S. If region S is a square of side length, M , then the number of sensors in it follows a Poisson distribution with a mean of λA(S), where A(S) is M 2 . Assume that there exist N sensors in the region for a particular realization of the process. If the probability of becoming a cluster head is, p, then N p sensors will become cluster heads on average. Let DB (x, y) be a random variable denoting the distance between a sensor located at (x, y) and the base station. Let PS be the probability of existence of sensors uniformly distributed in region S. Without loss of generality, we assume that the base station is located at the center of the square region (i.e. the origin coordinate). Then, the expected distance from the base station to the sensors is given by E[DB (x, y)|X(S) = N ] = DB (x, y) · PS dS
S M/2
M/2
= −M/2
−M/2
= 0.3825M
x2 + y 2
1 dx dy M2 (2)
Since there exist N p cluster heads on average and the location of a cluster head is independent of those of other cluster heads, the total length of the segments from all the cluster heads to the base station is 0.3825N pM . Since a sensor becomes a cluster head with a probability p, we expect that the cluster head and other sensors are distributed in a cluster as an independent homogeneous spatial Poisson process. Each sensor joins the cluster of the closest cluster head to form a cluster. Let X(C) be the random variable denoting the number of sensors except the cluster head in a cluster. Here, C is the area of a cluster. Let DC be the distance between a sensor and the cluster head in a
858
H. Kim and H.Y. Youn
cluster. Then, according to the results of [1], the expected number of non-cluster heads in a cluster and the expected distance from a sensor to the cluster head (assumed to be at the center of mass of the cluster) in a cluster are given by E[X(C)|X(S) = N ] =
E[DC |X(S) = N ] =
1 − 1, p
(3)
x2 + y 2 k(x, y) dA(C)
C √ 2π M/ N pπ
Np 2M drdθ = √ , (4) 2 M 3 N pπ 0 0 √ respectively. Here, region C is a circle with radius M/ N pπ. The sensor density of the cluster, k(x, y), is uniform, and it is approximately M 2 /N p. Let EC be the expected total energy used by the sensors in a cluster to transmit one unit of data to their respective cluster head. Since there are N p clusters, the expected value of EC conditioned on X(S) = N is given by, =
r2
E[DC |X(S) = N ] E[EC |X(S) = N ] = N (1 − p) · r 1 2M 1 − p =N2 · √ √ . 3r π p
(5)
If the total energy spent by the cluster heads to transmit the aggregated information to the base station is denoted by EB , then E[DB |X(S) = N ] E[EB |X(S) = N ] = N p · r 0.3825N pM = . r
(6)
Let ET be the total energy consumption with the condition of X(S) = N in the network. Then E[ET |X(S) = N ] = E[EC |X(S) = N ] + E[EB |X(S) = N ].
(7)
Taking the expectation of Equation (7), the total energy consumption of the network is obtained. E[ET ] = E[E[ET |X(S) = N ]] 2M 1 − p 0.3825pM = E[X(S)1/2 ] · √ √ + E[X(S)] · , 3r π p r
(8)
where E[·] is expectation of a homogeneous Poisson process. E[ET ] will have a minimum value for a value of p, which is obtained by the first derivative of Equation (8); 2c2 p3/2 − c1 (p + 1) = 0, √ where c1 = 2M · E[(X(S))1/2 ]/3 π and c2 = 0.3825M · E[X(S)].
(9)
Finding the Number of Clusters Minimizing Energy Consumption
859
Equation (9) has three roots, two of which are imaginary. The second derivative of Equation (8) is positive and log concave for the only real root of Equation (9), and hence the real root minimizes the total energy consumption, E[ET ]. The only real root of Equation (9) is as follows. p=
4
0.0833c21 0.1050(c41 + 24c21 c22 ) + 2 6 2 4 2 2 2 6 8 1/3 c2 c2 (2c1 + 72c1 c2 + 432c21 c42 + 83.1384c 1 c1 c2 + 27c2 ) 0.0661 + 2 (2c61 + 72c41 c22 + 432c21 c42 + 83.1384c21 c21 c62 + 27c82 )1/3 . (10) c2
Performance Evaluation
In order to validate the proposed model, we simulate a network of sensors distributed as a homogeneous Poisson process with various spatial densities in a region. We employ the LEACH algorithm to generate a cluster hierarchy and find how much energy the network spends with the p value obtained using the developed model. For various intensities λ = (0.01, 0.03, 0.05) of sensors in a bounded region, we first compute the probability for a sensor to be a cluster head by Equation (9). Table 1 lists the probability obtained using the developed model and the corresponding energy consumption of Equation (8) for different intensities and radio range of a sensor, r. Here, M is set to 100, and thus there exist 100, 300, and 500 sensors if λ = 0.01, 0.03, and 0.05, respectively. Table 1. The p probability and corresponding energy consumption with the proposed model
λ 0.01 0.03 0.05
p 0.147 0.099 0.083
ET r=2 699.02 1500.39 2131.84
r=1 1398.03 3000.77 4263.68
r=3 466.01 1000.26 1421.23
Simulation(LEACH ) Proposed model
Energy Consumption
3.3
3.2
3.1
3
0.12
0.13
0.14
0.15
0.16
p
Fig. 2. The total energy consumptions from the proposed model and simulation
860
H. Kim and H.Y. Youn Table 2. The p probability and corresponding energy consumption with [11]
λ 0.01 0.03 0.05
p 0.181 0.121 0.101
r=1 1674.76 3615.65 5147.63
ET r=2 837.381 1807.83 2573.81
r=3 558.25 1205.32 1715.88
Table 3. Comparison of the proposed model with an earlier model [11]
Model The proposed model [11] Percentage of energy saving
100 0.057324 0.059178 3.133
No. of Nodes 300 0.1092644 0.197071 2.247
500 0.299371 0.310891 3.706
Note that the validity of the p value estimated by the proposed analytical model can be verified only through actual implementation of the clustering. Figure 2 shows the energy consumed by the entire network as p changes, obtained using the LEACH algorithm which is one of the most popular clustering algorithms for sensor network, when λ = 0.01 and r = 1. Recall that the p value predicted by the proposed model in this case is 0.147, while the optimal value of p obtained from the simulation turns out to be 0.145. The proposed model thus can be said to be very accurate for deciding the p value. We obtain similar results as this for different cases of λ and r values. We also compare the proposed model and that of [11]. In order to show the relative effectiveness of the proposed model, the probability and the corresponding energy consumption of [11] are obtained and listed in Table 2. Observe from the tables that the probabilities of the proposed model are smaller than [11]. Table 3 summarizes the results of comparison. Here the p values decided from the proposed model and [11] are applied to the LEACH algorithm for obtaining the energy consumed by the network. Notice that the proposed model consistently provides better p values than [11] regardless of the sensor density.
5
Conclusion
The sensors becoming the cluster heads spend relatively more energy than other sensors because they have to receive information from other sensors within their cluster, aggregate the information, and then transmit them to the base station. Hence, they run out of their energy faster than other sensors. To solve this problem, the re-clustering needs to be done periodically or the cluster heads trigger re-clustering when their energy level falls below a certain threshold. We have developed a model deciding the probability a sensor becomes a cluster head,
Finding the Number of Clusters Minimizing Energy Consumption
861
which minimizes the energy spent by the network for communication. Here the sensors are distributed in a bounded region with homogeneous spatial Poisson process. The validity of the proposed model was verified by computer simulation, where the clustering is implemented using the LEACH algorithm. It revealed that the number of clusters obtained by the developed model is very close to that of the simulation with which the energy consumption of the network is minimum. We also compared the proposed model with [11], and it consistently produces more accurate value than [11]. Here we assumed that the base station is located at the center of the bounded region. We will develop another model where the location of base station can be arbitrary. We will also expand the model for including other factors such as the shape of the bounded region.
References 1. S.G. Foss and S.A. Zuyev, “On a Voronoi Aggregative Process Related to a Bivariate Poisson Process ,” Advances in Applied Probability, Vol. 28, no. 4, 965-981, 1996. 2. P. Gupta and P.R. Kumar, “The capacity of wireless networks,” IEEE Transaction on Information Theory, Vol. IT-46, No. 2, 388-404, March, 2000. 3. P. Gupta and P.R. Kumar, “Critical power for asymptotic connectivity in wireless networks,” Stochastic Analysis, Control, Optimization and Applications: A Volume in Honor of W. H. Fleming, 547-566, 1998. 4. W.B. Heinzelman, A.P. Chandrakasan and H. Balakrishnan, “An ApplicationSpecific Protocol Architecture for Wireless Microsensor Networks”, IEEE Trans. on Wireless Communications, Vol. 1, No. 4, 660-670, 2002. 5. W.R. Heinzelman, A. Cahandrakasan and H. Balakrishnan, “Energy-efficient communication protocol for wireless sensor networks, in the Proceedings of the 33rd Hawaii International Conference on System Sciences, Hawaii, 2000. 6. T. Meng and R. Volkan, “Distributed Network Protocols for Wireless Communication”, In Proc. IEEEE ISCAS, 1998. 7. V. Hsu, J.M. Kahn, and K.S.J. Pister, “Wireless Communications for Smart Dust”, Electronics Reaserch Laboratory Technical Memorandum M98/2, Feb. 1998. 8. J.M. Kahn, R.H. Katz and K.S.J. Pister, “Next Century Challenges: Mobile Networking for Samrt Dust,” in the 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 99), 271-278, Aug. 1999. 9. G.J. Pottie and W.J. Kaiser, “ Wireless integrated network sensors,” Communications of the ACM, Vol 43, No. 5, 51-58, May, 2000. 10. T. Rappaport, “Wireless Communications: Principles & Practice”, Englewood Cliffs, NJ: Prentice-Hall, 1996. 11. S. Bandyopadhyay and E.J. Coyle, “An energy efficient hierarchical clustering alogorithm for wireless sensor networks,” IEEE INFOCOM 2003 - The Conference on Computer Communications, vol. 22, no. 1, 1713-1723, March 2003. 12. B. Warneke, M. Last, B. Liebowitz, Kristofer and S.J. Pister, “Smart Dust: Communication with a Cubic-Millimeter Computer,” Computer Magazine, Vol. 34, No 1, 44-51, Jan. 2001.
A Two-Echelon Deteriorating Production-Inventory Newsboy Model with Imperfect Production Process Hui-Ming Wee and Chun-Jen Chung Department of Industrial Engineering, Chung Yuan Christian University, Chungli 32023, Taiwan, ROC [email protected]
Abstract. This paper discusses a two-echelon distribution-free deteriorating production-inventory newsboy model. Distinct from previous models, this study considers a deteriorating commodity with the imperfect production process. The model also considers JIT (Just In Time) deliveries and integrates the marketing and manufacturing channels. This model can be used to solve the production and replenishment problem for the manufacturer and the retailer. We derive the optimal production size, the wholesale price and the optimal number of deliveries. The model takes an insight to the defect-rate reduction. We find an upper bound for the optimal number of material’s JIT deliveries. The necessary and sufficient conditions are explored to gain managerial insights and economic implications.
1 Introduction The newsboy problem is very well suited for a single period uncertain demand pattern. This type of problem has significant theoretical and practical implications. Some of the newsboy problem examples are in the inventory management of fashion goods, seasonal sports and apparel industry (Gallego and Moon, [1]). If the manufacturer’s wholesale price is too high, the retailer’s orders may diminish and the profit may decrease. Our study focuses on determining the optimal wholesale price and the production size policies when the retailer has decided on the order size by the distribution-free newsboy model. There are several newsboy papers exploring the return problems of the integrated inventory model for the manufacturers and the retailers. Lau and Lau ([2], [3]) developed the pricing and return policies for a single-period item model and studied how the uncertain retail-market demand would affect the expected manufacturer’s and the retailer’s profits. Lariviere and Porteus [4] incorporated market size and market growth into the model to investigate how market size and market growth affect profits. Shang and Song [5] developed a simple heuristic to minimize 2N newsvendor-type cost functions. When the demand distribution is unknown, the newsboy problem may be classified as distribution-free newsboy problem. Several authors have analyzed the M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 862 – 874, 2006. © Springer-Verlag Berlin Heidelberg 2006
A Two-Echelon Deteriorating Production-Inventory Newsboy Model
863
distribution-free newsboy problem, which the demand distribution is specified by the mean μ and variance σ 2 are. Gallego and Moon [1] analyzed the cases with random yields, fixed ordering cost, and constrained multiple products. Moon and Choi [6] extended the model of Gallego and Moon to consider the balking customers. Moon and Silver [7] studied the distribution-free models of a multi-item newsboy problem with a budget constraint and fixed ordering costs. However, the above did not consider deterioration in the “manufacturing channel”. The phenomena of deterioration are prevalent and should not be neglected in the model development. Several researchers have studied deteriorating inventory in the past. Ghare and Schrader [8] were the first authors to consider on-going deterioration of inventory. Other authors such as Covert and Philip [9] and Raafat et al. [10] assumed different production policy with different assumptions on the patterns of deterioration. Wee [11] dealt with an economic production quantity model for deteriorating items with partial backordering. Wee and Jong [12] investigated the JIT delivery policy in an integrated production-inventory model for deteriorating items. Yang and Wee [13] provided a review of an integrated deteriorating inventory model. Quality management, JIT purchasing and manufacturing strategy are widely studied in manufacturing issues. Nassimbeni [14] investigated the relationship between the JIT purchasing and JIT practices. The study demonstrates that purchasing is related to three factors: delivery synchronization, interaction on quality and interaction on design. The buyer or procurer perceived that the quality is a responsibility of the supplier (Nassimbeni, 1995).The JIT purchasing of materials from outside suppliers is notable development of JIT concept. The JIT purchasing ensures that disruption of production can be eradicated immediately. To avoid the loss of production and reduce the inventory level and scrap, the materials received by the suppliers should be of good quality (Reese and Geisel, [15]). The non-conforming item incurs a greater post-sale service cost than a conforming item. The warranty cost which is included in the post-sale service cost influences the manufacturer’s pricing policy. Lee and Rosenblatt [16] considered the imperfect production process in a traditional EPQ model and derived the optimal number of inspection and production cycle time. Wang and Sheu [17] investigated the imperfect production model with free warranty for the discrete item. They derived an optimal production lot size to minimize the total cost. Wang [18] studied the production process problem to consider the sold products with free warranty. Yeh et al. [19] developed a production inventory model considering free warranty, and derived an optimal production cycle time. To the best of our knowledge, no research on deteriorating two-echelon distribution-free inventory model with imperfect production has been done.
2 Assumptions and Notation
As shown in Figure 1, there are M types of raw materials in the manufacturing process. The retailer must sell the finished products before the expiration date.
After the expiration date the unsold stock has a salvage value. The other assumptions of the seasonal-product production-inventory model are as follows:
(a) For a fixed wholesale price, salvage cost, and average demand, the retailer uses the newsboy rule to determine the ordering quantity.
(b) The manufacturer sums up the orders from the retailer as the production lot size.
(c) The material inventory is controlled by a periodic review system, and backorders are not allowed.
(d) The lead-time for raw materials is constant. The transportation time is assumed to be zero.
(e) The production rate is constant and greater than the demand rate.
(f) The deteriorating rate is constant, and the deteriorated items are not replaced.
(g) Deterioration of the units is considered only after they have been received into the inventory.
(h) The manufacturer and the retailer have complete information of each other.
(i) The replenishment is instantaneous. After a complete production lot, a delivery batch is dispatched to the retailer.
(j) At the start of each production cycle, the production process is in an in-control state. After a period of production, the process may shift from the in-control to an out-of-control state. The elapsed time is exponentially distributed with a finite mean and variance.
(k) When the production process shifts to an out-of-control state, the shift cannot be detected until the end of the production period. The process is brought back to an in-control state at the beginning of a new production cycle.
(l) There are M types of materials with JIT multiple deliveries. Material type i is independent of material type j, i ≠ j.
(m) The reduction of imperfect products is proportional to the number of material deliveries.

Fig. 1. The behavior of the production-inventory chain (information and material flows among raw materials 1..M, production at the manufacturer, the retailer, and the customer market demand, including rework, free-repair warranty service, and warranty cost)
(n) The selling price is independent of the number of material deliveries.
(o) The percentages of non-conforming items during the in-control and out-of-control states are assumed to be fixed.

The notation in the model is defined as follows:

Table 1. Notation for model development

μ_c        The mean demand
σ          The standard deviation
S = (1+a)C_P    Selling price of the retailer
L_S = bC_P      Lost sale
V = (1−c)C_P    Salvage
U = dC_P        Unit transportation cost
A_1 = eC_P      Unit ordering cost
C_S        Production setup cost
P          Capacity
θ          Deteriorating rate
K          Warranty period
A_0        Fixed ordering cost
F          Fixed transportation cost
M          The number of material types
ϑ_1        Percentage of non-conforming items produced while the production process is in control
ϑ_2        Percentage of non-conforming items produced while the production process is out of control
C_R        Unit rework cost (including inspection cost)
C_w        Unit warranty cost
G_m        Unit target profit
u          Unit production cost
H          Manufacturer's holding cost
L_j        The difference in lead-time for material j (= maximum lead-time − average lead-time)
C_mj       Ordering cost of material j
h_dmj      The extra handling cost of material j
C_rj       Unit item cost of material j
H_rj       Holding cost of material j
α_j        Amount of material j needed for a unit product
r_j        The defect-rate-reduction ratio for material j
3 Modeling Description
In this study, the production-inventory model has two phases. Phase 1 involves the retailer's policy when the demand is of unknown distribution. The retailer's ordering cost is assumed to be proportional to the ordering quantity, and the relevant transportation cost is also incorporated in the model. In phase 2, the manufacturer's production policy is explored. We assume that the manufacturer provides free-repair replacement for imperfect products. Based on the result of phase 1, we derive the optimal solution of the manufacturer's production policy. In practice, demand is influenced by the wholesale price. In addition to the issue of the wholesale price, the demand distribution of a single-period seasonal product is usually unknown, so we use the distribution-free method to analyze the problem. Let D denote the random demand with unknown distribution G, where G ∈ ℑ, the class of distributions with mean μ and variance σ²; the worst possible distribution in ℑ is assumed.
To derive the optimal ordering quantities, we define the following relationship:
S = (1+a)C_P;  V = (1−c)C_P;  L_S = bC_P;  U = dC_P;  A_1 = eC_P;
0 < a < 1, 0 < b < 1, 0 < c < 1, 0 < d < 1, 0 < e < 1.

The retailer determines the ordering quantity using the newsboy rule; the ordering quantity should satisfy the following formulas:

ER_G = S·E(min{Q, D}) + V·E(Q − D)^+     (1a)

EC_G = A_0 + F + (A_1 + C_P + U)Q + L_S·E(D − Q)^+     (1b)

EP_G = ER_G − EC_G     (2)
where ER_G is the expected revenue, EC_G the expected cost, and EP_G the expected profit. One can expand (2) by substituting the definitions of a, b, c, d and e, as well as the following relationships, into (1a) and (1b):

E(min{Q, D}) = D − (D − Q)^+,  (D − Q)^+ = (D − Q) + (Q − D)^+  and  (Q − D)^+ = (Q − D) + (D − Q)^+.

One has

ER_G = Sμ − S·E(D − Q)^+ + V·E(Q − D)^+

EP_G = ER_G − { A_0 + F + (A_1 + C_P + U)Q + L_S·E(D − Q)^+ }
     = C_P{ (a + c − b)μ − (a + c)E(D − Q)^+ − (c − b − d − e)Q − b·E(Q − D)^+ } − A_0 − F     (3)
In order to maximize (3), the following lemmas from Gallego and Moon [1] are used:
Lemma 1:  E(D − Q)^+ ≤ ( [σ² + (Q − μ_c)²]^{1/2} − (Q − μ_c) ) / 2

Lemma 2:  E(Q − D)^+ ≤ ( [σ² + (μ_c − Q)²]^{1/2} − (μ_c − Q) ) / 2
Applying Lemma 1 and Lemma 2 to (3) yields a lower bound on the retailer's expected profit:
EP_G ≥ C_P{ (a + c − b)μ_c − (a + c)( [σ² + (Q − μ_c)²]^{1/2} − (Q − μ_c) )/2
      − (c − b − d − e)Q − b( [σ² + (μ_c − Q)²]^{1/2} − (μ_c − Q) )/2 } − A_0 − F     (4)
Maximizing the lower bound on (4) is equivalent to minimizing the following function:
Θ_{EP_G} = (c − b − d − e)Q + (a + c)( [σ² + (Q − μ_c)²]^{1/2} − (Q − μ_c) )/2
         + b( [σ² + (μ_c − Q)²]^{1/2} − (μ_c − Q) )/2
       = (1/2){ (c − b − a − 2(d + e))Q + (a + b + c)[σ² + (Q − μ_c)²]^{1/2} + (a + c − b)μ_c }     (5a)

Taking the first derivative of Θ_{EP_G} with respect to Q and setting the result to zero,
one has
Q* = (1/(1 − θ)){ μ_c + σR_Z / [1 − R_Z²]^{1/2} },   where  R_Z = ( c − a − b − 2(d + e) ) / ( a + b + c )     (5b)
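As a quick numerical check of (5b), the short Python sketch below evaluates Q* with the parameter values given later in Table 3; it reproduces the ordering quantity reported in Table 4.

```python
import math

# Parameter values taken from Table 3 of the numerical example
mu_c, sigma = 800.0, 70.0
a, b, c, d, e = 0.64, 0.60, 0.58, 0.04, 0.05
theta = 0.01

# Eq. (5b): the retailer's distribution-free optimal ordering quantity
R_Z = (c - a - b - 2 * (d + e)) / (a + b + c)
Q_star = (mu_c + sigma * R_Z / math.sqrt(1.0 - R_Z ** 2)) / (1.0 - theta)

print(round(Q_star, 2))   # 771.29, matching Table 4
```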
From the relationship between the wholesale price and the definition of c, one can see that the wholesale price influences the ordering decision.

The Manufacturer's Relevant Costs
The differential equation of the production period for the manufacturer's inventory level is
dΨ_S(t_1)/dt_1 = P − θΨ_S(t_1),   0 ≤ t_1 ≤ T_P     (6)
Using Spiegel [20] and the boundary conditions Ψ_S(0) = 0 and Ψ_S(T_P) = Q_v, the solution of the differential equation is:

Ψ_S(t_1) = (P/θ){ 1 − exp(−θt_1) };   Ψ_S(T_P) = Q_v = (P/θ){ 1 − exp(−θT_P) }     (7)
Since θ << 1 and T_P < 1, exp(−θT_P) is replaced by 1 − θT_P + (θT_P)²/2!. When θT_P ≤ 0.03, the absolute percentage error is about 0.0464%, and it is smaller for the higher-order terms; the terms of order three and higher are therefore neglected. Using this Taylor series to approximate Q_v, one has

Q_v = (P/θ){ θT_P − (θT_P)²/2 }     (8)
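The quality of this second-order approximation is easy to verify numerically; the sketch below compares (7) and (8) at the boundary case θT_P = 0.03, with a hypothetical capacity P and run time T_P chosen only for illustration.

```python
import math

def Qv_exact(P, theta, TP):
    return (P / theta) * (1.0 - math.exp(-theta * TP))          # Eq. (7)

def Qv_approx(P, theta, TP):
    return (P / theta) * (theta * TP - (theta * TP) ** 2 / 2)   # Eq. (8)

P, theta, TP = 1200.0, 0.01, 3.0    # hypothetical values with theta*TP = 0.03
exact, approx = Qv_exact(P, theta, TP), Qv_approx(P, theta, TP)
print(100 * abs(exact - approx) / exact)   # relative error, in percent
```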
From (7) and (8), connecting the manufacturer and the retailer by setting Q_v = Q* with Q* = Q/(1 − θ), one has the production run time

T_P = (1/θ)( 1 − [1 − 2θQ*/P]^{1/2} ) = (1/θ)( 1 − [1 − 2θQ/(P(1 − θ))]^{1/2} )     (9)
Property 1
When the deteriorating rate approximates zero, T_P → Q/P. Using L'Hospital's rule, the proof is completed.

The manufacturer's holding cost considering the deterioration is

H ∫_0^{T_P} Ψ_S(t_1) dt_1 = H ∫_0^{T_P} (P/θ){ 1 − exp(−θt_1) } dt_1 = (HP/θ²)[ θT_P − 1 + exp(−θT_P) ]     (10a)
Since the production is imperfect, we assume that the elapsed time until the process shift is exponentially distributed with a mean of 1/μ; one has f(X) = μe^{−μX} and 1 − F(X) = e^{−μX}.

The number of non-conforming items Z in a production period is given by

Z = ϑ_1 P T_P,                       when X ≥ T_P
Z = ϑ_1 P X + ϑ_2 P(T_P − X),        when X < T_P     (11)

Since μ is very small, from (11) the expected number of non-conforming items is

E(Z) = ∫_0^{T_P} [ ϑ_1 PX + ϑ_2 P(T_P − X) ] f(X) dX + ∫_{T_P}^{∞} ϑ_1 P T_P f(X) dX
     ≈ ϑ_1 P T_P − ( (ϑ_1 − ϑ_2)Pμ/2 )(T_P)²     (12)

There is a delivery-batch effect of the materials on finding and reducing the manufacturer's imperfect items because of the frequent JIT deliveries; this JIT benefit is found in practice. As mentioned above, the expected reduction in the number of non-conforming items is assumed to be proportional to the number of material deliveries, and the rework cost (including the inspection cost) can be derived as

RW = C_R [ E(Z)( 1 − Σ_{j=1}^{M} r_j(n_j − 1) ) ]     (13a)

After sale, the product may become unusable. Assuming the hazard rates of the conforming and non-conforming items follow Weibull distributions, the mean failure rates over the warranty period are

h_1 = ∫_0^K v_1(τ) dτ = ∫_0^K λ_1^{ρ_1} ρ_1 t^{ρ_1 − 1} dt = (λ_1 K)^{ρ_1}
h_2 = ∫_0^K v_2(τ) dτ = ∫_0^K λ_2^{ρ_2} ρ_2 t^{ρ_2 − 1} dt = (λ_2 K)^{ρ_2},

respectively. The free-repair warranty cost is

PO = C_w { [ E(Z)( 1 − Σ_{j=1}^{M} r_j(n_j − 1) ) ] h_2 + [ Q − E(Z)( 1 − Σ_{j=1}^{M} r_j(n_j − 1) ) ] h_1 }
   = C_w { [ ( ϑ_1 P T_P − ((ϑ_1 − ϑ_2)Pμ/2)(T_P)² )( 1 − Σ_{j=1}^{M} r_j(n_j − 1) ) ](h_2 − h_1) + P T_P h_1 }     (14)
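The approximation in (12) can be checked by numerical integration; the sketch below uses the rate values from Table 3 together with a hypothetical run time T_P.

```python
import math
from scipy.integrate import quad

P, mu = 1200.0, 0.015
th1, th2 = 1 / 360, 1 / 240     # in-control / out-of-control rates (Table 3)
TP = 0.6                        # hypothetical production run time

f = lambda x: mu * math.exp(-mu * x)                          # shift-time density
exact = quad(lambda x: (th1 * P * x + th2 * P * (TP - x)) * f(x), 0, TP)[0] \
        + th1 * P * TP * math.exp(-mu * TP)
approx = th1 * P * TP - (th1 - th2) * P * mu / 2 * TP ** 2    # Eq. (12)
print(exact, approx)   # the two values agree closely for small mu*TP
```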
The Material's Relevant Costs of the Manufacturer
The differential equation for the inventory level of the manufacturer's material j, with the boundary condition Ψ_mj(T_P/n_j) = 0, is

dΨ_mj(t)/dt = −α_j P − θΨ_mj(t),   0 ≤ t ≤ T_P/n_j     (15)

The solution of the differential equation, i.e., the inventory level of the manufacturer's material j, is

Ψ_mj(t_j) = (α_j P/θ){ exp[ θ(T_P/n_j − t_j) ] − 1 }     (16)
The holding cost of material j, and material j's extra handling cost due to tax and overhead costs, are as follows:

Material j's holding cost = H_rj n_j ∫_0^{T_P/n_j} Ψ_mj(t_j) dt_j
                          = H_rj n_j ∫_0^{T_P/n_j} (α_j P/θ){ exp[ θ(T_P/n_j − t_j) ] − 1 } dt_j     (17)

Material j's extra handling cost = Σ_{j=1}^{M} [ h_dmj α_j n_j (P T_P/n_j)( 1 + θT_P/(2n_j) ) ]     (18)
For the manufacturer's total cost/profit function under a fixed target profit, the problem can be formulated as follows:

Maximize Z = manufacturer's revenue − (materials' and manufacturer's) total cost

Z = ( (C_P − G_m)/(1 − θ) ){ μ_c + σR_Z/[1 − R_Z²]^{1/2} }
  − Σ_{j=1}^{M} h_dmj (α_j n_j P/θ)[ exp(θT_P/n_j) − 1 ]
  − { Σ_{j=1}^{M} [ n_j C_mj + (n_j g_1j/θ)·( exp(θT_P/n_j) − 1 − θT_P/n_j )/θ
      + (n_j g_2j/θ){ exp(θT_P/n_j) − 1 }·( g_3j + g_4j T_P/2 ) ]
    + C_S + HP[ θT_P − 1 + exp(−θT_P) ]/θ²
    + [ g_5( 1 − Σ_{j=1}^{M} r_j(n_j − 1) ) + g_7 ]T_P
    − g_6( 1 − Σ_{j=1}^{M} r_j(n_j − 1) )T_P² }     (19a)
Since the optimal ordering decision is determined by the retailer (i.e., the total sale is fixed and T_P can be derived), we need to determine the minimal total cost of the "manufacturing channel" in order to maximize the total manufacturing profit. This is equivalent to the problem:

Minimize TPG(n) = Σ_{j=1}^{M} h_dmj (α_j n_j P/θ)[ exp(θT_P/n_j) − 1 ]
  + { Σ_{j=1}^{M} [ n_j C_mj + (n_j g_1j/θ)·( exp(θT_P/n_j) − 1 − θT_P/n_j )/θ
      + (n_j g_2j/θ){ exp(θT_P/n_j) − 1 }·( g_3j + g_4j T_P/2 ) ]
    + C_S + HP[ θT_P − 1 + exp(−θT_P) ]/θ²
    + [ g_5( 1 − Σ_{j=1}^{M} r_j(n_j − 1) ) + g_7 ]T_P
    − g_6( 1 − Σ_{j=1}^{M} r_j(n_j − 1) )T_P² }     (19b)
where n = (n_1, n_2, ..., n_M),

g_1j = α_j P H_rj,   g_2j = α_j P,   g_3j = L_j(C_rj + H_rj) + C_rj,   g_4j = C_rj + H_rj,
g_5 = [ C_R + C_w(h_2 − h_1) ]ϑ_1 P,   g_6 = [ C_w(h_2 − h_1) + C_R ](ϑ_1 − ϑ_2)Pμ/2,
g_7 = C_w P h_1 + μP,   j = 1, 2, ..., M.     (19c)
4 Optimization
The purpose of this study is to determine the optimal wholesale price C_P* for the manufacturer and the optimal replenishment numbers n* for the materials that maximize the total profit. Let TPG_j(n_j) be the cost function for one type of material; one can then derive the n* which minimizes TPG(n).

Proposal 1
If C_mj + g_6 r_j T_P² > g_5 r_j T_P, there exists n_j*, j = 1, ..., M, satisfying

n_j*(n_j* − 1) ≤ [ g_1j + θ( h_dmj P + g_2j( g_4j T_P/4 + g_3j/2 ) ) ]T_P² / ( 2[ C_mj − r_j T_P( g_5 − g_6 T_P ) ] ) ≤ n_j*(n_j* + 1)     (20)

such that TPG(n_1* − 1, n_2* − 1, ..., n_M* − 1, C_P) ≥ TPG(n_1*, n_2*, ..., n_M*, C_P) ≤ TPG(n_1* + 1, n_2* + 1, ..., n_M* + 1, C_P).
Property 2
When the deteriorating rate approximates zero, there is an upper bound on the optimal number of material j's deliveries as T_P approaches 2C_mj/(r_j g_5). The optimality condition is

n_j(n_j − 1) ≤ α_j P H_rj [ −( g_5/(2C_mj) )( r_j − 2g_6 C_mj/g_5 )² + 2C_mj( g_6/g_5 )² ] ≤ n_j(n_j + 1)     (21)
From Property 2, it is clear that the material’s ordering cost, the material’s holding cost rate, the production rate, the rework cost and the warranty cost influence the material replenishment policy.
The Sufficient Condition
The sufficient condition for a global optimum is that the Hessian matrix ∇²TPG(n) be positive definite. The Hessian matrix can be simplified using the property ∂²TPG(n)/∂n_i∂n_j = 0 for i ≠ j. Finally, substituting the optimal n* = (n_1*, n_2*, ..., n_M*) into TPG(n), one can derive the optimal wholesale price C_P* under a given target profit:

Q*·G_m = Q*·C_P − TPG(n*)     (22)

Proof: The revised Hessian matrix is diagonal,

∇²TPG(n) = diag( ∂²TPG/∂n_1², ∂²TPG/∂n_2², ..., ∂²TPG/∂n_M² ).

Since ∂²TPG/∂n_j² > 0 for every j, ∇²TPG(n) is a positive definite matrix.

Solution Procedure: Due to the complexity of deriving an algebraic solution for the total profit function, this paper proposes a solution procedure to obtain the optimal values for the proposed model.
Step 1: Input the parameters of the problem.
Step 2: If the problem satisfies Proposal 1 for all types of materials, go to Step 3; otherwise, go to Step 5.
Step 3: Use (20) to solve for the optimal values n_j* (a sketch of this computation is given after the procedure). If all the n_j* satisfy the sufficient condition, then n* = (n_1*, n_2*, ..., n_M*) is the optimal solution; otherwise, go to Step 5.
Step 4: Use (5b) to compute the production lot size; then derive the total cost using (19a) and compute C_P*.
Step 5: Stop.
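A minimal Python sketch of Steps 2–3 follows, assuming the coefficients of (19c) and the run time T_P have already been computed; the function returns the integer n_j* characterized by inequality (20), or None when the condition of Proposal 1 does not hold.

```python
import math

def optimal_deliveries(Cmj, g1j, g2j, g3j, g4j, g5, g6, rj, hdmj, P, theta, TP):
    # Condition of Proposal 1
    if not Cmj + g6 * rj * TP ** 2 > g5 * rj * TP:
        return None
    # Middle term W of inequality (20): n(n - 1) <= W <= n(n + 1)
    W = ((g1j + theta * (hdmj * P + g2j * (g4j * TP / 4 + g3j / 2))) * TP ** 2
         / (2 * (Cmj - rj * TP * (g5 - g6 * TP))))
    # The unique positive integer n with n(n - 1) <= W <= n(n + 1)
    return max(1, math.floor((1 + math.sqrt(1 + 4 * W)) / 2))
```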
5 Numerical Example
The theory is illustrated by a numerical example using the inputs in Table 3. The optimal solution for the multiple-materials case is presented in Table 4, and the optimal solutions for varying deteriorating rates are shown in Table 5.
Table 3. Inputs for the example

μ_c = 800      C_R = 40         μ = 0.015       h_dm2 = 13
σ = 70         C_w = 90         K = 2           C_r1 = 4
a = 0.64       G_m = 25         λ_1 = 0.01      C_r2 = 4.6
b = 0.6        u = 2            λ_2 = 0.012     H_r1 = 3
c = 0.58       H = 4.5          ρ_1 = 0.8       H_r2 = 4
d = 0.04       L_1 = 0.0027     ρ_2 = 0.8       α_1 = 2
e = 0.05       L_2 = 0.0025     A_0 = 230       α_2 = 3
C_S = 1000     C_m1 = 300       F = 600         r_1 = 0.001
P = 1200       C_m2 = 310       ϑ_1 = 1/360     r_2 = 0.001
θ = 0.01       h_dm1 = 10       ϑ_2 = 1/240
Table 4. The optimal solution of the example

(n_1*, n_2*) = (2.26, 3.14) → (2, 3)
C_P* = 76.62
Ordering quantity = 771.29
Total cost TPG(n*) = 39671.64
Table 5. The optimal solution for varying deteriorating rates

θ          (n_1*, n_2*)   C_P*    Ordering quantity   Total cost TPG(n*)
0.10       (3, 4)         79.77   848.42              44527.24
0.01       (2, 3)         76.62   771.58              39671.64
0.001      (2, 3)         76.35   764.35              39233.91
0.0001     (2, 3)         76.32   763.66              39190.68
0.00001    (2, 3)         76.32   763.59              39186.36
6 Conclusion
This paper proposes a two-echelon distribution-free integrated production-inventory model for deteriorating items with an imperfect process. The necessary and sufficient conditions for the optimal solution are derived, and a solution procedure is developed to give management insight into determining the optimal wholesale price and the optimal replenishment cycle. In this study, we also derive an upper bound for the optimal number of JIT material deliveries.
References
1. Gallego, G., Moon, I.: The distribution-free newsboy problem – review and extensions. J. Oper. Res. Soc. 44(6) (1993) 734–825.
2. Lau, H.S., Lau, A.H.L.: Manufacturer's pricing strategy and return policy for a single-period commodity. Eur. J. Oper. Res. 116 (1999) 291–304.
3. Lau, A.H.L., Lau, H.S.: The effects of reducing demand uncertainty in a manufacturer-retailer channel for single-period products. Comput. Oper. Res. 29 (2002) 1583–1602.
4. Lariviere, M.A., Porteus, E.L.: Selling to the newsvendor: an analysis of price-only contracts. Mfg. & Service Oper. Management 3(4) (2001) 293–305.
5. Shang, K.H., Song, J.S.: Newsvendor bounds and heuristic for optimal policies in serial supply chains. Management Sci. 49(5) (2003) 618–638.
6. Moon, I., Choi, S.: Distribution free newsboy problem with balking. J. Oper. Res. Soc. 46(4) (1995) 537–542.
7. Moon, I., Silver, E.A.: The multi-item newsvendor problem with a budget constraint and fixed ordering costs. J. Oper. Res. Soc. 51(5) (2000) 602–608.
8. Ghare, P.M., Schrader, S.F.: A model for exponentially decaying inventory. J. Ind. Eng. 14(5) (1963) 238–243.
9. Covert, R.P., Philip, G.C.: An EOQ model for items with Weibull distribution deterioration. AIIE Trans. 5 (1973) 323–326.
10. Raafat, F., Wolfe, P.M., Eldin, H.K.: An inventory model for deteriorating items. Comput. Ind. Eng. 20 (1991) 89–94.
11. Wee, H.M.: Economic production lot size model for deteriorating items with partial back-ordering. Comput. Ind. Eng. 24(3) (1993) 449–458.
12. Wee, H.M., Jong, J.F.: An integrated multi-lot-size production inventory model for deteriorating items. Management and System 5(1) (1998) 97–114.
13. Yang, P.C., Wee, H.M.: An integrated multi-lot-size production inventory model for deteriorating item. Comput. Oper. Res. 30(5) (2003) 671–682.
14. Nassimbeni, G.: Factors underlying operational purchasing practices: Results of a research. Int. J. Prod. Econ. 42 (1995) 275–288.
15. Reese, J., Geisel, R.: A comparison of current practice in German manufacturing industries. E. J. Purchasing & Supply Management 42 (1997) 275–288.
16. Lee, H.L., Rosenblatt, M.J.: Simultaneous determination of production cycle and inspection schedules in a production system. Management Sci. 33 (1987) 1125–1136.
17. Wang, C.H., Sheu, S.H.: Optimal lot size for products under free-repair warranty. Eur. J. Oper. Res. 149 (2003) 131–141.
18. Wang, C.H.: The impact of a free-repair warranty policy on EMQ model for imperfect production system. Comput. Oper. Res. 31 (2004) 2021–2035.
19. Yeh, R.H., Ho, W.T., Teng, S.T.: Optimal production run length for products sold with warranty. Eur. J. Oper. Res. 120 (2005) 575–582.
20. Spiegel, M.R.: Applied differential equations. Prentice-Hall, Englewood Cliffs, N.J. (1960).
Mathematical Modeling and Tabu Search Heuristic for the Traveling Tournament Problem

Jin Ho Lee, Young Hoon Lee, and Yun Ho Lee

Department of Information and Industrial Engineering, Yonsei University, 134 Shinchon-Dong, Seodaemoon-Gu, Seoul 120-749, Korea
{youngh, jinho7956, yuno80}@yonsei.ac.kr
Abstract. As professional sports have become big businesses all over the world, much research on sports scheduling problems has been carried out over the last two decades. The traveling tournament problem (TTP) is defined as minimizing the total traveling distance for all teams in a league. In this study, a mathematical model for the TTP is presented, formulated as an integer program (IP). In order to solve practical problems with a large number of teams, a tabu search heuristic is suggested, and the concepts of alternation and intimacy are introduced for an effective neighborhood search. Experiments on several instances are carried out to evaluate performance. It is shown that the proposed heuristic achieves good performance with computational efficiency.
1 Introduction
Recently, as sports industries have become big businesses all around the world, high income has been obtained through many sports leagues. One key to such high income levels is the schedule the teams play. Thus, much research concerning the sports scheduling problem has been carried out in Operations Research over the past two decades. The traveling tournament problem (TTP) is defined as minimizing the total traveling distance for all teams in the league. The TTP represents the fundamental issues involved in creating a schedule for sports leagues where the amount of team travel is an issue. For many of these leagues, the scheduling problem includes a myriad of constraints based on thousands of games and hundreds of team idiosyncrasies that vary in their content and importance from year to year, so instances of this problem seem to be very difficult to solve even for very small cases. Thus, the TTP is considered an interesting challenge for combinatorial optimization techniques. In this paper, a mathematical model for the TTP is presented, and a tabu search heuristic method is proposed in order to solve instances involving a large number of teams.
2 Literature Review
The sports scheduling research literature has focused on league scheduling over the past two decades. James and John [8] presented a model that reduces travel cost and
player fatigue using an integer programming approach for the National Basketball Association (NBA). Fleurent and Ferland [6] allocated the games using integer programming for the National Hockey League (NHL), and Costa [3] used a tabu search algorithm for the NHL. Schreuder [12] constructed a timetable for the Dutch professional soccer leagues that minimized the number of schedule breaks. Russell and Leung [11] developed a schedule for the Texas Baseball League that satisfied stadium availability and various timing restrictions with the objective of minimizing travel costs. Nemhauser and Trick [10] created a schedule for the Atlantic Coast Conference men's basketball season, taking into consideration numerous conflicting requirements and preferences. Easton et al. [4] defined the traveling tournament problem as having the objective of minimizing the total traveling distance for all teams in the league. Additionally, Easton et al. [5] presented optimal solutions for instances of four, six and eight teams through a branch-and-price algorithm combining integer programming and constraint programming. Benoist et al. [2] presented a study of hybrid algorithms combining Lagrangian relaxation and constraint programming on a round-robin assignment and travel optimization. Anagnostopoulos et al. [1] achieved good results with a simulated annealing algorithm that explores both feasible and infeasible schedules for the TTP, searching large neighborhoods with complex moves. Lim et al. [9] improved the quality of solutions for instances of large size using a simulated annealing and hill-climbing algorithm for the TTP. As described above, many studies have been conducted in this field. However, providing an optimal schedule for instances of more than eight teams still remains an open problem. This paper is organized as follows: In Chapter 3, the traveling tournament problem is described, and then a mathematical model is presented. In Chapter 4, a heuristic method using tabu search is proposed. In Chapter 5, the proposed heuristic is evaluated on the basis of computational results. Finally, concluding remarks and recommendations for future research are addressed in Chapter 6.
3 Problem Description and Mathematical Modeling
This chapter describes the traveling tournament problem, which has the objective of minimizing the total distance traveled by all teams in the league. The chapter also deals with several complex constraints, consisting primarily of double round-robin tournament constraints, consecutive constraints and no-repeat constraints. A mathematical model using integer programming is then presented.

3.1 Definition of the Traveling Tournament Problem
Easton et al. [4] introduced and defined the traveling tournament problem (TTP) as follows: "Given n teams, n even, a round-robin tournament is a tournament among the teams so that every team plays every other team. Such a tournament has n − 1 slots during which n/2 games are played. For each game, one team is denoted the home team and its opponent is the away team. As suggested by the name, the game is held at the venue of the home team. A double round-robin tournament has
2(n − 1) slots and has every pair of teams play twice, once at home and once away for each team." For the problem, the distances between team sites are given by an n by n matrix. Assuming equality of distances to and from sites, this matrix is symmetric. Each team begins the tournament at its home site, to which it must return at the end of the tournament. Also, when a team plays consecutive away games, it travels from one away venue to the next directly. The cost to each team is the total distance traveled, starting from its home site and ending back there on completion of its scheduled games. In summary, the input data and output of the TTP are as follows:

Input: n, the number of teams; D, an n by n integer symmetric distance matrix.
Output: A double round-robin tournament on the n teams such that
1. All constraints on the tournament are satisfied, and
2. The total distance traveled by the teams is minimized.

The TTP consists of a large number of constraints, so it has been recognized as a difficult problem to solve even for very small cases, but there are two basic requirements. The first is a feasibility issue: the home and away pattern must be sufficiently varied so as to avoid long home stands and road trips. A road trip is defined as a series of consecutive away games; similarly, a home stand is a series of consecutive home games. The second is optimality: minimizing the total traveling distance for all teams. The key to the TTP is the conflict between minimizing travel distances and the feasibility constraints on the home/away pattern. A solution to the TTP must satisfy the following constraints:

1. Double round-robin constraints. Each pair of teams, say A and B, plays exactly twice: once at A's home site and once at B's home site. Thus, there are in total 2(n − 1) rounds, and n/2 games are played in each round.
2. Consecutive constraints. For each team, no more than three consecutive home or three consecutive away games are allowed. In other words, home stands or road trips of more than three games are forbidden.
3. No-repeat constraints. For any pair of teams A and B, after A and B play at A's home site, they cannot immediately play at B's home site in the next round.

3.2 Mathematical Modeling for the TTP
In this paper, a mathematical model for the TTP is presented. The model is formulated using integer programming (IP), and it extends the model of James and John [8]. The details are as follows:

Notation
i: the index for the team's site or venue (i = 1, 2, …, n)
j: the index for the team's site or venue (j = 1, 2, …, n)
k: the index for the team's site or venue (k = 1, 2, …, n)
t: the index for the round (t = 1, 2, …, 2n − 2)
d_{i,j}: distance between team i's site and team j's site
Decision Variable
Y_{i,j,k,t} = 1 if team i moves from team j's site to team k's site at round t, and 0 otherwise.

Minimize  Σ_{i=1}^{n} Σ_{j=1}^{n} Σ_{k=1}^{n} Σ_{t=1}^{2n−2} d_{jk} · Y_{i,j,k,t}     (1)

subject to

Σ_{j=1}^{n} Σ_{t=1}^{2n−2} Y_{i,j,i,t} = n − 1,   ∀i     (2)

Σ_{j=1}^{n} Σ_{t=1}^{2n−2} Y_{i,j,k,t} = 1,   ∀i, k (k ≠ i)     (3)

Σ_{i=1}^{n} Σ_{j=1}^{n} Y_{i,j,k,t} ≤ 2,   ∀k, t     (4)

Σ_{j=1}^{n} Y_{i,j,k,t} ≤ Σ_{j=1}^{n} Y_{k,j,k,t},   ∀i, k, t     (5)

Σ_{j=1}^{n} Y_{i,j,k,t} = Σ_{j=1}^{n} Y_{i,k,j,t+1},   ∀i, k, t     (6)

Σ_{j=1}^{n} Y_{i,j,i,1} = 1,   ∀i     (7)

Σ_{j=1}^{n} Y_{i,j,i,2n−1} = 1,   ∀i     (8)

Y_{i,k,i,t} + Y_{k,k,i,t} ≤ 1,   ∀i, k, t (i ≠ k)     (9)

Σ_{t'=t}^{t+3} Y_{i,i,i,t'} ≤ 3,   ∀i, t = 1, 2, ..., 2n − 5     (10)

Σ_{j=1, j≠i}^{n} Σ_{k=1, k≠i}^{n} Σ_{t'=t}^{t+3} Y_{i,j,k,t'} ≤ 3,   ∀i, t = 1, 2, ..., 2n − 5     (11)
The objective function (1) minimizes the total traveling distance for all teams in the league. Equation (2) is the constraint that n − 1 games must be played at each team's home site, and equation (3) means that each team must play once against every other team at that opponent's home site. Equation (4) states that no more than two teams can be located at one venue simultaneously, and equation (5) means that if an away team visits, the home team must be located at its own home site; thus, by constraints (4) and (5), either exactly two teams or no team is located at a venue in any round. Equation (6) is the moving-sequence constraint: if team i is at team j's home at round t, team i must start from team j's home at round t + 1. Equations (7) and (8) are the constraints that all teams must start at their home sites at the beginning of the league and return to their homes at the end of the league. Equation (9) is the no-repeat constraint, and finally equations (10) and (11) are the consecutive constraints. The mathematical model was tested using ILOG OPL Studio 3.6.1. However, it does not produce optimal solutions for instances of more than four teams because of the large number of constraints and variables. For an instance of just 4 teams, an optimal solution is obtained, and the optimal values for the NL4 and CIRC4 instances are 8276 and 20, respectively. Thus, in the next chapter a tabu search heuristic is proposed in order to solve practical problems with a large number of teams.
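Independently of the IP formulation, the three constraint families can also be checked directly on a candidate schedule. The following minimal Python sketch assumes the signed time-table representation introduced in Section 4.1 below (entry S[i][t] is the opponent of team i+1 in round t, positive for a home game); it is an illustration, not part of the original model.

```python
def is_feasible(S):
    n, rounds = len(S), len(S[0])
    if rounds != 2 * (n - 1):                      # double round-robin length
        return False
    for i in range(n):
        # each opponent is met exactly once at home and once away
        home = sorted(v for v in S[i] if v > 0)
        away = sorted(-v for v in S[i] if v < 0)
        if home != [k + 1 for k in range(n) if k != i] or home != away:
            return False
        for t in range(rounds):
            # mutual consistency: the opponent's entry must point back
            j = abs(S[i][t]) - 1
            if abs(S[j][t]) != i + 1 or (S[i][t] > 0) == (S[j][t] > 0):
                return False
        for t in range(rounds - 3):
            # consecutive constraints: at most three home/away games in a row
            if len({S[i][u] > 0 for u in range(t, t + 4)}) == 1:
                return False
        for t in range(rounds - 1):
            # no-repeat constraints
            if abs(S[i][t]) == abs(S[i][t + 1]):
                return False
    return True
```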
4 Tabu Search Heuristic
The Tabu Search (TS) algorithm is an iterative improvement approach designed to avoid terminating prematurely at a local optimum in combinatorial optimization problems. In the TTP, it is difficult to maintain the feasibility of solutions under the consecutive and no-repeat constraints. Thus, it is important to generate good neighborhoods and to search unvisited neighborhoods. One issue for tabu search is how to represent solutions and how to generate neighborhoods; another is how to keep the characteristics of good solutions. Our algorithm was designed from this point of view.

4.1 Representation of Solutions and Neighborhoods
In this paper, a schedule is represented by a time-table indicating the opponents of the teams. Each row corresponds to a team and each column corresponds to a round. The opponent of team i at round t is given by the absolute value of element (i, t). If (i, t) is positive, the game takes place at i's home, otherwise at i's opponent's home. This representation of schedules is designed by Anagnostopoulos et al. [9]. An example schedule for 6 teams is shown in Table 1.

Table 1. Example of the schedule for 6 teams

Team / Round    1    2    3    4    5    6    7    8    9    10
1               3   -6    4    2   -4   -5    5   -3    6   -2
2              -4    4   -5   -1    6   -6    3    5   -3    1
3              -1   -5   -6    6    5    4   -2    1    2   -4
4               2   -2   -1    5    1   -3    6   -6   -5    3
5               6    3    2   -4   -3    1   -1   -2    4   -6
6              -5    1    3   -3   -2    2   -4    4   -1    5
In Table 1, team 1 has home games against teams 3, 4, 2, 5 and 6 in rounds 1, 3, 4, 7 and 9, and away games against teams 6, 4, 5, 3 and 2 in rounds 2, 5, 6, 8 and 10. Thus, the total distance for team 1 is:

Total distance of team 1 = d_16 + d_61 + d_14 + d_45 + d_51 + d_13 + d_31 + d_12 + d_21

Neighborhoods can be generated by applying the five types of moves proposed by Anagnostopoulos et al. [1].
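A sketch of this travel-cost computation from the signed representation (assuming a symmetric distance matrix D indexed from 1 with a zero diagonal); applied to team 1's row of Table 1, it yields exactly the sum above.

```python
def travel_distance(row, i, D):
    pos, dist = i, 0                 # every team starts at its home site
    for v in row:
        nxt = i if v > 0 else -v     # home game: stay at venue i; away: go to |v|
        dist += D[pos][nxt]          # zero diagonal, so home stands cost nothing
        pos = nxt
    return dist + D[pos][i]          # return home after the last round

# Example: team 1's row from Table 1
# travel_distance([3, -6, 4, 2, -4, -5, 5, -3, 6, -2], 1, D)
```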
4.2 Solution Strategy for the TTP
In order to keep the characteristics of good solutions and to search neighborhoods effectively, the terms "alternation" and "intimacy" are defined in this paper as follows:

Alternation: The number of home-to-away or away-to-home conversions for one team. An elite schedule has few alternations.
Intimacy: Between two adjacent rounds, the number of home-home and away-away patterns. Intimacy is used to evaluate each pair of adjacent columns (rounds).

If a team has a high number of alternations in its schedule, it must travel a lot. Likewise, a pair of adjacent rounds with low intimacy needs to be improved through the neighborhood search. Alternation and intimacy are illustrated in Table 2.

Table 2. Alternation and intimacy
(Left: the schedule row of one team i across rounds 1–10, with its alternations marked; right: the columns of two adjacent rounds t and t + 1 for all teams, with their intimacy pairs marked.)
In the left part of Table 2, team i has 4 alternations, {(−1, 6), (4, −6), (−6, 1), (2, −5)}; in the right part of Table 2, the intimacy between rounds t and t + 1 is 4, {(−5, −6), (−2, −1), (3, 2), (1, 3)}. In the TTP, generating all the neighbors of the current solution takes too much time, so alternation and intimacy are used to search good neighbors effectively. For each team, the alternation is evaluated, and the 3 teams with maximum alternation are selected. Likewise, the intimacy of each adjacent pair of rounds is evaluated, and pairs are removed from the candidate list until only the 3 or 4 rounds with minimum intimacy remain.
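Both measures are straightforward to compute from the signed time-table; a minimal sketch:

```python
def alternation(row):
    # number of home/away sign changes along one team's schedule row
    return sum((a > 0) != (b > 0) for a, b in zip(row, row[1:]))

def intimacy(S, t):
    # number of teams whose home/away status is unchanged between
    # rounds t and t+1 (0-based column indices)
    return sum((S[i][t] > 0) == (S[i][t + 1] > 0) for i in range(len(S)))
```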
The basic search procedure is driven by a tabu search meta-heuristic [7]. The initial solution is generated randomly, using constraint programming. After evaluating the
alternation and intimacy of the current solution, the 3 max-alternation teams and the 3 or 4 min-intimacy rounds are selected. Then neighbors are generated by applying SwapHome and SwapTeam to the 3 max-alternation teams, and SwapRound to the 3 or 4 min-intimacy rounds. Finally, PartialSwapRound and PartialSwapTeam find neighbors using both the 3 max-alternation teams and the 3 or 4 min-intimacy rounds. The outline of neighborhood generation is shown in Figure 1.
Fig. 1. The outline of neighborhood generation (from the current solution, alternation selects the 3 max-alternation teams for SwapHome and SwapTeam, while intimacy selects the 3–4 min-intimacy rounds for SwapRound; PartialSwapTeam and PartialSwapRound use both)
If a neighbor is infeasible, it is not considered. The stopping criterion is based on the number of iterations; however, when no improvement on the current best solution can be found, the second- through fifth-best solutions are explored in the same way. If evaluating the fifth-best solution yields no improvement, the procedure is finally stopped. The parameters for these experiments are set to 10,000 maximum iterations and a tabu list size of 5. The steps of the proposed tabu search follow the general tabu search meta-heuristic approach; however, in the neighborhood search, the algorithm exploits the characteristics of good solutions through alternation and intimacy. Also, since only the selected teams and rounds are candidates for neighborhood generation, little time is spent on generating neighborhoods.
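A generic skeleton of such a procedure is sketched below; the alternation/intimacy-based neighbor generation, the feasibility filter, and the restart rule over the second- to fifth-best solutions are assumed to be supplied through the neighbors function and are not spelled out here.

```python
def tabu_search(initial, neighbors, cost, max_iter=10_000, tabu_size=5):
    current, best = initial, initial
    tabu = []                                # short-term memory of visited solutions
    for _ in range(max_iter):
        candidates = [s for s in neighbors(current) if s not in tabu]
        if not candidates:
            break
        current = min(candidates, key=cost)  # best admissible neighbor
        tabu.append(current)
        if len(tabu) > tabu_size:
            tabu.pop(0)                      # fixed-length tabu list
        if cost(current) < cost(best):
            best = current
    return best
```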
5 Computational Experiments
The proposed algorithm is tested on two different sets of instances: the real-distance NLx instances and the circular CIRCx instances. The details of these two sets are given in [13]. For each instance, 10 runs of the algorithm are performed, recording the best found solution and the total running time of the procedure. The experiments were performed on a Pentium 4, CPU 2.8 GHz, 512 MB RAM, and the algorithm was coded in the C language.
The results of the proposed tabu search heuristic compared with the best solutions of previous research are shown in Table 3, and comparisons of computational time with previous meta-heuristic approaches are in Table 4.

Table 3. Results and comparison of the best solution

Instance   1)       2)       3)       4)       Our Results
NL6        23916    N.A.     23916    N.A.     23916
NL8        42517    39721    39721    N.A.     39721
NL10       68691    59583    59821    59436    59583
NL12       143655   111248   115089   112298   111483
NL14       301113   189766   196363   190056   190174
NL16       437273   267194   274673   272902   276520
CIRC6      N.A.     N.A.     64       N.A.     64
CIRC8      N.A.     N.A.     132      N.A.     132
CIRC10     N.A.     N.A.     246      242      244
CIRC12     N.A.     N.A.     408      N.A.     416
CIRC14     N.A.     N.A.     654      N.A.     662
CIRC16     N.A.     N.A.     928      N.A.     976
CIRC18     N.A.     N.A.     1356     N.A.     1364
CIRC20     N.A.     N.A.     1842     N.A.     1896

1) Lagrangian relaxation and constraint programming by Benoist et al. [2]
2) Simulated annealing by Anagnostopoulos et al. [1]
3) Simulated annealing and hill-climbing by Lim et al. [9]
4) By Glenn Langford [13]
※ Bold type indicates the best solution so far, and N.A. indicates "not applicable".
Table 4. Comparison of the computational time

           Simulated annealing [1]   SA & hill-climbing [9]   Proposed tabu search
Instance   Min        Mean           Min      Mean            Min      Mean
NL6        N.A.       N.A.           577      821.2           308      379.1
NL8        596.6      1639.3         3224     4106.8          1411     2204.5
NL10       8084.2     40268.6        8557     40289.4         3427     6135
NL12       28526      68505.3        10355    54026.5         7329     26751.2
NL14       233578.4   418358.2       21978    59019.7         15674    41600.1
NL16       192086.6   344633.4       50767    83287.0         48294    71088.3
CIRC6      N.A.       N.A.           594      648.3           412      464
CIRC8      N.A.       N.A.           3340     3729.9          1549     2021.3
CIRC10     N.A.       N.A.           5886     23022.8         3996     8476
CIRC12     N.A.       N.A.           8709     37947.1         7034     28457.8
CIRC14     N.A.       N.A.           11022    53751.0         10426    36628.8
CIRC16     N.A.       N.A.           13261    56036.9         12467    44552.2
CIRC18     N.A.       N.A.           16199    63872.2         14234    55074.2
CIRC20     N.A.       N.A.           50303    68807.0         24715    60815.4
It can be observed in Tables 3 and 4 that our performance with respect to the objective value is not always better than the previous best. However, the suggested tabu search shows good performance regarding computation time: its computation time is shorter than that of [1] and [9], and for some instances better solutions than [1] and [9] are found. As shown by the experimental results, the selection of candidates through alternation and intimacy makes this algorithm powerful in terms of performance with computational efficiency.
6 Conclusion
Sports league scheduling has received considerable attention in recent years, since these applications involve significant revenues for television networks and generate challenging combinatorial optimization problems. This paper deals with the traveling tournament problem (TTP) proposed in [4]. Although several recent studies on the TTP propose heuristic approaches involving meta-heuristics and hybrid algorithms, a general mathematical model for the TTP had not been presented. Thus, this paper presents a mathematical model using integer programming. Additionally, we suggest a tabu search heuristic algorithm and introduce a mechanism that captures the characteristics of elite solutions. Through the experiments, the proposed heuristic was shown to achieve good performance with computational efficiency; our mechanism for selecting candidates for neighborhood generation makes the suggested heuristic effective in terms of computation time. In further research, it would be interesting to exploit the advantages of each meta-heuristic in a hybrid approach, and a better neighborhood search technique may also improve the efficiency of the algorithm.
References
1. Anagnostopoulos, A., Michael, L., V. Hentenryck, P. and Vergados, Y.: A simulated annealing approach to the traveling tournament problem. In Proceedings of CP-AI-OR (2003)
2. Benoist, T., Laburthe, F. and Rottembourge, B.: Lagrangian relaxation and constraint programming collaborative schemes for traveling tournament problem. CP-AI-OR, Wye College (2001) 15-26
3. Costa, D.: An evolutionary tabu search algorithm and the NHL scheduling problem. INFOR, Vol. 3(33). (1995) 161-178
4. Easton, K., Nemhauser, G., and Trick, M.: The traveling tournament problem description and benchmarks. In Proceedings of the 7th International Conference on the Principles and Practice of Constraint Programming, Paphos, Cyprus (2001) 580-589
5. Easton, K., Nemhauser, G., and Trick, M.: Solving the traveling tournament problem: a combined integer programming and constraint programming approach. Lecture Notes in Computer Science, Vol. 2740. (2003) 100-109
6. Ferland, J. A. and Fleurent, C.: Computer aided scheduling for a sport league. INFOR, Vol. 29(1). 14-25
7. Glover, F. and Laguna, M.: Tabu search. Kluwer Academic Publishers (1997)
8. James, C. B. and John, R. B.: Reducing traveling costs and player fatigue in the National Basketball Association. The Institute of Management Science, Vol. 10(3). (1980) 98-102
9. Lim, A., Rodrigues, B. and Zhang, X.: A simulated annealing and hill-climbing algorithm for the traveling tournament problem. European Journal of Operational Research, Vol. 131. (2005) 78-94
10. Nemhauser, G. L. and Trick, M. A.: Scheduling a major college basketball conference. Operations Research, Vol. 46(1). (1998) 1-8
11. Russel, R. A. and Leung, J. M. Y.: Devising a cost effective schedule for a baseball league. Operations Research, Vol. 42(4). (1994) 614-625
12. Schreuder, J. A. M.: Combinatorial aspects of construction of competition Dutch professional football leagues. Discrete Applied Mathematics, Vol. 35(3). (1992) 301-312
13. Trick, M.: Challenge traveling tournament problem. http://mat.gsia.cmu.edu/TTP/ (2004)
An Integrated Production-Inventory Model for Deteriorating Items with Imperfect Quality and Shortage Backordering Considerations

H.M. Wee¹, Jonas C.P. Yu², and K.J. Wang³

¹ Department of Industrial Engineering, Chung Yuan Christian University, Chungli 32023, Taiwan. [email protected]
² Logistics Management Department, Takming College, Taipei 114, Taiwan
³ Department of Business Administration, National Dong Hwa University, Hualien, Taiwan
Abstract. In this study we present a production-inventory model for deteriorating items with vendor-buyer integration. A periodic delivery policy for the vendor and a production-inventory model with imperfect quality for the buyer are established. Such assumptions (deteriorating items, imperfect quality) are reasonable in view of the fact that poor-quality items do exist during production. Defective items are picked up during the screening process, and shortages are completely backordered. The study shows that our model is a generalization of the models in the current literature. An algorithm and a numerical analysis are given to illustrate the proposed solution procedure. Computational results indicate that our model leads to a more realistic result.
1 Introduction
Since the development of the economic order quantity (EOQ) more than four decades ago, a substantial amount of research has been conducted in the area of inventory lot sizing. However, one of the weaknesses of most studies is the unrealistic assumption of perfect-quality items [25]. Cheng [2] proposed an EOQ model with demand-dependent unit production cost and an imperfect production process. He proposed a general power function to model the relationship between unit production cost, demand rate and process reliability, formulated this inventory decision problem as a geometric program (GP), and applied the theories of GP to derive a closed-form optimal solution. Zhang and Gerchak [30] considered a joint lot sizing and inspection policy for an EOQ model with a random proportion of defective units, in which the defective units are replaced by non-defective ones. Rosenblatt and Lee [24] considered the presence of defective products in a small-lot-size replenishment policy. They assumed that the defective rate, from the beginning in-control state until the process goes out of control, increases exponentially, and that the defective items can be reworked instantaneously and kept in stock; they concluded that the presence of defective products results in smaller lot sizes. Schwaller [26] presented a procedure that extends EOQ models by assuming that defectives of a known proportion are present in incoming lots,
and that fixed and variable inspection costs are incurred in finding and removing the items. Porteus [22] incorporated the effect of defective items in the basic EOQ model and invested in process quality improvement; he assumed a probability q that the process would go out of control during production. Salameh and Jaber [25] presented a modified inventory model for imperfect-quality items, in which poor-quality items are sold as a single batch at the end of a 100% screening process. Rosenblatt and Lee [24] showed that reducing the lot size quantity increases the average percentage of imperfect-quality items. A reasonable explanation is that Rosenblatt and Lee [24] assumed defective items are reworked instantaneously and kept in stock; this increases the holding cost, which results in lower lot sizes, whereas in this paper imperfect-quality items are withdrawn from stock, resulting in a lower holding cost and larger lot sizes. Goyal and Gardenas-Barron [10] extended Salameh and Jaber's model and presented a practical approach to determine the EPQ for items with imperfect quality; the approach suggested in their study results in nearly zero penalty compared to Salameh and Jaber. Later, Goyal et al. [11] extended the model of Goyal and Gardenas-Barron [10] to consider vendor-buyer integration. Chung and Hou [3] developed a model to determine an optimal run time for a deteriorating production system with shortages, assuming the elapsed time between production process shifts is random. Recently, Wee and Yu [28] extended the approach of Salameh and Jaber and considered permissible shortage backordering; they found that the traditional EOQ model and Salameh and Jaber's modified EPQ/EOQ model are both special cases of the proposed model when the backordering cost is very large. In this paper, the influence of imperfect quality and deterioration is taken into account. Imperfect quality is the result of imperfect machines and processes. Deterioration occurs because many agricultural products, gasoline and medicines do not have constant utility during storage. The distribution of time to deterioration of the item follows the exponential distribution. Ongoing deteriorating inventory has been studied by several authors in recent decades. Ghare and Schrader [12] were the first authors to consider ongoing deterioration of inventory; they developed an EOQ model for items with exponentially decaying inventory. Elsayed and Terasi [5] proposed a deteriorating production-inventory model with Weibull distribution and permissible shortage. Kang and Kim [20] proposed an exponentially deteriorating model considering the price and production level. An exponentially deteriorating production-inventory model with permissible shortage is presented in [23]. Other authors such as Dave [4] and Heng et al. [16] assumed either instantaneous or finite production rates with different assumptions on the patterns of deterioration. Yang and Wee [29] developed an integrated economic ordering policy of deteriorating items for a vendor and multiple buyers. Collaboration of enterprises, especially in terms of developing strategies, is vital in reducing the overall cost of the enterprise, because a decision made independently by one player will not result in a global optimum. Global optimization will only be realized if the perspectives of all players are considered.
One of the advantages of applying joint economic lot size (JELS) models is the ability to generate a lower total inventory-relevant cost for the system, so that the net benefit can be shared by both parties. The JELS approach has been studied for years. Goyal [6]
was the first to introduce an integrated inventory policy for the single-supplier single-customer problem. He showed that his integrated policy results in the minimum joint variable cost for the supplier and the customer. Banerjee [1] developed a joint economic lot size model with a lot-for-lot policy for a single-buyer single-vendor system by combining two EOQ models from the buyer and the vendor. In his model, he assumed that the vendor makes a production setup whenever the buyer places an order and supplies on a lot-for-lot basis; he also showed that his JELS model yields the minimum joint total relevant cost when considering both the buyer and the vendor at the same time. Later, Goyal [7] generalized Banerjee's model by relaxing the assumption of the vendor's lot-for-lot policy. He pointed out that the vendor could produce a lot that supplies an integer number of orders from the buyer; nevertheless, the model restricts shipments from being triggered before the whole production batch is completed. A review of models on buyer-vendor integration up to 1990 is given in [8]. Lu [21] developed an algorithm which derives an optimal solution to the single-vendor single-buyer problem when the delivery quantity to the buyer at each replenishment is identical; Lu's model is synchronous, allowing shipment to take place during production. The model proposed by Halm and Yano [14] is also synchronous, aiming to minimize the manufacturer's and buyer's inventory holding costs, the manufacturer's setup cost, as well as the transportation cost. Halm and Yano advocate that, for the single-buyer single-item problem, the optimal solution has the property that the production interval is an integer multiple of the delivery interval. Halm and Yano [15] further extended the model to the single-machine multi-component problem; a heuristic procedure was developed to find both the "just-in-time" production runs and the delivery schedule. Goyal [9] relaxes the identical-shipment constraint, allowing the quantities of successive shipments to increase by a fixed factor of the production rate to the demand rate; the policy is to deliver whatever is produced at the replenishment time, using the same example as Lu [21]. Goyal shows that a different shipment policy can result in a better solution. Ha and Kim [13] analyze the integration between the buyer and the supplier and develop a mathematical model using the geometric method. Though both Goyal's [9] and Hill's [17] models illustrate that the deliver-what-is-produced policy is better than the identical-shipment policy, Viswanathan [27] shows that neither strategy obtains the best solution for all possible problem parameters. More recently, Hill [18] derived a globally optimal batching and shipping policy for the single-vendor single-buyer integrated production-inventory problem. Hoque and Goyal [19] proposed an optimal policy for a single-vendor single-buyer integrated production-inventory system with a limited capacity of transport equipment. In this study, product deterioration and vendor-buyer integration are considered simultaneously. We propose a production-inventory model for an ongoing-deterioration item with backordering and imperfect quality. Shortages due to imperfect items are completely backordered. Customers encountering shortages respond differently according to the type of commodity and the market environment, and not all customers are willing to wait for a new replenishment of stock.
In the real world, complete backordering is likely only in a monopolistic market. An illustrative example and a sensitivity analysis are given to validate the inventory model.
2 Notation and Assumptions
The following notation is used:

I(t)    Inventory level at time t, 0 ≤ t ≤ T
I_m     Maximum inventory level
I_s     Inventory level at the end of time t_1
T_v1    The time length of the production stage
T_v2    The time length of the non-production stage
R       The total production quantity
Q       Order size
D       Demand rate for the buyer
X       Screening rate for the buyer
c       Purchasing cost per unit for the buyer
h       Carrying cost per unit per unit time for the buyer
d       Deterioration cost per unit for the buyer
K       Ordering cost per order for the buyer
θ       Deterioration rate
C_vh    Carrying cost per unit per unit time for the vendor
C_vd    Deterioration cost per unit for the vendor
p       Defective percentage in demands for the vendor
f(p)    Probability density function of p
x       Screening cost per unit for the buyer
I_b     Total shortage demand (units/cycle)
b       Backordering cost per unit for the buyer
*       Superscript representing an optimal value
The mathematical models presented in this study have the following assumptions:
(1) A single item with a constant deteriorating rate of the on-hand inventory is considered.
(2) The demand rate is a continuous known constant.
(3) The lead-time is a known constant.
(4) Defective items are independent of deterioration.
(5) Replenishment is instantaneous.
(6) The screening process and demand proceed simultaneously.
(7) The defective percentage, p, has a uniform distribution on [α, β], where 0 ≤ α < β ≤ 1.
(8) Shortages are completely backordered.
(9) A single product is considered.
3 Mathematical Model
We derive the costs involved in integrating the lot sizing policies between a vendor and a buyer. The ultimate form of a JIT purchasing agreement, frequent deliveries in small lots, is adopted to minimize the total cost. Figure 1 depicts the behavior of the inventory levels for both the vendor and the buyer. The integrated annual total cost consists of the vendor's annual total cost and the buyer's annual total cost.
3.1 The Vendor's Total Cost per Unit Time
Figure 1 shows that, under periodic delivery, the vendor does not stop producing until all demand is satisfied. For a given R, the values of T_v1 and T_v2 can be derived as
T_v1 = R/p     (1)

and

T_v2 = (1/θ) ln( (1 − p/D)( exp(−Rθ/p) − 1 ) + 1 ),     (2)

respectively. The proof is attached in Appendix A. For T_v1 + T_v2 = T_v, one has

T_v1 + T_v2 = R/p + (1/θ) ln( (1 − p/D)( exp(−Rθ/p) − 1 ) + 1 )     (3)

Fig. 1. Inventory level of vendor and buyer with customer demand
The vendor's total inventory cost per unit time is composed as follows: Total cost = setup cost + delivery cost + holding cost + deteriorating cost. The carrying inventory can be derived as

Carrying inventory = (ΔI − D·Δt)/θ = (pT_v1 − nQ)/θ = (R − nQ)/θ

Hence, the holding cost per cycle equals C_vh(R − nQ)/θ and the deteriorating cost per cycle equals C_vd(R − nQ). Adding the setup cost and the delivery cost, the vendor's annual total cost is given by

TC_v = C_s/T_v + nC_d/T_v + ( (R − nQ)/T_v )( C_vh/θ + C_vd )     (4)
3.2 The Buyer's Total Cost per Unit Time
Figure 2 shows a lot size of Q units replenished with an ordering cost of $K and a purchasing price of $c per unit. A fraction of each lot received is defective, with a known probability density function f(p); the random variable p has a uniform distribution on [α, β], where 0 ≤ α < β ≤ 1. A 100% screening process of the items is conducted at a rate of X. The defective items are picked up in a single batch during the replenishment period T_1. Shortages of stock are backlogged at the beginning of each period. The behavior of the inventory system is illustrated in Figure 2, where T_b is the cycle length, pDT_b is the maximal number of defectives, and I_b is the total units backordered.

Fig. 2. Buyer's inventory system with backordering

The buyer's total cost per unit time, TC_b, is composed as follows: Total inventory cost = Ordering cost + Screening cost + Deteriorating cost + Holding cost + Backordering cost. For the inventory system depicted in Figure 2, the carrying inventory within the time interval between T_1 and T_2 is
\[ \frac{\Delta I - D\cdot\Delta t}{\theta} = \frac{1}{\theta}\left(Q - DT_b\,p - DT_b\right) \tag{5} \]
For \(I_b = DT_3 = pDT_b\), the backlogged inventory during T3 is equal to \(\tfrac{1}{2}\,D\,T_b^{\,2}\,p^2\).
Therefore, the total annual inventory cost is
\[ TC_b = TC_b(T_b) = \frac{K}{T_b} + \left(c + x + d + \frac{h}{\theta}\right)\frac{Q}{T_b} - \left(d + \frac{h}{\theta}\right)(Dp + D) + \frac{bDp^2T_b}{2} \tag{6} \]
The change in the inventory level during an infinitesimal time, dt, is a function of the deterioration rate θ, the demand rate D, and the inventory level I(t). It is formulated as
\[ \frac{dI_1(t)}{dt} + \theta I_1(t) = -D, \qquad 0 \le t \le T_1 \tag{7} \]
\[ \frac{dI_2(t)}{dt} + \theta I_2(t) = -D, \qquad 0 \le t \le T_2 \tag{8} \]
\[ \frac{dI_3(t)}{dt} = -D, \qquad 0 \le t \le T_3 \tag{9} \]
I(t) is the inventory level at time t. From the above differential equations, after adjusting for the constant of integration with the boundary conditions I1(0) = Im, I2(0) = Is − pDTb and I3(0) = 0, the solutions become:
\[ I_1(t_1) = \left(I_s + \frac{D}{\theta}\right)e^{(T_1 - t_1)\theta} - \frac{D}{\theta}, \qquad 0 \le t_1 \le T_1 \tag{10} \]
\[ I_2(t_2) = \left(I_s - pDT_b + \frac{D}{\theta}\right)e^{-\theta t_2} - \frac{D}{\theta}, \qquad 0 \le t_2 \le T_2 \tag{11} \]
and
\[ I_3(t_3) = -Dt_3, \qquad 0 \le t_3 \le T_3 \tag{12} \]
Since the defective items are independent of deterioration, they amount to pDTb units. From Figure 2, I1(T1) = Is = I2(0) + pDTb, one has
\[ I_s = pDT_b + \frac{D}{\theta}\left(e^{\theta T_2} - 1\right) \tag{13} \]
\[ I_b = DT_3 = pDT_b \tag{14} \]
and
\[ I_1(0) = Q - pDT_b = \left(pDT_b + \frac{D}{\theta}\,e^{\theta T_2}\right)e^{\theta T_1} - \frac{D}{\theta} \tag{15} \]
The replenishment quantity, Q, can be derived by substituting T1 = DTb/X into Eq. (15), together with T3 = pTb from Eq. (14), so that T2 = Tb − T1 − T3. One has
\[ Q = Q(T_b) = pDT_b + pDT_b\,e^{\theta T_b D/X} + \frac{D}{\theta}\,e^{\theta T_b(1-p)} - \frac{D}{\theta} \tag{16} \]
Expanding the exponential functions and neglecting the third and higher powers of θ·Tb, Eq. (16) becomes:
\[ Q = Q(T_b) = (Dp + D)T_b + \left(\frac{p\theta D^2}{X} + \frac{\theta D}{2}(1-p)^2\right)T_b^{\,2} + \left(\frac{p\theta^2 D^3}{2X^2}\right)T_b^{\,3} \tag{17} \]
By substituting Eq. (17) into Eq. (6), one has
\[ TC_b(T_b) = \frac{K}{T_b} + \left(c + x + d + \frac{h}{\theta}\right)\left(Dp + D + \left(\frac{p\theta D^2}{X} + \frac{\theta D}{2}(1-p)^2\right)T_b + \frac{p\theta^2 D^3}{2X^2}\,T_b^{\,2}\right) - \left(d + \frac{h}{\theta}\right)(Dp + D) + \frac{bDp^2T_b}{2} \tag{18} \]

3.3 The Joint Total Cost per Unit Time

For the vendor, by substituting Tv = nTb and Eq. (17) into Eq. (4), one has
\[ TC_v(T_b) = \left(\frac{R}{nT_b} - (Dp + D) - \left(\frac{p\theta D^2}{X} + \frac{\theta D}{2}(1-p)^2\right)T_b - \frac{p\theta^2 D^3}{2X^2}\,T_b^{\,2}\right)\left(\frac{C_{vh}}{\theta} + C_{vd}\right) + \frac{C_s}{nT_b} + \frac{C_d}{T_b} \tag{19} \]
From Eq. (18) and Eq. (19), the total cost per unit time for both the vendor and the buyer is JTC(Tb, n) = TCv(Tb, n) + TCb(Tb).
For α ≤ p ≤ β, one has
\[ E[p] = \frac{\alpha+\beta}{2} = \mu_1, \qquad E[p^2] = \frac{\alpha^2+\alpha\beta+\beta^2}{3} = \mu_2, \qquad E[(1-p)^2] = \frac{\alpha^2+\alpha\beta+\beta^2}{3} - (\alpha+\beta) + 1 = \mu_3 \]
The expected value of JTC, EJTC, is
\[ EJTC(T_b) = \left(\frac{R}{nT_b} - (D\mu_1 + D) - \left(\frac{\mu_1\theta D^2}{X} + \frac{\theta D}{2}\mu_3\right)T_b - \frac{\mu_1\theta^2 D^3}{2X^2}\,T_b^{\,2}\right)\left(\frac{C_{vh}}{\theta} + C_{vd}\right) + \frac{C_s}{nT_b} + \frac{C_d}{T_b} + \frac{K}{T_b} + \left(c + x + d + \frac{h}{\theta}\right)\left(D\mu_1 + D + \left(\frac{\theta D^2\mu_1}{X} + \frac{\theta D\mu_3}{2}\right)T_b + \frac{\theta^2 D^3\mu_1}{2X^2}\,T_b^{\,2}\right) - \left(d + \frac{h}{\theta}\right)(D\mu_1 + D) + \frac{bDT_b\mu_2}{2} \tag{20} \]
and from Eq. (3),
\[ T_b = T_b(n) = \frac{1}{n}\left(\frac{R}{P} + \frac{1}{\theta}\,\ln\!\left[\left(1-\frac{P}{D}\right)\left(e^{-R\theta/P}-1\right)+1\right]\right) \tag{21} \]
3.4 Methodology and Solution Search

Our objective is to minimize the expected cost function:
\[ \min_{n}\ EJTC(n) \quad \text{s.t.}\ \ Q(T_b) > 0,\ \ T_b > 0,\ \ n \in \mathbb{N},\ \ \alpha \le p \le \beta \tag{22} \]
The problem is to determine the value of n that minimizes EJTC. Since the number of deliveries per cycle, n, is a discrete variable, the value of n can be derived as follows:
Step 1. Input all the system parameters.
Step 2. For a range of n-values, equate the first derivative of EJTC with respect to Tb to zero. For each n, denote the resulting minimum value of Tb by Tb(n).
Step 3. Derive the (possibly non-integer) optimal value of n, denoted by n#. The optimal integer value, n*, lies in the vicinity of n# and must satisfy
\[ EJTC(n^*-1) \ge EJTC(n^*) \le EJTC(n^*+1) \tag{23} \]
Step 4. Using (17), the periodic delivery quantity, Q, can be solved.
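The four-step procedure can be made concrete with a short script. The sketch below implements it in Python with the parameter values of the numerical example in the next section, using the closed forms of Eqs. (17), (20) and (21) as reconstructed above; the variable names are ours and this is an illustration of the search, not the authors' original implementation.

```python
# Sketch of the Step 1-4 search: scan the discrete n, evaluate EJTC(n) with
# Tb(n) from Eq. (21), and recover Q from Eq. (17). Parameters are those of
# the numerical example in Section 4.
import math

R, P, D, X = 150000.0, 160000.0, 50000.0, 175200.0    # quantities and rates
Cs, Cd, Cvh, Cvd = 300.0, 25.0, 2.0, 30.0             # vendor costs
K, h, x, b, c, d = 100.0, 5.0, 0.5, 10.0, 25.0, 30.0  # buyer costs
theta, alo, ahi = 0.01, 0.0, 0.04                     # deterioration, p-range

# Moments of the uniform defective fraction p on [alo, ahi]
mu1 = (alo + ahi) / 2.0
mu2 = (alo**2 + alo*ahi + ahi**2) / 3.0
mu3 = mu2 - (alo + ahi) + 1.0

# Vendor cycle length Tv from Eq. (3); buyer cycle Tb(n) = Tv / n, Eq. (21)
Tv = R/P + math.log((1.0 - P/D)*(math.exp(-R*theta/P) - 1.0) + 1.0) / theta
a1 = mu1*theta*D**2/X + theta*D*mu3/2.0   # Tb-coefficient in Eqs. (17)/(20)
a2 = mu1*theta**2*D**3/(2.0*X**2)         # Tb^2-coefficient

def ejtc(n):
    Tb = Tv / n
    vendor = (R/(n*Tb) - (D*mu1 + D) - a1*Tb - a2*Tb**2)*(Cvh/theta + Cvd) \
             + Cs/(n*Tb) + Cd/Tb
    buyer = K/Tb + (c + x + d + h/theta)*((D*mu1 + D) + a1*Tb + a2*Tb**2) \
            - (d + h/theta)*(D*mu1 + D) + b*D*mu2*Tb/2.0
    return vendor + buyer

n_star = min(range(1, 201), key=ejtc)        # discrete scan, condition (23)
Tb_star = Tv / n_star
Q_star = (D*mu1 + D)*Tb_star + a1*Tb_star**2 + a2*Tb_star**3   # Eq. (17)
print(n_star, round(Tb_star, 4), round(Q_star), round(ejtc(n_star)))
```

Under these assumptions the scan should recover n* = 75 with Tb ≈ 0.0396 year and Q ≈ 2020 units, in line with the numerical example.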
4 Numerical Example

To illustrate the preceding theory, we compare our analysis with the example from Salameh and Jaber [25]. The following data are assumed: R=150000 units, P=160000 units/year, Cs=$300/cycle, Cd=$25/unit, Cvh=$2/unit/year, Cvd=$30/unit, D=50000 units/year, K=$100/cycle, h=$5/unit/year, X=175200 units/year, x=$0.5/unit, b=$10/unit/year, c=$25/unit, d=$30/unit, s=$50/unit. The item deteriorates at a constant rate with θ=0.01. The percentage defective random variable, p, can take any value in the range [α, β], where α=0 and β=0.04. It is assumed that p is uniformly distributed with p.d.f.
\[ f(p) = \begin{cases} 25, & 0 \le p \le 0.04, \\ 0, & \text{otherwise.} \end{cases} \]
Since EJTC is a very complicated function due to the high-power terms of the exponential function, a graphical representation showing the convexity of EJTC is given in Fig. 3. Following the above solution procedure, we compute the optimal value of n that minimizes Eq. (20) as n*=75 (n#=74.8). Substituting n*=75 into Eq. (21), the optimum value of Tb is 0.0396 year. From Eq. (17), the lot size Q* is 2020 units. Therefore, the integrated total cost per year is $1,194,719.
Fig. 3. Graphical representation of a convex EJTC (when n*=75)
Based on the numerical example, if the decision is made solely from the buyer's perspective, the optimal value of Tb that minimizes Eq. (18) is Tb*=0.0272. From Eqs. (17) and (21), the optimal values of Q and n are Q*=1388 and n*=109. Substituting them into the buyer's expected annual cost and the vendor's expected annual cost, the total cost of the buyer and the vendor is $1,195,172. Therefore, the integrated cost reduction is $1,195,172 − $1,194,719 = $453. Note that the expected annual integrated total cost achieves a cost reduction compared with an independent decision by the buyer.
5 Conclusions

This study has presented a deteriorating inventory model with an unreliable process. The model extends the studies in [25] and considers a single-vendor single-buyer integrated two-echelon supply chain environment. The comparative studies in the example show the benefit of integration. The effect of deterioration should be considered even if it is small. We have shown in this study that the influence of imperfect quality, deterioration and complete backordering is significant. The management of an enterprise can select suppliers based on the defective percentage and the deterioration rate of the products supplied by each supplier.
References

1. Banerjee, A.: A joint economic lot size model for purchaser and vendor. Decision Science 17 (1985) 292-311
2. Cheng, T.C.E.: An economic order quantity model with demand-dependent unit production cost and imperfect production process. IIE Transactions 23(1) (1991) 23-28
3. Chung, K.J., Hou, K.L.: An optimal production run time with imperfect production processes and allowable shortages. Computers and Operations Research 20 (2003) 483-490
4. Dave, U.: On a discrete-in-time order-level inventory model for deteriorating items. Operational Research Quarterly 30 (1979) 349-354
5. Elsayed, E.A., Terasi, C.: Analysis of inventory systems with deteriorating items. International Journal of Production Research 21 (1983) 449-460
6. Goyal, S.K.: Determination of optimum production quantity for a two-stage production system. Operational Research Quarterly 28 (1977) 865-870
7. Goyal, S.K.: A joint economic lot size model for purchaser and vendor: A comment. Decision Science 19 (1988) 236-241
8. Goyal, S.K., Gupta, Y.P.: Integrated inventory models: the buyer-vendor coordination. European Journal of Operational Research 41 (1992) 261-269
9. Goyal, S.K.: A one-vendor multi-buyer integrated inventory model: a comment. European Journal of Operational Research 82 (1995) 209-210
10. Goyal, S.K., Gardenas-Barron, L.E.: Note on: economic production quantity model for items with imperfect quality - a practical approach. International Journal of Production Economics 77 (2002) 85-87
11. Goyal, S.K., Huang, C.K., Chen, H.K.: A simple integrated production policy of an imperfect item for vendor and buyer. Production Planning & Control 14(7) (2003) 596-602
12. Ghare, P.M., Schrader, S.F.: A model for an exponentially decaying inventory. Journal of Industrial Engineering 14 (1963) 238-243
13. Ha, D., Kim, S.L.: Implementation of JIT purchasing: an integrated approach. Production Planning & Control 8(2) (1997) 152-157
14. Hahm, J., Yano, C.A.: The economic lot and delivery scheduling problem: the single item case. International Journal of Production Economics 28 (1992) 235-251
15. Hahm, J., Yano, C.A.: The economic lot and delivery scheduling problem: the common cycle case. IIE Transactions 27 (1995) 113-125
16. Heng, K.J., Labban, J., Linn, R.L.: An order-level lot-size inventory model for deteriorating items with finite replenishment rate. Computers & Industrial Engineering 20 (1991) 187-197
17. Hill, R.M.: The single-vendor single-buyer integrated production-inventory model with a generalized policy. European Journal of Operational Research 97 (1997) 493-499
18. Hill, R.M.: The optimal production and shipment policy for the single-vendor single-buyer integrated production inventory problem. International Journal of Production Research 37 (1999) 2463-2475
19. Hoque, M.A., Goyal, S.K.: An optimal policy for a single-vendor single-buyer integrated production-inventory system with capacity constraint of transport equipment. International Journal of Production Economics 65 (2000) 305-315
20. Kang, S., Kim, I.: A study on the price and production level of the deteriorating inventory system. International Journal of Production Research 21 (1983) 449-460
21. Lu, L.: A one-vendor multi-buyer integrated inventory model. European Journal of Operational Research 81 (1995) 312-323
22. Porteus, E.L.: Optimal lot sizing, process quality improvement and setup cost reduction. Operations Research 34(1) (1986) 137-144
23. Raafat, F., Wolfe, P.M., Eldin, H.K.: An inventory model for deteriorating items. Computers & Industrial Engineering 20 (1991) 89-94
24. Rosenblatt, M.J., Lee, H.L.: Economic production cycles with imperfect production process. IIE Transactions 18 (1986) 48-55
25. Salameh, M.K., Jaber, M.Y.: Economic order quantity model for items with imperfect quality. International Journal of Production Economics 64 (2000) 59-64
26. Schwaller, R.L.: EOQ under inspection costs. Production and Inventory Management 29(3) (1988) 22
27. Viswanathan, S.: Optimal strategy for the integrated vendor-buyer inventory model. European Journal of Operational Research 105 (1998) 38-42
28. Wee, H.M., Yu, J., Chen, M.C.: Optimal inventory model for items with imperfect quality and shortage backordering. Omega (2005), in press (available online)
29. Yang, P.C., Wee, H.M.: A single-vendor and multiple-buyers production-inventory policy for a deteriorating item. European Journal of Operational Research 43 (2002) 570-581
30. Zhang, X., Gerchak, Y.: Joint lot sizing and inspection policy in an EOQ model with random yield. IIE Transactions 22(1) (1990) 41
Appendix A

From Fig. 1, the vendor's production interval, Tv1, and the non-production interval, Tv2, can be denoted by
\[ T_{v1} = t_s + n_1\cdot T_b + t_f \qquad \text{and} \qquad T_v = T_{v1} + T_{v2} = n\cdot T_b \tag{A1} \]
It is observed that the inventory level at the end of Tv1 (the production stage) is the same as the inventory level at the beginning of Tv2 (the non-production stage). One has
\[ I(T_{v1}) = \left(\frac{P-D}{\theta}\right)\left(1 - e^{-T_{v1}\theta}\right) = \left(\frac{D}{\theta}\right)\left(e^{T_{v2}\theta} - 1\right). \tag{A2} \]
Substituting Tv1 = R/P into (A2), the value of Tv2 can be derived as
\[ T_{v2} = \frac{1}{\theta}\,\ln\!\left[\left(1-\frac{P}{D}\right)\left(e^{-R\theta/P}-1\right)+1\right] \tag{A3} \]
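For completeness, the algebra that takes (A2) to (A3), which the text omits, is a one-line rearrangement; the following LaTeX sketch fills in the step, using the identity (P/D − 1)(1 − e^{−x}) = (1 − P/D)(e^{−x} − 1):

```latex
% Solving (A2) for T_{v2}: isolate the exponential and take logarithms.
\begin{align*}
  e^{T_{v2}\theta}
    &= 1 + \frac{P-D}{D}\left(1 - e^{-T_{v1}\theta}\right)
     = 1 + \left(1 - \frac{P}{D}\right)\left(e^{-T_{v1}\theta} - 1\right),\\
  T_{v2}
    &= \frac{1}{\theta}\,\ln\!\left[\left(1 - \frac{P}{D}\right)
       \left(e^{-R\theta/P} - 1\right) + 1\right],
       \qquad T_{v1} = R/P .
\end{align*}
```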
A Clustering Algorithm Using the Ordered Weight Sum of Self-Organizing Feature Maps

Jong-Sub Lee 1 and Maing-Kyu Kang 2
1 Department of Technical Management Information Systems, University of Woosong, Daejeon, South Korea. [email protected]
2 Department of Information & Industrial Engineering, University of Hanyang, Ansan, South Korea. [email protected]
Abstract. Clustering groups similar objects into clusters. Many existing approaches use Self-Organizing Feature Maps (SOFMs), but they have problems with a small number of output-layer nodes and with the initial weights. This paper suggests one-dimensional output-layer nodes in SOFMs. The number of output-layer nodes is larger than the number of clusters to be found, and the output-layer nodes are arranged in ascending order of the sum of each node's weights. We map input data to SOFM output nodes and classify them using the Euclidean distance. The suggested algorithm was tested on the well-known IRIS data and on machine-part incidence matrices. The results of this computational study demonstrate the superiority of the suggested algorithm.
1 Introduction

Clustering groups data into clusters of similar objects. The clustering problem takes n data points of multiple dimensions as input and divides them into k clusters with similar features. Data within a cluster are more similar to each other than to the remaining data. Similarity is measured by the Euclidean distance between the attribute values of the data: the shorter the Euclidean distance, the higher the resemblance. Clustering algorithms can be divided into two categories: hierarchical algorithms and partition algorithms. Recently, there have also been active studies of clustering algorithms using neural networks and fuzzy-neural networks. Among hierarchical algorithms, Mangiameli et al. [10] describe single linkage clustering, which measures similarity by the minimum distance between two clusters; complete linkage clustering, which measures similarity by the maximum distance between two clusters; group-average clustering, which measures similarity by the average distance between two clusters; and Ward's hierarchical clustering, which measures similarity by the density between two clusters. Hierarchical algorithms have the defect that they cannot repair the problems caused by an inappropriate merge. Partition algorithms, on the other hand, form a partition of the data into clusters so that data in a
cluster are more similar to one another than to data in other clusters. Partition algorithms can be classified into the k-Means algorithm and the ISODATA algorithm. The k-Means algorithm repeatedly reassigns the data of each cluster in the direction that minimizes the distance between each data point and the central value of its cluster. This overcomes the disadvantage of hierarchical algorithms, which cannot recover from an inappropriate merge made in an early stage. The ISODATA algorithm starts with k centers, but the number of clusters is not necessarily k; the number of clusters can change flexibly while the algorithm runs. The ISODATA algorithm thus remedies the k-Means algorithm's defect of a fixed number of clusters [5]. In 1989, Huntsberger and Ajjimarangsee [6] proposed a clustering algorithm that slightly modifies Kohonen's [8] learning method through parameters such as the learning rate and the neighborhood rate. In 1993, Pal et al. [11] presented a loss-function method in which the connecting weights are initialized by the distance between input data and output nodes, and designed competitive learning neural network algorithms that minimize the loss function. Fuzzy concepts were brought into neural networks by Tasao et al. [13] and Karayiannis [7]. In particular, Karayiannis [7] formulated FALVQ (Fuzzy Algorithm for Learning Vector Quantization), which minimizes the weighted sum of squared Euclidean distances between the input data and the connecting weights of LVQ (Learning Vector Quantization). In 2000, Kusiak [9] showed that the clustering problem is NP-complete: as the number of machines in a cluster grows, the computational complexity increases exponentially and becomes time consuming. To overcome such difficulties, heuristic algorithms are preferred to optimization algorithms. In this study, a self-organizing neural network performs unsupervised learning on Anderson's IRIS data and machine-part matrix data, exploiting the property that the connecting weights take a form similar to that of the input data. The parameters of this study are the number of one-dimensional output nodes, the learning rate, the neighborhood range, etc. The suggested algorithm learns using these parameters and forms groups according to the connecting-weight distance between output nodes i and j. As a result, the algorithm reduces the number of misclassifications.
2 SOFMs Neural Network

SOFM is a competitive learning neural network model introduced by Kohonen [8]. Fig. 1 illustrates that a SOFM consists of two layers: an input layer formed of m input nodes and an output layer formed of n output nodes. As the input layer receives input data, mapping is performed at the output layer. The output layer uses either a one-dimensional or a two-dimensional structure. We can set the number of output nodes either to a number larger than the number of input data or to a number k chosen by the user, so that the input data can be spread over the output nodes. At each output node, mapping is performed with the input data. Every node of the input layer is connected to every node of the output layer, and the connecting line between output node i (1 ≤ i ≤ n) and input node j (1 ≤ j ≤ m) carries a connecting weight Wij.
Fig. 1. General structure of SOM
A connecting weight is a real number between 0 and 1, which is set randomly at the initial stage but is changed by the input data. For each input data point, the winner node i*, the most similar output node, is selected. As in formula (1), among the distances Di calculated between input data X and connecting weights Wi, the output node i* with the lowest distance is selected:
\[ D_i = (x_1 - w_{i1})^2 + (x_2 - w_{i2})^2 + \cdots + (x_m - w_{im})^2, \qquad i = 1,\dots,n \tag{1} \]
The connecting weight Wi* of the output node i*, given by formula (2), is the connecting weight nearest to input data X among the n output nodes:
\[ |X - W_{i^*}| = \min_{i} |X - W_i|, \qquad i = 1,\dots,n \tag{2} \]
The nodes located in front of and behind the winner node i* are called neighbor nodes, and the neighborhood Ni*(δ) is the set of neighbor nodes within distance δ of the winner node i*. The winner node is marked "#" in Fig. 2, the other output nodes are marked "*", and the neighborhoods for radii δ = 0, 1, 2 are shown using a rectangular grid. With the neighborhood Ni*(δ), the self-organizing neural network adjusts the connecting weights as in formula (3), decreasing the neighborhood range and the learning rate until the neighborhood range shrinks to the winner node itself. The learning rate α(t) controls how much of the difference between the input data and the existing connecting weight is applied over time t. It is a real number between 0 and 1 and progressively decreases as the learning proceeds. Generally, learning is not accurate when the learning rate is too high, and takes too long when it is too low [9]. Here w(old)ij represents the connecting weight before adjustment and w(new)ij the connecting weight after adjustment:
\[ w^{(new)}_{ij} = w^{(old)}_{ij} + \alpha\left(x_j - w^{(old)}_{ij}\right), \qquad i \in N_{i^*}(\delta),\ j = 1,\dots,m \tag{3} \]
where Ni*(δ) is the neighbor set within distance δ of i*.
Fig. 2. The neighborhood using rectangular grid in two dimensions
The learning algorithm is summarized as follows:
1. Initialize the connecting weights with random numbers between 0 and 1. Decide the neighborhood range and the learning rate.
2. Present one input vector to the input layer.
3. Calculate the distance Di between the input vector and each weight vector using formula (1).
4. Choose one winner node.
5. Adjust the connecting weights according to the learning rule of formula (3).
6. Reduce the neighborhood range and the learning rate. Repeat steps 2 to 5 until the neighborhood range shrinks to the winner node itself.
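A minimal Python sketch of this six-step loop for a one-dimensional map is given below; the neighborhood shrink and the stopping rule follow the text, while the learning-rate decay constant is illustrative and the freezing of never-winner nodes (introduced in Section 3) is omitted for brevity.

```python
# Sketch of the basic 1-D SOFM learning loop (steps 1-6 above).
import numpy as np

def train_sofm(data, n_out, alpha=0.4, rng=np.random.default_rng(0)):
    m = data.shape[1]
    w = rng.random((n_out, m))               # step 1: random weights in [0, 1)
    delta = n_out                            # initial neighborhood radius
    while delta > 0 and alpha > 0:
        for xvec in data:                    # step 2: present one input vector
            dist = ((xvec - w)**2).sum(axis=1)   # step 3: formula (1)
            win = int(dist.argmin())         # step 4: winner node
            lo, hi = max(0, win - delta), min(n_out, win + delta + 1)
            w[lo:hi] += alpha * (xvec - w[lo:hi])  # step 5: rule (3)
        delta -= 1                           # step 6: shrink neighborhood
        alpha *= 0.99                        # illustrative decay of the rate
    return w
```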
3 Suggestion

The SOFM given by Kohonen has some problems caused by variation of the solution and by the extensive learning time. In the suggested SOFM structure, which has a one-dimensional linear output layer, the sum of the connecting weights is computed at each output node from the randomly assigned initial weights, and the output nodes are then arranged in ascending order of this sum. By matching the input data to the ordered output nodes and transforming the output nodes into a form similar to the input data through learning, groups are formed. In particular, when the number of output nodes is set higher than the number of input data, the distribution of output nodes can spread in a form similar to the distribution of the input data. During the learning process, once the neighborhood range falls to half of its initial value, the connecting weights of output nodes that have never been winner nodes are no longer adjusted by formula (3). When the learned input data are lined up on output nodes in a similar order, the required groups can be formed by linearly separating at the largest connecting-weight gaps between output nodes. This removes a defect that can occur in the learning process and aids the formation of groups after learning.
The suggested clustering algorithm depends on parameters such as the number of output nodes, the learning rate, and the neighborhood range. The smaller the number of output nodes, the shorter the learning time, while a sufficient number of output nodes lets the input data spread adequately over them. In practice, the drawbacks that can arise during learning are best overcome by setting the number of output nodes to more than twice the number of input data. As the learning rate gets higher, the learning time decreases. Initially every output node is set as a neighbor; the range diminishes as time passes, and the algorithm stops when the neighborhood is the winner node itself. The initial learning rate α(0) is set to 0.4, which gives good results in practice, and the initial neighborhood range is set equal to the number of output nodes. Using the Euclidean distance, the output node i on which each input data point is mapped is separated according to the distance between its connecting weight and that of node i+1, where i and i+1 are indices of neighboring output nodes. Because the connecting weights take a form similar to the probability distribution of the data, only the connecting weights of neighboring mapped output nodes need to be considered in the Euclidean distance. The suggested clustering algorithm proceeds as follows:
1. Initialize the structure of the SOFM (output node type, number of input and output nodes).
2. Initialize the weight on each connecting line, and set an initial learning rate α(0), a learning rate function α(t), and an initial neighborhood range.
3. Obtain the sum of the connecting weights at each output node. Arrange the output nodes in ascending order of this sum.
4. For each input data point, calculate the distance between the input data and the connecting weights of the output nodes using formula (1), and select the winner node with the shortest distance.
5. Adjust the connecting weights of the neighbor nodes located within the given range of the winner node.
6. Decrease the neighborhood range by 1 and the learning rate as α(t) = (1 − t/4950) × α(t−1). Repeat steps 4 to 5 until the neighborhood range becomes the winner node itself or the learning rate reaches 0. Once the neighborhood range has fallen to half of the initial range, an output node that has never become a winner node no longer requires learning.
7. Map each input data point to the nearest output node.
8. Calculate the connecting weight distance WD(i, j) between neighboring output nodes i and j to which input data are matched.
9. For output nodes with the linear structure, selecting the k−1 boundaries with the greatest connecting weight distance WD(i, j) forms k groups.
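The mapping and gap-splitting steps (7-9) can be sketched as follows, assuming a one-dimensional map w trained as above and k target groups; the function name and the post-hoc ordering by weight sum (in the paper the ordering is applied to the initial weights in step 3) are our simplifications.

```python
# Sketch of steps 3 and 7-9: order nodes by weight sum, map data to nodes,
# and cut at the k-1 largest weight-distance gaps between used nodes.
import numpy as np

def cluster_by_gaps(data, w, k):
    order = np.argsort(w.sum(axis=1))        # step 3 applied post-hoc
    w = w[order]
    # step 7: map each input datum to its nearest output node
    node = np.array([((x - w)**2).sum(axis=1).argmin() for x in data])
    used = np.unique(node)                   # only nodes that received data
    # step 8: weight distance WD(i, j) between neighboring used nodes
    wd = np.linalg.norm(np.diff(w[used], axis=0), axis=1)
    # step 9: cut at the k-1 largest gaps to form k groups
    idx = np.argsort(wd)[::-1][:k - 1]
    boundaries = used[np.sort(idx)]          # last node of each group
    return np.searchsorted(boundaries, node) # group index per datum
```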
4 Numerical Example

For the numerical examples, we used Anderson's IRIS data [1] and machine-part matrices [2], [3], [4], [9], [12]. Anderson's IRIS data
consists of 150 samples of data with 4-dimensional attributes (petal width, petal length, sepal width, and sepal length). The three clusters (Iris Setosa, Iris Versicolor, Iris Virginica) consist of 50 data points each. Clustering algorithms that use unsupervised learning are known to produce about 15 to 17 misclassifications on the IRIS data [11]. The application of the suggested clustering algorithm to the IRIS data used for this study proceeds as follows:
1. The structure of the SOFM used for this example is as follows. The output layer is linear, and the numbers of input nodes and output nodes are 4 and 300, respectively; the number of output nodes is twice the number of input data.
Fig. 3. Suggested structure of SOFMs
2. The weight of the first output node is W1={0.5882, 0.2500, 0.2200, 0.1872}, the weight of the second output node is W2={0.9232, 0.4950, 0.8613, 0.7534}, the weight of the nineteenth output node is W19={0.0145, 0.2887, 0.0584, 0.0835}, and the weight of the 300th output node is W300={0.7083, 0.8935, 0.8302, 0.0759}. Set the initial learning rate to 0.4, with learning function α(t)=(1−t/4950)×α(t−1).
3. The sums of the weights of the output nodes are: the first, 1.245; the second, 2.5468; the 19th, 0.4453; the 300th, 2.5082. The order by size is: 19, 48, …, 44, 89.
4. Calculating the distances from the first input data point to the weights of the output nodes by formula (1) produces 0.6902, 1.3757, 1.6861, …; the output node with the shortest distance, W19, becomes the winner node.
5. Adjust the weights of all output nodes within a radius of 300 according to formula (3).
6. Reduce the neighborhood range by 1, and reduce the learning rate: when t is 0, α(0) = 0.4; when t is 1, α(1) = 0.4×(1−1/4950); and so on. Repeat steps 4 to 5 until the neighborhood range becomes the winner node itself.
7. Mapping each data point to the nearest output node gives Table 1.
8. Calculating the weight distances between neighboring mapped output nodes gives: WD(1,5)=0.011, WD(5,8)=0.0074, WD(8,10)=0.1738, WD(10,12)=0.0286, …, WD(56,75)=2.8132, …, WD(182,191)=0.4176, …, WD(299,300)=0.0139.
Table 1. The Data in the output nodes
No 1
5 8 10 12 15 18 19 21 22 23 27 29 31 32 34 39 40
1
Data2 6,11 15,16 17,19 33,34 37,49 20,45 47 21 22,32 28 18 1,5, 29,44 24,40 27,41 8,38 50 12 36 23 7 3 48
No 41 42 44
47 48 49 50 52 54 56 75 77 78 80 83 85 89 93 96 97 98
Data 14,43 39 4,9, 13 2,10 46 42 35 30 31 26 25 99 58 94 61 80 82 81 65 70 54 60
No 100 102 105 111 114 115 116 119 121 125 126 128 134 136 144 147 150 151 153 154 157 166 167
Data 90 63 83,93 68 107 91 95 100 89 97 85,96 56,67 62 72 79,86 98 69,88 74 92 64 75 52,76 66
No 169 170 175 176 182 191 196 202 204 205 206 211 213 214
218 219 227 232 236 237 238 240
Data 59 55 51,77,87 53,57 78 73,134 124,127 128,139 71 120 84 150 122 102,114 143,147 115 135 104 138,148 112 117 111 129
No 241 245 250 253 260 261 262 263 264 266 267 271 272 280 283 284 286 293 294 299 300
Data 133 116,149 142,146 137 105 101 141,145 113 140 125 109 121 144 103 130 126 110 131 108 136 106,118 119,123 132
Table 2. The Data in 3 Groups
No of Group   No of Input Data
1             1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50
2             51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 72, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 107
3             71, 73, 84, 101, 102, 103, 104, 105, 106, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150
9. The pairs of neighboring output nodes with the largest weight distances are (56, 75) and, next, (182, 191), with WD(56, 75)=2.8132 and WD(182, 191)=0.4176. Cutting at these two boundaries yields the 3 groups shown in Table 2.

1 No of output node. 2 No of input data.
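Putting the two sketches above together on the IRIS data might look as follows; loading the data through scikit-learn and scaling the attributes into [0, 1] are our choices, not part of the paper.

```python
# End-to-end usage of train_sofm and cluster_by_gaps on the IRIS data.
from sklearn.datasets import load_iris
import numpy as np

iris = load_iris().data                  # 150 samples, 4 attributes
scaled = iris / iris.max(axis=0)         # keep values in (0, 1]
w = train_sofm(scaled, n_out=300)        # 300 output nodes, twice the data
labels = cluster_by_gaps(scaled, w, k=3)
print(np.bincount(labels))               # sizes of the 3 groups
```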
5 Result

The suggested algorithm was evaluated on the IRIS data [1] used in existing studies and on machine-part matrix data. As Table 2 shows, one data point is misclassified in group 2 and three in group 3, for a total of four. This is a better result than the existing algorithms listed in Table 3. The suggested SOFM structure has a one-dimensional linear output layer; since the weights are set arbitrarily at the start, the weight sums of the output nodes do not line up from left to right by magnitude, so we arranged the nodes from left to right in ascending order of their weight sums. In addition, to fix the number of output nodes that gives the best results on the IRIS data and the machine-part matrices, we ran experiments starting with 3 output nodes and increasing the number up to four times the number of input data. We found that with between 3 output nodes and fewer than twice the number of input data, the results varied slightly from run to run, whereas with more than twice the number of input data the results were always the same. The initial neighborhood range is set to a radius of 300 so that the weights of all output nodes can be adjusted. Once the learning process passes half of the initial neighborhood range, output nodes that fail to become winner nodes stop adjusting their weights. Table 3 compares the results of the suggested clustering algorithm on Anderson's IRIS data with those of existing algorithms. The suggested clustering algorithm produced 4 misclassifications, an improvement over the 17 misclassifications of Pal et al. [11] and the 15 of Karayiannis [7]. According to Pal et al. [11], existing clustering algorithms that use unsupervised learning produce at least 15 to 17 misclassifications; the suggested algorithm produces only 4. The suggested algorithm used the same parameters to solve the machine-part grouping problem, which is well known in the manufacturing field: for the 4-dimensional real-valued IRIS data, the initial learning rate was 0.4 with learning function α(t) = (1 − t/4950) × α(t−1). The machine-part grouping problems consist of machine-part matrices with no exceptional elements.

Table 3. The number of errors: comparison for Anderson's IRIS data
Source of Problem           Source of Algorithms    No of Error
Anderson's IRIS Data Set    Suggested algorithm     4
                            Karayiannis [7]         15
                            Pal et al. [11]         17
For machine-part grouping, Table 4 reports the optimal number of groups and the misclassifications that occur during grouping. The suggested clustering algorithm forms the machine cells from each machine-part matrix, and Table 4 shows the resulting number of machine cells and misclassifications. The suggested clustering algorithm attains the optimal number of machine cells with the minimum number of misclassifications, 0.

Table 4. The number of errors: comparison for machine-part incidence matrices
Size 3     Source of Problems              No of Group            No of Error
                                           Optimal  Suggested     Optimal  Suggested
4×5        Kusiak [9]                      2        2             0        0
10×15      Chan et al. [2]                 3        3             0        0
10×20      Srinivasan et al. [12]          4        4             0        0
24×40      Chandrasekharan et al. [4]      7        7             0        0
40×100     Chandrasekharan et al. [3]      10       10            0        0

3 No of Machine × No of Part.
6 Conclusion

This study contributes an effective clustering algorithm for classifying the IRIS data and forming machine-part groups. Its main features are the structure of the SOFM and the learning parameters. The suggested SOFM structure is one-dimensional and linear, with the number of output nodes set to twice the number of input data. The weights of the output nodes are assigned randomly, and the nodes are then arranged in ascending order of their weight sums. During learning, once the neighborhood range reaches half of its initial value, output nodes that have failed to become winner nodes stop adjusting their weights. After learning is completed and the input data are lined up on the output nodes in order, groups can be formed by linearly dividing at the points with the largest weight differences between output nodes. By experience, setting the number of output nodes to more than twice the number of input data is the best way to overcome the defects that might occur during learning. In the neighborhood arrangement, all output nodes are initially neighbors; the range is reduced over time, and the algorithm stops when the neighborhood is the winner node itself. The well-known IRIS data and machine-part matrices were used in this study. On the IRIS data we achieved a better result, with only 4 misclassifications, whereas existing clustering algorithms using unsupervised learning produce 15 to 17 misclassifications. Since the suggested algorithm does not use complex operations, it is flexible and feasible for real-time applications.
References

1. Anderson, E.: The IRIS's of the Gaspe Peninsula. Bull. Amer. IRIS Soc. 59 (1939) 2-5
2. Chan, H.M., Milner, D.A.: Direct clustering algorithm for group formation in cellular manufacturing. Journal of Manufacturing Systems 1(1) (1982) 65-75
3. Chandrasekharan, M.P., Rajagopolan, R.: ZODIAC: An algorithm for concurrent formation of part-families and machine-cells. International Journal of Production Research 25(6) (1987) 835-850
4. Chandrasekharan, M.P., Rajagopolan, R.: Groupability: An analysis of the properties of binary data matrices for group technology. International Journal of Production Research 27(6) (1989) 1035-1052
5. Everitt, B.S., Laudau, S., Leese, M.: Cluster Analysis. Edward Arnold, London (2001)
6. Huntsberger, T.L., Ajjimarangsee, P.: Parallel Self-Organizing Feature Maps for Unsupervised Pattern Recognition. International Journal of General Systems 16(4) (1990) 357-372
7. Karayiannis, N.B.: A Methodology for Constructing Fuzzy Algorithms for Learning Vector Quantization. IEEE Trans. Neural Networks 8(3) (1997) 505-518
8. Kohonen, T.: Self-Organizing Maps. Springer, Berlin (1997)
9. Kusiak, A.: Computational Intelligence in Design and Manufacturing. John Wiley & Sons, New York (2000)
10. Mangiameli, P., Chen, S.K., West, D.: A Comparison of SOM Neural Network and Hierarchical Clustering Methods. European Journal of Operational Research 93(2) (1996) 402-407
11. Pal, N.N., Bezdek, J.C., Tasao, E.C.-K.: Generalized Clustering Networks and Kohonen's Self-Organizing Scheme. IEEE Trans. Neural Networks 4(4) (1993) 549-551
12. Srinivasan, G., Narendran, T.T., Mahadevan, B.: An assignment model for the part families problem in group technology. International Journal of Production Research 28(1) (1990) 145-152
13. Tasao, E.C.-K., Bezdek, J.C., Pal, N.N.: Fuzzy Kohonen Clustering Networks. Pattern Recognition 27(5) (1994) 754-757
Global Optimization of the Scenario Generation and Portfolio Selection Problems

Panos Parpas and Berç Rustem

Department of Computing, Imperial College, London SW7 2AZ
Abstract. We consider the global optimization of two problems arising from financial applications. The first problem originates from the portfolio selection problem when high-order moments are taken into account. The second issue we address is the problem of scenario generation. Both problems are non-convex, large-scale, and highly relevant in financial engineering. For the two problems we consider, we apply a new stochastic global optimization algorithm that has been developed specifically for this class of problems. The algorithm is an extension to the constrained case of the so called diffusion algorithm. We discuss how a financial planning model (of realistic size) can be solved to global optimality using a stochastic algorithm. Initial numerical results are given that show the feasibility of the proposed approach.
1 Introduction
We consider the global optimization of two problems arising from financial applications. The first problem originates from the portfolio selection problem when high-order moments are taken into account. This model is an extension of the celebrated mean-variance model of Markowitz [1, 2]. The inclusion of higher order moments has been proposed as one possible augmentation to the model in order to make it more applicable. The applicability of the model can be broadened by relaxing one of its major assumptions, i.e. that the rates of return are normal. The second issue we address is the problem of scenario generation i.e. the description of the uncertainties used in the portfolio selection problem. Both problems are non-convex, large-scale, and highly relevant in financial engineering. Given the numerical and theoretical challenges presented by these models we only consider the “vanilla” versions of the two problems. In particular, we focus on a single period model where the decision maker (DM) provides as input preferences with respect to mean, variance, skewness, and possibly kurtosis of the portfolio. Using these four parameters we then formulate the multi-criteria optimization problem as a standard nonlinear programming problem. This version of the decision model is a non-convex linearly constrained problem. Before we can solve the portfolio selection problem we need to describe the uncertainties regarding the returns of the risky assets. In particular we need to specify: (1) the possible states of the world and (2) the probability of each state. A common approach to this modeling problem is the method of matching moments (see e.g. [3, 4, 5]). The first step in this approach is to use the historical
data in order to estimate the moments (in this paper we consider the first four central moments i.e. mean, variance, skewness, and kurtosis). The second step is to compute a discrete distribution with the same statistical properties as the ones calculated in the previous step. Given that our interest is in real-world applications we recognize that there may not always be a distribution that matches the calculated statistical properties. For this reason we formulate the problem as a least squares problem [3, 4]. The rationale behind this formulation is that we try to calculate a description of the uncertainty that matches our beliefs as well as possible. The scenario generation problem also has a non-convex objective function, and is linearly constrained. For the two problems described above we apply a new stochastic global optimization algorithm that has been developed specifically for this class of problems. The algorithm is described in [6] (see also section 4). It is an extension to the constrained case of the so-called diffusion algorithm [7, 8, 9, 10]. The method follows the trajectory of an appropriately defined Stochastic Differential Equation (SDE). Feasibility of the trajectory is achieved by projecting its dynamics onto the set defined by the linear equality constraints. A barrier term is used for the purpose of forcing the trajectory to stay within any bound constraints (e.g. positivity of the probabilities, or bounds on how much of each asset to own). The purpose of this paper is to show that stochastic optimization algorithms can be used to solve realistic financial planning problems. A review of applications of global optimization to portfolio selection problems appeared in [11]. A deterministic global optimization algorithm for a multi-period model appeared in [12]. This paper extends and complements the methods mentioned above in the sense that we describe a complete framework for the solution of a realistic financial model. The type of models we consider, due to the large number of variables, cannot be solved by deterministic algorithms. Consequently, practitioners are left with two options: solve a simpler, but less relevant model, or use a heuristic algorithm (e.g. tabu-search or evolutionary algorithms). The approach proposed in this paper lies somewhere in the middle. The proposed algorithm belongs to the simulated-annealing family of algorithms, and it has been shown in [6] that it converges to the global optimum (in a probabilistic sense). Moreover, the computational experience reported in [6] seems to indicate that the method is robust (in terms of finding the global optimum) and reliable. We believe that such an approach will be useful in many practical applications. Admittedly the models (especially the portfolio selection problem) are rather simplistic. Given the theoretical and computational difficulties involved with such models it is important to consider the simplified version of the problem in the hope that this approach will shed more light on the general case. Moreover, to the authors' knowledge this is the first paper to address, in a holistic manner, the global optimization of the moment problem and the portfolio selection problem with higher order moments. The rest of the paper is structured as follows: in section 2 we describe the scenario generation problem. While there are many ways to generate scenario trees for stochastic programming problems, we will focus on the moment matching
910
P. Parpas and B. Rustem
approach. The interested reader is referred to [3] for a review of other methods. Also in this section we discuss the importance of arbitrage opportunities; we describe how we dealt with this requirement of financial models in our implementation. In section 3 we discuss the portfolio selection problem. A model with a non-convex objective and linear constraints is proposed as a simple extension to the classical Markowitz model. The non-convexities in the model arise from the inclusion of higher order moments. The model considered here relaxes the normality assumption of the classical model, the reader is referred to [13] for a more complete overview of non-convex optimization problems in financial applications. In section 4 we describe an algorithm for the solution of the two models described above. For a full theoretical treatment of the algorithm we refer the interested reader to [6]. In section 5 we present some initial numerical experiments. We study how difficult (in terms of computation time) it is to compute an arbitrage free scenario tree. We also study how the global optimum changes as we vary the parameters of the model. To illustrate the effect of the parameters we present some 3-dimensional plots of efficient frontiers, the analogs of the classical Markowitz efficient frontiers.
2 Scenario Generation
From its inception Stochastic Programming (SP) has found several diverse applications as an effective paradigm for modeling decisions under uncertainty. The focus of initial research was on developing effective algorithms for models of realistic size. An area that has only recently received attention is methods to represent the uncertainties of the decision problem. A review of available methods to generate meaningful descriptions of the uncertainties from data can be found in [3]. We will use a least squares formulation (see e.g. [3, 4]). It is motivated by the practical concern that the moments, given as input, may be inconsistent. Consequently the best one can do is to find a distribution that fits the available data as well as possible. It is further assumed that the distribution is discrete. Under these assumptions the problem can be written as:
\[ \min_{\omega,\,p}\ \sum_{i=1}^{n}\left(\sum_{j=1}^{k} p_j\, m_i(\omega_j) - \mu_i\right)^{2} \quad \text{s.t.}\ \sum_{j=1}^{k} p_j = 1,\quad p_j \ge 0,\ j = 1,\dots,k \]
Where the μi represent the statistical properties of interest, and mi(·) is the associated 'moment' function. For example, if μi is the target mean for the ith asset then \(m_i(\omega_j) = \omega_j^i\), i.e. the jth realization of the ith asset. Numerical experiments using this approach for a multistage model were reported in [4] (without arbitrage considerations). Other methods such as maximum entropy [14] and semidefinite programming [15], while they enjoy strong theoretical properties,
cannot be used when the data of the problem are inconsistent. A disadvantage of the least squares model is that it is highly non-convex, which makes it very difficult to handle numerically. These considerations have led to the development of the algorithm described in section 4 (see also [6]) that can efficiently compute global optima for problems in this class. When using scenario trees for financial planning problems it becomes necessary to address the issue of arbitrage opportunities [4, 16]. An arbitrage opportunity is a self-financing trading strategy that generates a strictly positive cash flow in at least one state and whose payoffs are nonnegative in all other states. In other words, it is possible to get something for nothing. In our implementation we eliminate arbitrage opportunities by computing a sufficient set of states so that the resulting scenario tree has the arbitrage free property. This is achieved by a simple two-step process. In the first step we generate random rates of return, sampled from a uniform distribution. We then test for arbitrage by solving the system:
\[ x_0^i = e^{-r}\sum_{j=1}^{m} x_j^i\,\pi_j,\quad i = 1,\dots,n; \qquad \sum_{j=1}^{m}\pi_j = 1,\quad \pi_j \ge 0,\ j = 1,\dots,m \tag{1} \]
Where \(x_0^i\) represents the current (known) state of the world for the ith asset, and \(x_j^i\) represents the jth realization of the ith asset in the next time period (these are generated by the simulations mentioned above). r is the riskless rate of return. The πj are called the risk neutral probabilities. According to a fundamental result of Harrison and Kreps [17], the existence of the risk neutral probabilities is enough to guarantee that the scenario tree has the desired property. In the second step, we solve the least squares problem with some of the states fixed to the states calculated in the first step. In other words, we solve the following problem:
\[ \min_{\omega,\,p}\ \sum_{i=1}^{n}\left(\sum_{j=1}^{k} p_j\, m_i(\omega_j) + \sum_{l=1}^{m} p_{k+l}\, m_i(\hat{\omega}_l) - \mu_i\right)^{2} \quad \text{s.t.}\ \sum_{j=1}^{k+m} p_j = 1,\quad p_j \ge 0,\ j = 1,\dots,k+m \tag{2} \]
In the problem above, the ω̂ are fixed. Solving the preceding problem guarantees a scenario tree that is arbitrage free.
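The two-step construction can be sketched as follows. The LP feasibility test for system (1) and the least-squares fit of problem (2) below use standard scipy routines; the local optimizer is a stand-in for the paper's global SDE-based algorithm of section 4, and matching only first moments is a deliberately simplified choice of the m_i.

```python
# Step 1: arbitrage check for system (1) via LP feasibility.
# Step 2: fit the free states and all probabilities of problem (2).
import numpy as np
from scipy.optimize import linprog, minimize

def arbitrage_free(x0, states, r):
    """x0: (n,) prices today; states: (n, m) next-period states; r: riskless rate."""
    n, m = states.shape
    A_eq = np.vstack([states, np.ones(m)])       # states @ pi = e^r x0, sum pi = 1
    b_eq = np.append(np.exp(r) * x0, 1.0)
    res = linprog(np.zeros(m), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * m)
    return res.success                           # risk-neutral pi exists?

def fit_tree(mu, fixed, k, seed=0):
    """mu: (n,) target means; fixed: (n, m) states kept from step 1; k free states."""
    n, m = fixed.shape
    rng = np.random.default_rng(seed)

    def objective(z):
        omega = z[: n * k].reshape(n, k)         # free states
        p = z[n * k :]                           # probabilities, length k + m
        err = np.hstack([omega, fixed]) @ p - mu # toy m_i: match means only
        return err @ err

    z0 = np.concatenate([rng.normal(mu.mean(), 0.1, n * k),
                         np.full(k + m, 1.0 / (k + m))])
    return minimize(objective, z0,
                    bounds=[(None, None)] * (n * k) + [(0.0, None)] * (k + m),
                    constraints={"type": "eq",
                                 "fun": lambda z: z[n * k:].sum() - 1.0})
```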
3 Portfolio Selection
In this section we describe the portfolio selection problem when higher order terms are taken into account. The classical mean–variance approach to portfolio analysis seeks to balance risk (measured by variance) and reward (measured by expected value). There are many ways to specify the single period problem. We will be using the following basic model:
\[ \min_{w}\ -\alpha\,\mathrm{E}[w] + \beta\,\mathrm{V}[w] \quad \text{s.t.}\ \sum_{i=1}^{n} w_i = 1,\quad l_i \le w_i \le u_i,\ i = 1,\dots,n \tag{3} \]
Where E[·] and V[·] represent the mean rate of return and its variance respectively. The single constraint is known as the budget constraint and it specifies the initial wealth (without loss of generality we have assumed that this is one). The α and β are positive scalars, chosen so that α + β = 1. They specify the DM's preferences, i.e. α = 1 means that the DM is risk-seeking, while β = 1 implies that the DM is risk averse. Any other selection of the parameters will produce a point on the efficient frontier. The decision variable w represents the commitment of the DM to a particular asset. Note that this problem is a convex quadratic programming problem for which very efficient algorithms exist. The interested reader is referred to the review in [13] for more information regarding the Markowitz model. We propose an extension of the mean-variance model using higher order moments. The vector optimization problem can be formulated as a standard non-convex optimization problem using two additional scalars to act as weights. These weights are used to enforce the DM's preferences. The problem is then formulated as follows:
\[ \min_{w}\ -\alpha\,\mathrm{E}[w] + \beta\,\mathrm{V}[w] - \gamma\,\mathrm{S}[w] + \delta\,\mathrm{K}[w] \quad \text{s.t.}\ \sum_{i=1}^{n} w_i = 1,\quad l_i \le w_i \le u_i,\ i = 1,\dots,n \tag{4} \]
Where S[·] and K[·] represent the skewness and kurtosis of the rate of return respectively. γ and δ are positive scalars. The four scalar parameters are chosen so that they sum to one. Positive skewness is desirable (since it corresponds to higher returns albeit with low probability) while kurtosis is undesirable since it implies that the DM is exposed to more risk. The model in (4) can be extended to multiple periods while maintaining the same structure (non convex objective and linear constraints). The numerical solution of (2) and (4) will be discussed in the next two sections.
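For a discrete scenario description of the returns, the objective in (4) can be evaluated as in the sketch below; taking S and K to be the third and fourth central moments of the portfolio return is our reading of the model, not a definition spelled out in the text.

```python
# Sketch of the non-convex objective in (4) over a scenario set: returns is a
# (k, n) scenario-by-asset matrix, probs the k scenario probabilities, and
# (alpha, beta, gamma, delta) the DM's preference weights summing to one.
import numpy as np

def mvsk_objective(w, returns, probs, alpha, beta, gamma, delta):
    r = returns @ w                 # portfolio return in each scenario
    mean = probs @ r
    dev = r - mean
    var = probs @ dev**2
    skew = probs @ dev**3           # positive skewness is rewarded
    kurt = probs @ dev**4           # kurtosis is penalised
    return -alpha*mean + beta*var - gamma*skew + delta*kurt
```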
4 A Stochastic Optimization Algorithm
The models described in the previous section can be written as:
\[ \min_{x}\ f(x) \quad \text{s.t.}\ Ax = b,\quad x \ge 0. \]
A well known method for obtaining a solution to an unconstrained optimization problem is to consider the following Ordinary Differential Equation (ODE):
\[ dX(t) = -\nabla f(X(t))\,dt. \tag{5} \]
By studying the behavior of X(t) for large t, it can be shown that X(t) will eventually converge to a stationary point of the unconstrained problem. A review of so-called continuous-path methods can be found in [18]. A deficiency of using (5) to solve optimization problems is that it will get trapped in local minima. In order to allow the trajectory to escape from local minima, it has been proposed by various authors (e.g. [7, 8, 9, 10]) to add a stochastic term that would allow the trajectory to "climb" hills. One possible augmentation to (5) that would enable us to escape from local minima is to add noise. One then considers the diffusion process:
\[ dX(t) = -\nabla f(X(t))\,dt + \sqrt{2T(t)}\,dB(t). \tag{6} \]
Where B(t) is the standard Brownian motion in \(\mathbb{R}^n\). It has been shown in [8, 9, 10], under appropriate conditions on f and T(t), that as t → ∞, the transition probability of X(t) converges (weakly) to a probability measure Π. The latter has its support on the set of global minimizers. For the sake of argument, suppose we did not have any linear constraints, but only positivity constraints. We could then consider enforcing the feasibility of the iterates by using a barrier function. According to the algorithmic framework sketched out above, we could obtain a solution to our (simplified) problem by following the trajectory of the following SDE:
\[ dX(t) = -\nabla f(X(t))\,dt + \mu X(t)^{-1}\,dt + \sqrt{2T(t)}\,dB(t). \tag{7} \]
Where μ > 0 is the barrier parameter. By \(X^{-1}\) we denote an n-dimensional vector whose ith component is given by \(1/X_i\). Having used a barrier function to deal with the positivity constraints, we can now introduce the linear constraints into our SDE. This has been carried out in [6] using the projected SDE:
\[ dX(t) = P\left[-\nabla f(X(t)) + \mu X(t)^{-1}\right]dt + \sqrt{2T(t)}\,P\,dB(t). \tag{8} \]
Where \(P = I - A^{T}(AA^{T})^{-1}A\). The proposed algorithm works in a similar manner to gradient projection algorithms. The key difference is the addition of a barrier parameter for the positivity of the iterates, and a stochastic term that helps the algorithm escape from local minima. The global optimization problem can be solved by fixing μ and following the trajectory of (8) for a suitably defined function T(t). After sufficient time passes, we reduce μ and repeat the process. The proof that following the trajectory of (8) will eventually lead us to the global minimum appears in [6]. Note that the projection matrix for the type of constraints we need to impose for our models is particularly simple. For a constraint of the type \(\sum_{i=1}^{n} x_i = 1\), the projection matrix is given by:
\[ P_{ij} = \begin{cases} \dfrac{n-1}{n} & \text{if } i = j, \\[4pt] -\dfrac{1}{n} & \text{otherwise.} \end{cases} \]
5 Numerical Experiments
The algorithm described in the previous section was implemented in C++. Before we discuss our numerical results we provide some useful implementation details. From similar studies in the unconstrained case [7] and the box-constrained case [19], we know that a deficiency of stochastic methods of the type proposed in this paper is that they require a large number of function evaluations. The reason for this shortcoming is that the annealing schedule has to be sufficiently slow in order to allow the trajectory to escape from local minima. Therefore, whilst there are many sophisticated methods for the numerical solution of SDEs, we decided to use the cheaper stochastic Euler method. The latter is a generalization of the well known Euler method for ODEs to the stochastic case. The main iteration is given by:
\[ X(t+1) = X(t) + P\left[-\nabla f(X(t)) + \mu X(t)^{-1}\right]\Delta t + \sqrt{2T(t)\Delta t}\;P u. \]
Where Δt is the discretized step-length parameter, u is a standard Gaussian vector, i.e. u ∼ N(0, I), and X(0) is chosen to be strictly feasible. The algorithm starts by dividing the discretized time into k periods. Following a single trajectory would be too inefficient; therefore, starting from a single strictly feasible point, the algorithm generates m different trajectories. After a single period elapses, we remove the worst performing trajectory. Since all trajectories generate feasible points, we can assess the quality of a trajectory by the best objective function value achieved on it. We then randomly select one of the remaining trajectories and duplicate it. At this stage we reduce the noise coefficient of the duplicated trajectory. When all the periods have been completed in the manner described above, we count this event as one iteration. At this point we reduce the barrier parameter. This parameter is started at μ = 0.1 and reduced by a factor of 0.75 at every iteration. We then repeat the same process, with all the trajectories starting from the best point found so far. If the current incumbent solution vector remains the same for more than l iterations (l > 4 in our implementation), then we reset the noise to its initial value. The algorithm terminates when the noise term is smaller than a predefined value (0.1e−4) or when, after five successive resets of the noise term, no improvement could be made. In our implementation we used two trajectories and two periods (each of length 20e4). We used an initial value of 10 for the annealing schedule, and reduced it by a factor of 0.6 at every iteration. The same parameters were used for all the simulations.
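A minimal sketch of one discretised step, as described above, is given below; the projection uses the closed form of P for the budget constraint from section 4, and the trajectory pool, annealing schedule and barrier updates around it are omitted.

```python
# One projected stochastic Euler step for min f(x) s.t. sum(x) = 1, x > 0.
# P = I - (1/n) 11^T is the projection matrix given in section 4.
import numpy as np

def euler_step(x, grad_f, mu, T, dt, rng):
    n = x.size
    P = np.eye(n) - np.full((n, n), 1.0 / n)
    drift = P @ (-grad_f(x) + mu / x)            # barrier term keeps x > 0
    noise = np.sqrt(2.0 * T * dt) * (P @ rng.standard_normal(n))
    return x + drift * dt + noise
```

Iterating this step with a slowly decreasing T and a shrinking μ yields the trajectories whose best points are compared across the pool.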
Table 1. States added to guarantee arbitrage free tree

Assets   States Added   Time (secs)
2        4              0
5        14             0
10       27             0.01
15       77             0.14
20       170            0.79
Table 2. Solution Times Moment Problem

Assets   Time (secs)   Variables
2        820           105
5        1230          111
10       4160          124
15       23316         184
20       52544         242

Table 3. Solution Times Portfolio Selection

Assets   M-V    M-V-S   M-V-K   M-V-S-K
2        0.01   56      36      34
5        0.3    138     101     204
10       1.3    250     195     195
15       4      375     433     519
20       14.8   551     776     762
In Table 2 we show the time required to solve the moment problem. The number of variables differs from one run to the next, because the number of states needed to guarantee the arbitrage free property differs from run to run (since they are randomly generated). In all runs we added fifty more (non-constant) states. The resulting problem given by (2) was then solved using the algorithm described above. The times shown are the averages of ten runs for problems with 2, 5, and 10 assets. Due to the large amount of time required to solve the larger problems (15 and 20 assets), the times reported for these are from a single run. Table 3 details the time required to generate a point on the efficient frontier for the four versions of the portfolio selection problem considered in this paper. The first is the mean-variance (M-V) model, obtained by setting γ = δ = 0 (we used this model to test the quality of the solutions provided by the algorithm). Similarly, M-V-S, M-V-K and M-V-S-K stand for Mean-Variance-Skewness, Mean-Variance-Kurtosis and Mean-Variance-Skewness-Kurtosis, respectively. We realize that providing computation times is not the best way to judge the speed of an algorithm. However, one of the aims of this paper is to show how one can use a stochastic global optimization algorithm to solve a financial planning problem. Even though there are many open questions, and some of our assumptions may be too stringent, we believe that the computation times tabulated in Table 3 show the feasibility of this approach. In figures 1 and 2 we show some 3-dimensional efficient frontiers for the M-V-S and M-V-K problems respectively. The gaps that appear in the
frontiers are due to the way we generate the frontier. If we used constraints, instead of weights, to express the preferences of the DM, we believe the frontier would look smoother. However, a formulation using constraints would lead to an optimization problem that could not be solved by our global optimization solver. We plan to address this deficiency in the future. Figures 3 and 4 show efficient frontiers using the M-V-S-K model. In figure 3 we plot the first three measures of interest, while in figure 4 we plot the mean, variance and kurtosis of the portfolio.
Fig. 1. Mean-Variance-Skewness

Fig. 2. Mean-Variance-Kurtosis

Fig. 3. Mean-Variance-Skewness-(Kurtosis)

Fig. 4. Mean-Variance-(Skewness)-Kurtosis
6 Conclusions
We considered the computational challenges associated with the global optimization of a financial planning model. We proposed a simple extension of the classical Markowitz model; in the proposed model, higher order moments are included using scalar weights. The scenario generation problem was also addressed, by matching the first four central moments of the postulated distribution. We also addressed the issue of imposing the arbitrage free property on the generated scenario tree. A stochastic algorithm was proposed for the two models. Our initial numerical results show that problems of realistic size can be solved using the proposed framework.
References

1. Markowitz, H.M.: Portfolio selection. J. Finance 7 (1952) 77–91
2. Markowitz, H.M.: The utility of wealth. J. Polit. Econom. (1952) 151–158
3. Dupacova, J., Consigli, G., Wallace, S.: Scenarios for multistage stochastic programs. Ann. Oper. Res. 100 (2000) 25–53
4. Gülpınar, N., Rustem, B., Settergren, R.: Simulation and optimization approaches to scenario tree generation. J. Econom. Dynam. Control 28 (2004) 1291–1315
5. Prékopa, A.: Stochastic programming. Volume 324 of Mathematics and its Applications. Kluwer Academic Publishers Group, Dordrecht (1995)
6. Parpas, P., Rustem, B., Pistikopoulos, E.N.: Linearly constrained global optimization and stochastic differential equations. Accepted in the J. Global Optim. (2006)
7. Aluffi-Pentini, F., Parisi, V., Zirilli, F.: Global optimization and stochastic differential equations. J. Optim. Theory Appl. 47 (1985) 1–16
8. Chiang, T., Hwang, C., Sheu, S.: Diffusion for global optimization in R^n. SIAM J. Control Optim. 25 (1987) 737–753
9. Geman, S., Hwang, C.: Diffusions for global optimization. SIAM J. Control Optim. 24 (1986) 1031–1043
10. Gidas, B.: The Langevin equation as a global minimization algorithm. In: Disordered systems and biological organization (Les Houches, 1985). Volume 20 of NATO Adv. Sci. Inst. Ser. F Comput. Systems Sci. Springer, Berlin (1986) 321–326
11. Konno, H.: Applications of global optimization to portfolio analysis. In Audet, C., Hansen, P., Savard, G., eds.: Essays and Surveys in Global Optimization. Springer (2005) 195–210
12. Maranas, C.D., Androulakis, I.P., Floudas, C.A., Berger, A.J., Mulvey, J.M.: Solving long-term financial planning problems via global optimization. J. Econom. Dynam. Control 21 (1997) 1405–1425 (Computational financial modeling)
13. Steinbach, M.C.: Markowitz revisited: mean-variance models in financial portfolio analysis. SIAM Rev. 43 (2001) 31–85 (electronic)
14. Parpas, P., Rustem, B.: Entropic regularization of the moment problem. Submitted (2005)
15. Bertsimas, D., Sethuraman, J.: Moment problems and semidefinite optimization. In: Handbook of semidefinite programming. Volume 27 of Internat. Ser. Oper. Res. Management Sci. Kluwer Acad. Publ., Boston, MA (2000) 469–509
16. Klaassen, P.: Discretized reality and spurious profits in stochastic programming models for asset/liability management. European J. Oper. Res. 101 (1997) 374–392
17. Harrison, J.M., Kreps, D.M.: Martingales and arbitrage in multiperiod securities markets. J. Econom. Theory 20 (1979) 381–408
18. Zirilli, F.: The use of ordinary differential equations in the solution of nonlinear systems of equations. In: Nonlinear optimization, 1981 (Cambridge, 1981). NATO Conf. Ser. II: Systems Sci. Academic Press, London (1982) 39–46
19. Recchioni, M.C., Scoccia, A.: A stochastic algorithm for constrained global optimization. J. Global Optim. 16 (2000) 257–270
A Generalized Fuzzy Optimization Framework for R&D Project Selection Using Real Options Valuation*

E. Ertugrul Karsak

Industrial Engineering Department, Galatasaray University, Ortaköy, Istanbul 80840, Turkey
[email protected]

* This research has been financially supported by Galatasaray University Research Fund.
Abstract. The global marketplace and intense competition in the business environment lead organizations to focus on selecting the best R&D project portfolio among available projects, using their scarce resources in the most effective manner. This is a sine qua non for high technology firms to sharpen their competitive advantage and realize long-term survival with sustainable growth. To accomplish this, firms should account for the uncertainty inherent in R&D, using appropriate valuation techniques that capture the flexibility in investment decisions, and for all possible interactions between the candidate projects within an optimization framework. This paper provides a fuzzy optimization model for dealing with the complexities and uncertainties regarding the construction of an R&D project portfolio. Real options analysis, which accounts for managerial flexibility, is employed to correct the deficiency of traditional discounted cash flow valuation, which excludes any form of flexibility. An example is provided to illustrate the proposed decision approach.
1 Introduction

In this paper, the research and development (R&D) project selection problem is addressed. R&D project selection examines the allocation of a company's scarce resources, such as budget and manpower, to a set of proposals to enhance its strategic performance on a scientific and technological basis. R&D is crucial for a company's competitive advantage, survival and sustainable growth. R&D enables the company to develop new products or services, enhance existing ones, and increase efficiency while lowering the cost of production processes. This paper focuses on the problem of selecting a portfolio of R&D projects when both vagueness and uncertainty in data and interactions between candidate projects exist. For the case of crisp data, early work dates back to Weingartner [16], and since then R&D project selection has been an active area of research for academics and practitioners. Liberatore [10] proposed a decision framework based on the analytic hierarchy process (AHP) and integer programming for R&D project selection. Inadequate representation of project interdependencies, and the inability to incorporate the uncertainty inherent in projects and interactions between projects, are the major
shortcomings of the previously proposed analytical models for R&D project selection. Within the last decade, researchers have, in particular, addressed the proper treatment of project interdependencies. Schmidt [13] presented a model that considered benefit, outcome and resource interactions, and proposed a branch and bound algorithm to obtain a solution for the nonlinear integer programming problem with quadratic constraints. Meade and Presley [12] proposed the use of the analytic network process (ANP), which enables the decision-maker to take into consideration interdependencies among criteria and candidate projects. One should note the resource feasibility problem that may be encountered when using the ANP by itself. Lee and Kim [8] presented an integrated application of the ANP and zero-one goal programming for information system project selection, considering resource feasibility as well as project interdependence. Although the aforementioned valuable contributions considered interactions between projects, they all relied on crisp data. R&D projects involve a high degree of uncertainty, which generally precludes obtaining exact data regarding benefit, resource usage, and interactions between projects. Fuzzy set theory appears to be a useful tool to account for the vagueness and uncertainty inherent in the R&D project selection process. Recently, a model that handles fuzzy benefit and resource usage, assuming that a project can influence at most one other, was developed; however, an optimization procedure to solve the proposed model was not presented [7]. Furthermore, the research studies cited above use traditional discounted cash flow (DCF) techniques, such as net present value (NPV) in its static form, for calculating the benefits from R&D investments. Lately, the options valuation approach has been proposed as a more suitable alternative for determining the benefits from R&D projects [9]. The options approach deviates from the conventional DCF approach in that it views future investment opportunities as rights, without obligations, to take some action in the future [3]. The asymmetry in the options expands the NPV to include a premium beyond the static NPV calculation, and thus presumably increases the total value of the project and the probability of justification. Carlsson and Fullér [1] further extended the use of the options approach in R&D project valuation by considering the possibilistic mean and variance of fuzzy cash flow estimates. This paper presents a novel fuzzy formulation for R&D project selection, accounting for project interactions, with the objective of maximizing the net benefit based on the expanded net present value, which incorporates the real options inherent in R&D projects in a fuzzy setting. Although a fuzzy optimization model is provided for R&D portfolio selection in [14], project interactions are completely ignored there. Compared with the real options valuation procedures delineated in [1, 14], the valuation approach utilized in this paper models the exercise price as a stochastic variable, enabling it to deal with technological uncertainties in real options analysis, and considers both the benefits of keeping the development option alive and the opportunity cost of delaying development.
Moreover, this paper's focus is not limited to the valuation of R&D projects using fuzzy cash flow estimates, since the proposed optimization framework enables constructing an optimal portfolio of R&D projects considering the commonly encountered project interdependencies regarding resource usage. The proposed model leads to a binary integer program with nonlinear constraints. In this paper, a solution procedure based on linearization of the nonlinear constraints is provided, so that the resulting linear problem can be solved with widely available solvers. The
proposed optimization approach is also advantageous compared with heuristics in that the obtained solution is the global optimum of the problem. The rest of the paper is organized as follows. Section 2 reviews sequential exchange options for valuing R&D projects. Section 3 outlines the approach to incorporate fuzzy cash flows into the valuation methodology. A fuzzy optimization model with nonlinear constraints, which is later converted into a crisp linear binary integer program, is introduced in Section 4. A comprehensive example is presented in the subsequent section to illustrate the application of the proposed framework. Finally, conclusions and directions for future research are provided in Section 6.
2 Real Options Approach to Valuation of R&D Projects

An option is the right, but not the obligation, to buy (if a call) or sell (if a put) a particular asset at a specified price on or before a certain expiration date. The buyer of an option may choose to exercise his right and take a position in the underlying asset, while the option seller, also known as the option writer, is contractually obligated to take the opposite position in the underlying asset if the buyer exercises his right. The price at which the buyer of an option may buy or sell the underlying asset is the exercise price. An American option can be exercised at any time prior to expiration, while a European option allows exercise only on its expiration date. An American exchange option, which can be cited among options with more complicated payoffs than the standard European or American calls and puts, gives its owner the right to exchange one asset for another at any time up to and including expiration. While financial options are options on financial assets, real options are opportunities on real assets that can provide management with valuable operating flexibility and strategic adaptability. Akin to financial options, real options enable their owners to revise future investment and operating decisions according to market conditions. A substantial part of the market value of companies operating in volatile and unpredictable industries such as electronics, telecommunications, and biotechnology can be attributed to the real options that they possess [3]. Real options preclude the traditional passive analysis of investments, and imply an active management approach with an ability to respond to changing conditions. The real options approach enables the firm to evaluate a project in a multi-stage context, providing the means to revise decisions based on new information. It is reported that American sequential exchange options provide a more realistic valuation of R&D projects compared with other option models when R&D projects incorporate stages of research and/or sequential investment opportunities [9]. In this paper, an efficient method for valuing American sequential exchange options, when both underlying assets pay dividends continuously and there exists a possibility of early exercise, is employed. The earlier work on exchange options dates back to Margrabe [11], who derived a pricing equation for exchange options on non-dividend-paying assets. Although elegant in its ability to model the exercise price as a stochastic variable, enabling it to deal with technological uncertainties in real options analysis, the easy-to-use formula developed by Margrabe [11] falls short of incorporating dividends into the analysis, which may be especially crucial in the valuation of real options, because the underlying assets are generally not traded.
Here, an option to exchange asset D for asset V at time T is considered. Asset D is referred to as the delivery asset, and asset V as the optioned asset. The payoff to this European option at time T is given as max(0, V_T − D_T), where V_T and D_T are the underlying assets' terminal prices. The asset prices prior to expiration, i.e. V and D, are assumed to follow geometric Brownian motion:
dV/V = (α_v − δ_v) dt + σ_v dZ_v,
dD/D = (α_d − δ_d) dt + σ_d dZ_d,        (1)
cov(dV/V, dD/D) = ρ_vd σ_v σ_d dt,

where α_v and α_d are the expected rates of return on the two assets, δ_v and δ_d are the corresponding dividend yields, σ_v² and σ_d² are the respective variance rates, and dZ_v and dZ_d are increments of the Wiener processes at time t (t ∈ [0, T]). The rates of price change, dV/V and dD/D, can be correlated, with correlation coefficient ρ_vd. The parameters δ_v, δ_d, σ_v, σ_d, and ρ_vd are non-negative constants. δ, which denotes the difference in dividend yields, is defined as δ = δ_v − δ_d. In this paper, the model proposed by Carr [2] for valuing American exchange options on dividend-paying assets is used. Carr [2] generalized the solution of Geske and Johnson [4], which was initially developed for valuing an American put option, to American exchange options on assets with continuous dividends. Geske and Johnson [4] viewed an American put option as the limit of a sequence of pseudo-American puts. A pseudo-American option can only be exercised at a finite number of discrete exercise points. In the limiting case, the value of a pseudo-American option approaches the exact value of a true American put option. Geske and Johnson [4] achieved accuracy by considering put options which can be exercised at a small number of discrete time points, and then employed the values obtained at these exercise dates to extrapolate to the value of a put option that can be exercised at any date. The details of the valuation formula for the general pseudo-American exchange option are not provided here due to limited space. The reader may refer to Karsak and Özogul [6] for a detailed presentation.
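Carr's pseudo-American valuation is not reproduced in the text, but its European building block, the Margrabe-type exchange option extended with continuous dividend yields, is standard and gives a feel for the computation. The sketch below is ours, not the paper's; it prices the European option to exchange D for V at T, using the combined volatility σ² = σ_v² − 2ρ_vd σ_v σ_d + σ_d², and assumes Python's statistics.NormalDist (available from Python 3.8).

```python
from math import log, sqrt, exp
from statistics import NormalDist

def european_exchange_option(V, D, sigma_v, sigma_d, rho, delta_v, delta_d, T):
    """Value of a European option to exchange delivery asset D for optioned asset V
    at time T, both paying continuous dividend yields (Margrabe-type formula).
    Payoff at T: max(0, V_T - D_T)."""
    N = NormalDist().cdf
    sigma = sqrt(sigma_v**2 - 2.0 * rho * sigma_v * sigma_d + sigma_d**2)
    d1 = (log(V / D) + (delta_d - delta_v + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return V * exp(-delta_v * T) * N(d1) - D * exp(-delta_d * T) * N(d2)
```

With δ_v = δ_d = 0 this reduces to Margrabe's original formula; the pseudo-American value is then obtained by evaluating such options at a small number of discrete exercise dates and extrapolating, as described above.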
3 Using Fuzzy Sets for Modeling Uncertainty in Project Selection

Fuzzy set theory deals with problems in which a source of imprecision and vagueness is involved. A fuzzy set can be defined mathematically by assigning to each possible individual in the universe of discourse a value representing its grade of membership in the fuzzy set. This grade corresponds to the degree to which the individual is compatible with the concept represented by the fuzzy set. A convex and normalized
fuzzy set defined on ℜ with a piecewise continuous membership function is called a fuzzy number. Uncertainty and imprecision in parameters such as cash flow estimates can be incorporated into the R&D project selection framework using fuzzy numbers. Ã = (c_L, c_R, s_L, s_R) is a trapezoidal fuzzy number with the membership function defined as

f_Ã(x) = (x − (c_L − s_L))/s_L,   c_L − s_L ≤ x ≤ c_L
         1,                       c_L ≤ x ≤ c_R
         ((c_R + s_R) − x)/s_R,   c_R ≤ x ≤ c_R + s_R
         0,                       otherwise                (2)

where c_L and c_R are the left and right core values, and s_L and s_R are the left and right spreads, respectively. The support of Ã is (c_L − s_L, c_R + s_R). If c_L = c_R = c, the resulting fuzzy number is a triangular fuzzy number denoted as Ã = (c, s_L, s_R). Further, when s_L = s_R = s, a symmetric triangular fuzzy number Ã = (c, s) is obtained. The possibilistic mean and variance of a trapezoidal fuzzy number Ã are defined as [1]

E(Ã) = (c_L + c_R)/2 + (s_R − s_L)/6,        (3)

Var(Ã) = (c_R − c_L)²/4 + (c_R − c_L)(s_L + s_R)/6 + (s_L + s_R)²/24.        (4)
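Equations (3) and (4) are straightforward to compute; a small illustrative helper (our naming, plain Python) is:

```python
def possibilistic_mean(cL, cR, sL, sR):
    """Possibilistic mean of a trapezoidal fuzzy number (cL, cR, sL, sR), eq. (3)."""
    return (cL + cR) / 2.0 + (sR - sL) / 6.0

def possibilistic_variance(cL, cR, sL, sR):
    """Possibilistic variance of a trapezoidal fuzzy number, eq. (4)."""
    return ((cR - cL)**2 / 4.0
            + (cR - cL) * (sL + sR) / 6.0
            + (sL + sR)**2 / 24.0)

# A symmetric triangular fuzzy number (c, s) is the special case cL = cR = c,
# sL = sR = s, giving mean c and variance s^2 / 6.
assert possibilistic_mean(7000, 7000, 1500, 1500) == 7000.0
```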
4 Fuzzy Optimization Framework for R&D Project Selection

This paper considers constructing an R&D project portfolio, where there are m candidate R&D projects. The binary decision variable x_i (i = 1, …, m) corresponds to the ith R&D project, where x_i = 1 if R&D project i is selected and x_i = 0 otherwise. The objective is to maximize the total net benefit obtained from the R&D project portfolio. Resource constraints related to the initial expenditures for the R&D projects and the skilled workforce (in man-hours) are considered, as well as the interdependencies among the R&D projects regarding the use of these resources. There is an estimate for the budget limit for initial expenditures (T̃_B) and an estimate for the skilled workforce limit required for the development phase of the R&D projects (T̃_W). Both of these estimates, as well as the estimates for resource usages and shared resources for the R&D projects, are represented as fuzzy numbers due to the imprecise nature of the problem. The proposed model also makes it possible to account for project contingencies, which indicate that a project cannot be implemented unless a related project is also selected. Other restrictions regarding the construction of the R&D project portfolio, such as mutually exclusive projects or mandated projects, can be readily appended to the proposed model.
max Z* = Σ_{i=1}^{m} V_i^e x_i        (5)

subject to

Σ_{i=1}^{m} C̃_i x_i − Σ_{i=1}^{m−1} Σ_{j=i+1}^{m} C̃_ij x_i x_j + Σ_{i=1}^{m−2} Σ_{j=i+1}^{m−1} Σ_{k=j+1}^{m} C̃_ijk x_i x_j x_k ≤ T̃_B,        (6)

Σ_{i=1}^{m} W̃_i x_i − Σ_{i=1}^{m−1} Σ_{j=i+1}^{m} W̃_ij x_i x_j + Σ_{i=1}^{m−2} Σ_{j=i+1}^{m−1} Σ_{k=j+1}^{m} W̃_ijk x_i x_j x_k ≤ T̃_W,        (7)

Σ_{i∈Y_j} x_i ≥ |Y_j| x_j,   j ∈ Θ_Y,        (8)

x_i ∈ {0, 1},   i = 1, …, m.        (9)
Formula (5) represents the objective of maximizing the total net benefit of the R&D project portfolio, where V_i^e denotes the net benefit obtained from project i using the expanded net present value (ENPV). Formulae (6) and (7) represent resource constraints, which enable project interactions to be taken into account. Uncertain initial expenditure and shared initial expenditure parameters are denoted respectively as C̃_i and C̃_ij, C̃_ijk, while W̃_i and W̃_ij, W̃_ijk represent fuzzy workforce and fuzzy shared workforce parameters, respectively, for the R&D projects. Although the current formulation assumes that interactions exist among at most three R&D projects, it can easily be extended to higher numbers of interdependent projects. In addition to the resource constraints, the formulation includes the contingency constraints given by formula (8), indicating that the implementation of project j is contingent upon the implementation of all the projects in Y_j ⊂ {1, …, m}, where |Y_j| denotes the cardinality of Y_j and Θ_Y ⊂ {1, …, m}. The formulation given above is a fuzzy nonlinear integer programming model and can be linearized using the approach delineated in [15]. For instance, the nonlinear term x_i x_j can be linearized by introducing a new variable x_ij := x_i x_j, where x_ij ∈ {0, 1}, and appending the following linear constraints to the model (a sketch of this construction follows):

x_i + x_j − x_ij ≤ 1,        (10)

−x_i − x_j + 2x_ij ≤ 0.        (11)
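As a concrete illustration of this step (our sketch, not the author's code; solver-agnostic, with decision variables represented by hashable keys), the pairwise products can be replaced mechanically. Triple products x_i x_j x_k are handled analogously, with one auxiliary binary variable per triple.

```python
def linearize_products(pairs):
    """Replace each bilinear term x_i * x_j by a new binary variable y_ij and
    return the linear constraints (10)-(11) that force y_ij = x_i * x_j for
    binary x. Each constraint is a ({var_key: coeff} , rhs) pair meaning
    sum(coeff * var) <= rhs."""
    constraints = []
    for (i, j) in pairs:
        y = ("y", i, j)
        # x_i + x_j - y_ij <= 1    (10): forces y_ij = 1 when both x are 1
        constraints.append(({("x", i): 1, ("x", j): 1, y: -1}, 1))
        # -x_i - x_j + 2 y_ij <= 0  (11): forces y_ij = 0 unless both x are 1
        constraints.append(({("x", i): -1, ("x", j): -1, y: 2}, 0))
    return constraints
```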
After performing the linearization, the fuzzy linear integer programming formulation can be converted to a crisp mathematical programming model using possibility theory. Formulae (6) and (7), which incorporate fuzzy parameters, can be
rewritten as crisp constraints employing the possibilistic approach [5]. For example, an inequality constraint given as

ã_1j x_1 + ã_2j x_2 + … + ã_mj x_m ≤ b̃_j        (12)

can be rewritten as

Σ_{i=1}^{m} a_ij^cR x_i + λ_j Σ_{i=1}^{m} a_ij^sR x_i ≤ b_j^cR + (1 − λ_j) b_j^sR,        (13)

where λ_j is the satisfaction degree of the constraint, a_ij^cR and b_j^cR denote the right core values of ã_ij and b̃_j, and a_ij^sR and b_j^sR denote their right spreads, respectively.
5 Illustrative Example

In this section, we consider a technology firm analyzing six R&D project alternatives, where each R&D project consists of a two-stage investment, namely the initial stage and the development stage. The company acquires the right to make a development investment by making the initial investment for each R&D project. Due to the imprecise and uncertain nature of R&D investments, project cash flow estimates regarding the development stage are given as fuzzy numbers in Table 1. Although the methodology delineated in the previous section enables the use of trapezoidal fuzzy numbers, in order to save space, fuzzy cash flow estimates are provided as symmetric triangular fuzzy numbers Ã_j = (c, s), where c is the most likely (core) value and s is the spread. Resource data for the R&D project alternatives and data related to the project interactions are provided in Table 2 and Table 3, respectively. Project 2 is assumed to be contingent upon the implementation of project 5.

Table 1. Fuzzy cash flow estimates (in thousands of dollars) regarding the development stage of the R&D project alternatives
     Project 1       Project 2       Project 3       Project 4       Project 5       Project 6
A0   (-27000,6000)   (-30000,6500)   (-35000,7500)   (-40000,8500)   (-25000,5500)   (-41000,9000)
A1   (7000,1500)     (8000,1800)     (11500,2700)    (7500,1800)     (4000,900)      (11000,2500)
A2   (8000,1800)     (9000,2100)     (14000,3200)    (9300,2400)     (5400,1300)     (13000,3100)
A3   (8500,2200)     (9500,2300)     (12000,2900)    (10500,2500)    (6500,1600)     (13500,3400)
A4   (8000,2100)     (10500,2400)    (10600,2800)    (11500,3000)    (7000,1700)     (16500,4000)
A5   (6700,1700)     (8800,2500)     (9500,2500)     (14000,3200)    (7500,1900)     (17500,4300)
A6   (6000,1600)     (8000,2200)     (8500,2200)     (12500,3000)    (9000,2200)     (15000,3800)
Equation (3) is employed to compute the expected value of the development stage investment and the expected value of returns from the development stage investment for each R&D project alternative. The standard deviation of the rate of change of the
returns from the development stage investment and the standard deviation of the rate of change of the development stage investment are taken to be 0.1 and 0.3, respectively. It is assumed that the correlation between the development investment and the expected value of returns from the development investment is 0.5 for each R&D project. The time interval in which the development investment can be realized is taken to be two years. The opportunity cost of delaying the development investment is set as a constant proportion of the expected value of returns from the development investment, δ_v = 0.04, while the depreciation of the development investment is set as a constant proportion of the value of the development investment, δ_d = 0.02. In practice, δ_v depends on competitive intensity and market structure characteristics in addition to the anticipated increase in demand, and can be measured from market information using econometric methods, whereas δ_d can be estimated based on expert opinion [6].

Table 2. Resource data for the R&D project alternatives
Projects   Initial expenditures ($)   Workforce (in man-hours)
1          6,000,000                  (200,000, 20,000)
2          9,500,000                  (240,000, 20,000)
3          13,500,000                 (300,000, 30,000)
4          7,000,000                  (320,000, 30,000)
5          3,000,000                  (160,000, 20,000)
6          20,000,000                 (360,000, 40,000)
Table 3. Data regarding the R&D project interdependencies
Interdependent projects   Shared initial expenditures (thousands of $)   Shared workforce (man-hours)
1, 3                      (2,000, 400)                                   (30,000, 6,000)
1, 6                      (1,500, 200)                                   (40,000, 8,000)
2, 4                      (2,500, 400)                                   (50,000, 10,000)
2, 5                      (2,000, 200)                                   (24,000, 4,000)
3, 6                      (3,000, 500)                                   (40,000, 10,000)
2, 4, 5                   (2,000, 300)                                   (30,000, 8,000)
NPV for the development stage of each R&D project is calculated using an interest rate of 10%. ENPV for the development stage of each R&D project is obtained employing the real options valuation approach delineated in Section 2. The difference between ENPV and NPV gives the option value for each R&D project. The net benefit obtained from each R&D project using ENPV is computed using equation (14):

Net Benefit (R&D project) = ENPV (development stage) − Initial Expenditure        (14)
The results of these valuations are reported in Table 4. In Table 4, C denotes the initial expenditure, V represents the net benefit calculation based on NPV, and V^e
denotes the net benefit using ENPV for the respective R&D project. As shown in Table 4, the net benefit calculation based on NPV results in negative figures for projects 1, 2, 4 and 5, whereas the net benefit using ENPV yields positive results for every R&D project alternative. In other words, an optimization model that maximizes the total net benefit based on NPV would eliminate projects 1, 2, 4 and 5 as a result of an erroneous valuation procedure that ignores any form of flexibility.

Table 4. Valuation results (in dollars) for the R&D project alternatives
                 Project 1    Project 2    Project 3     Project 4     Project 5    Project 6
NPV              5,372,507    8,999,775    13,977,295    5,996,415     2,500,989    20,489,506
ENPV             6,821,550    9,900,180    14,527,765    8,695,080     4,614,975    20,748,132
Option Value     1,449,043    900,405      550,470       2,698,665     2,113,986    258,626
V = NPV − C      -627,493     -500,225     477,295       -1,003,585    -499,011     489,506
V^e = ENPV − C   821,550      400,180      1,027,765     1,695,080     1,614,975    748,132
Considering T̃_B = (30,000,000, 4,000,000) dollars, T̃_W = (1,000,000, 100,000) hours and a satisfaction degree of 0.8 for the resource constraints (λ_j = 0.8), the optimal solution of the crisp mathematical programming model, obtained by applying the linearization scheme delineated in Section 4 and possibility theory to the fuzzy nonlinear integer programming formulation represented by formulae (5)-(9), is determined as projects 1, 2, 4 and 5. The optimal R&D project portfolios for varying satisfaction degrees of the resource constraints are presented in Table 5. As can be seen in Table 5, the portfolio including projects 1, 3, 4, 5, which yields a higher net benefit than the portfolio consisting of projects 1, 2, 4, 5, becomes infeasible as the satisfaction degree for the resource constraints increases to 0.8. It is also worth noting that both the real options valuation approach and the optimization framework used in this paper enable further sensitivity analyses regarding parameter flexibilities.

Table 5. Optimal solutions for varying λ_j values
λ_j    Z*          Selected projects
0.6    5,159,370   1, 3, 4, 5
0.7    5,159,370   1, 3, 4, 5
0.8    4,531,785   1, 2, 4, 5
0.9    4,531,785   1, 2, 4, 5
6 Conclusions

This paper develops a fuzzy optimization approach to select an R&D project portfolio, accounting for project interactions and determining the benefits resulting from the R&D investments using real options valuation in a fuzzy setting. Although a fuzzy approach to R&D project selection enables hedging against R&D
uncertainty, another important issue that needs to be considered is that traditional DCF valuation methods oftentimes undervalue risky projects. In this paper, American sequential exchange options are employed to address this problem. Since R&D projects generally involve phased research and sequential investment opportunities with uncertain expenditures as well as returns, the sequential exchange option model appears more suitable than other option pricing techniques. Sensitivity analysis with respect to the parameters of the model, which is confined to a minimum here due to limited space, can easily be extended. A more efficient linearization scheme may also be applicable for cases with higher numbers of interdependent projects. The implementation of the proposed approach using real data remains a future research objective.
References

1. Carlsson, C., Fullér, R.: A fuzzy approach to real option valuation. Fuzzy Sets and Systems 139 (2003) 297-312
2. Carr, P.: The valuation of sequential exchange opportunities. The Journal of Finance 43 (1988) 1235-1256
3. Dixit, A.K., Pindyck, R.S.: The options approach to capital investment. Harvard Business Review May-June (1995) 105-115
4. Geske, R., Johnson, H.E.: The American put option valued analytically. The Journal of Finance 39 (1984) 1511-1524
5. Inuiguchi, M., Ramik, J.: Possibilistic linear programming: a brief review of fuzzy mathematical programming and a comparison with stochastic programming in portfolio selection problem. Fuzzy Sets and Systems 111 (2000) 3-28
6. Karsak, E.E., Özogul, C.O.: An options approach to valuing expansion flexibility in flexible manufacturing system investments. The Engineering Economist 47 (2002) 169-193
7. Kuchta, D.: A fuzzy model for R&D project selection with benefit, outcome and resource interactions. The Engineering Economist 46 (2001) 164-180
8. Lee, J.W., Kim, S.H.: Using analytic network process and goal programming for interdependent information system project selection. Computers & Operations Research 27 (2000) 367-382
9. Lee, J., Paxson, D.A.: Valuation of R&D real American sequential exchange options. R&D Management 31 (2001) 191-201
10. Liberatore, M.: An extension of the analytic hierarchy process for industrial R&D project selection and resource allocation. IEEE Trans. Eng. Manage. 34 (1987) 12-18
11. Margrabe, W.: The value of an option to exchange one asset for another. The Journal of Finance 33 (1978) 177-186
12. Meade, L.M., Presley, A.: R&D project selection using the analytic network process. IEEE Trans. Eng. Manage. 49 (2002) 59-66
13. Schmidt, R.L.: A model for R&D project selection with combined benefit, outcome and resource interactions. IEEE Trans. Eng. Manage. 40 (1993) 403-410
14. Wang, J., Hwang, W.-L.: A fuzzy set approach for R&D portfolio selection using a real options valuation model. To appear in Omega (2006)
15. Watters, L.J.: Reduction of integer polynomial programming problems to zero-one linear programming problems. Operations Research 15 (1967) 1171-1174
16. Weingartner, H.M.: Capital budgeting of interrelated projects: survey and synthesis. Management Science 12 (1966) 485-516
Supply Chain Network Design and Transshipment Hub Location for Third Party Logistics Providers

Seungwoo Kwon 1, Kyungdo Park 2, Chulung Lee 3,*, Sung-Shick Kim 3, Hak-Jin Kim 4, and Zhong Liang 5

1 Korea University Business School, Korea University
2 College of Business Administration, Sogang University
3 Department of Industrial Systems and Information Engineering, Korea University
  [email protected]
4 Yonsei School of Business, Yonsei University
5 The Logistics Institute Asia Pacific, National University of Singapore
Abstract. Transshipment hubs are the places where cargo can be reconsolidated and transportation modes can be changed. Transshipment hubs are widely used in the logistics industry. In this article, we consider a supply chain network design problem for a third party logistics provider, in which we obtain the locations of transshipment hubs among promising hub locations with the objective of minimizing the total system cost, which includes the transportation cost, the fixed cost and the processing cost. In the problem, the unit cost of transporting cargo between an origin-destination pair is non-linear, following a decreasing step function, and each unit of cargo passing through a transshipment hub is charged a fixed processing cost. The problem is formulated as a mixed integer programming problem. However, due to the complexity of the problem, a heuristic solution approach is proposed. The proposed solution is then implemented at a third party logistics provider, and their experience shows that significant cost savings are obtained compared with current practice.
1 Introduction

The logistics industry has been a fast-growing industry over the past forty years. The cost of the business logistics system globally was estimated to exceed US$2 trillion in 1999 (The Logistics Institute Asia Pacific, 2004). In the United States, the logistics industry was worth more than $920 billion, equivalent to approximately 10 percent of the gross domestic product, while the figure was 16 percent in the 1980s. Logistics costs comprise some 10-15% of the final product cost of finished goods. The total logistics cost included $377 billion of inventory carrying cost and $585 billion of transportation cost. From 1980 to 2000, with 1980 serving as the base, the inventory carrying cost declined by more than 50 percent, transportation costs were reduced by 22 percent and total logistics costs decreased by 37 percent. The trends of more efficient inventory investment and fast-cycle procurement drove productivity gains during the 1990s. Manufacturing and distribution
* Corresponding author.
industries have realized that the logistics activity is not their core business and, in order to achieve a competitive edge, they must turn to supply chain management specialists whose core competencies are logistics and supply chain management. Nevertheless, as of 1999, less than 5% of logistics activities were outsourced worldwide (Viswanadham et al., 2003). The outsourced segment of logistics expenditures by manufacturing and distributing businesses was estimated to be $6 billion and is increasing very fast. At the centre of this growth are third-party logistics service providers. Third-party logistics service providers (3PLs) take over most routine material handling activities such as warehousing, order picking, assembly, packaging and shipping, as well as the handling of returned parts and goods (Viswanadham et al., 2003). In addition, third-party logistics providers are extending their business areas to marketing and customer relationship management (especially in industries such as mobile phones and other high-tech gadgets), assembly and ad-hoc manufacturing (for delayed differentiation), and supply hubs (supplier management). These new activities help them create greater value (rather than merely cutting operating costs) for their clients. Third-party logistics services grew by 24 percent in 2000. Dedicated contract carriage grew by 21 percent. Domestic transportation management grew by 21 percent in the face of competition from the so-called dot-com freight exchanges. Warehouse-based integrated services grew by 23 percent. Total contract logistics revenues grew by 24 percent, to $56.4 billion, in 2000 (The Logistics Institute Asia Pacific, 2004). In this study, we examine a supply chain network design and control problem for a real-life third party logistics provider. Our objective is to obtain a high quality solution for the provider's transshipment hub locations and transportation simultaneously. The rest of the paper proceeds as follows. Chapter 2 provides a literature review on the transshipment problem. Chapter 3 briefly discusses negotiation processes for the supply chain design. In Chapter 4, we discuss a quantitative model and the corresponding heuristic solution approach. The implementation issues and results are reported in Chapter 5.
2 Literature Review

Diks and De Kok (1996) investigate a two-echelon inventory system with a central depot and a number of retailers. In their study, the retailers satisfy customer demand with the stock kept in the central depot. The inventory is periodically reviewed and replenished. From the central depot to the retailers, stock is replenished by the share rationing policy, where the ratio of the projected net inventory at a retailer to the system-wide projected net inventory is kept constant and determined by the customer service level required by the retailer. Under this policy, when an order arrives, the total net stocks amongst the retailers are reallocated by the transshipment of stocks in order to maintain the constant ratio. The authors proposed a heuristic to obtain the ratios that minimize the total expected cost. Their numerical results show that, compared with a system without transshipment, such transshipment significantly reduces the total expected cost. Existing studies on inventory sharing in a supply chain focus on the scenario where the inventory policies for each retailer are coordinated. Such a scenario is applicable if
a single "parent firm" owns all the retailers and tries to identify the inventory allocation that maximizes supply chain performance (usually the total profit). Rudi et al. (2001) consider transshipment among different firms, where each firm tries to maximize its own profit. Three different scenarios – without transshipment, coordinated transshipment and decentralized transshipment – are examined. While, in general, decentralized transshipment decisions do not maximize the total profit for the supply chain, there exist transshipment prices that induce the locations to maximize the total expected profit. Apart from inventory sharing, cargoes may be consolidated in transshipment hubs, as commonly seen in the postal service industry and among third party logistics providers. Cargo consolidation reduces transportation costs due to the economies of scale inherent in freight transportation and distribution. Many firms have been consolidating cargoes for a number of years as an effective way to reduce transportation costs. Consolidation has also become a subject of greater interest as the difference between truckload and less-than-truckload rates has increased (Higginson and Bookbinder 1994). Hub and spoke systems (e.g., airline routes, supply chain and telecommunication networks) connect origins and destinations via one or more hubs, and these hubs are connected by an efficient transportation system. For the design of hub and spoke systems, heuristic algorithms have been developed (Pirkul and Schilling 1998, Skorin-Kapov and O'Kelly 1996, Wiles and van Brunt 2001). Amongst them, Pirkul and Schilling (1998) provide an efficient heuristic algorithm with tight bounds. Computational experiments on eighty-four standard test problems show that the average gap is 0.048%, and the maximum gap is less than 1%. De Rosa et al. (2002) investigate an arc routing and scheduling problem with transshipment, in which goods are first collected and taken to a transshipment hub for processing and then transported to the same final destination by a high-capacity vehicle. A tight lower bound is provided by the sum of a lower bound on the traversal cost of all the required edges and a lower bound on the total cost of a truck schedule. Based on these bounds, an efficient heuristic employing Tabu search techniques is then developed. Shigeno et al. (2000) provide a comprehensive literature review on the minimum cost network flow problem, focusing on cycle and cut canceling algorithms. Marin and Pelegrin (1997) consider a transshipment problem with a fixed cost for each transshipment hub. The number of hubs is assumed to be given. They decompose the problem into two sub-problems by Lagrangean decomposition. Using a dual ascent algorithm, a lower bound for the optimal objective function value is obtained. Furthermore, an efficient heuristic algorithm is proposed. The transshipment problem considered in this paper is different, since the number of transshipment hubs is not given. We also consider the transportation cost to be non-linear, following a step-decreasing function.
3 Negotiation Process for the Supply Chain Network Design

In a supply chain network design, many companies are involved. These companies will confront problems even when they follow the rules and contracts; therefore they
need to resolve those problems together. In order to increase the effectiveness of the supply chain network design, the companies need to follow an integrative negotiation approach rather than a distributive negotiation approach. They can then reach a win-win, or integrative, solution. If there is just one issue to be resolved, negotiators approach the problem with a distributive strategy. In this case it is a zero-sum game, since one negotiator's gain is the other party's loss, and vice versa; moreover, the amounts of loss and gain are exactly the same. In many cases, however, negotiators deal with several issues at the same time and may attach different importance to specific issues. Here there is potential for a win-win solution if they can trade off those issues. That is, a negotiator can concede on an issue that is less important to him or her but important to the other party. At the same time, the negotiator can be firm about an issue important to himself or herself but less important to the other party, and ask the other party to make a concession. By adopting an integrative negotiation approach, the companies can increase the effectiveness of the total supply chain.
4 Model and Solution Approach

In this study, we consider a supply chain network that consists of nodes and arcs. A set of cities (between which transportation of goods is required) and a set of promising transshipment hub locations comprise the nodes in the network, and the nodes are connected by arcs. Every pair of nodes is connected by at least one arc. When multiple arcs connect a pair of nodes, each arc represents a distinct transportation mode. All arcs are capacitated, i.e., there exists an upper bound on the amount of goods that can be transferred via each arc; the transshipment hubs are also capacitated, i.e., there exists an upper bound on the amount of cargo each transshipment hub can handle. For each unit of cargo flowing through a transshipment hub, a fixed transshipment handling cost is charged. We further assume that the unit transportation cost of each arc follows a decreasing step function, i.e., when the total cargo flow on an arc exceeds a certain amount, the exceeding amount is charged a lower unit cost (a sketch of such a cost function is given below). The objective of our problem is to determine which of the potential transshipment hubs should be selected and what pattern the cargo flow should follow, in order to minimize the total cost of the whole transportation system, which is the sum of the costs of transportation, transshipment handling, and hub lease. The mixed integer programming formulation of the problem is as follows: the objective function is the total system cost, consisting of transportation cost, transshipment handling cost and hub lease cost, respectively; the constraint set consists of flow balance constraints, capacity constraints and integrality constraints. Due to the complexity of this problem, solving the original mixed integer programming model is very difficult. Instead, we propose a two-stage heuristic approach. In the first stage, we obtain the hub locations; then, in the second stage, transportation decisions are made, determining how many goods will be transported from each node to the other nodes and which transportation mode will be selected for each flow.
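As an illustration of the decreasing step cost structure (our sketch; the tier values are hypothetical), the total cost on an arc can be evaluated as follows.

```python
def arc_transport_cost(flow, breakpoints, unit_costs):
    """Total cost on an arc whose unit cost is a decreasing step function of flow:
    the first breakpoints[0] units cost unit_costs[0] each, the next
    (breakpoints[1] - breakpoints[0]) units cost unit_costs[1] each, and so on.
    Quantities above the last breakpoint are charged at the final (cheapest) rate."""
    total, prev = 0.0, 0.0
    for bp, c in zip(breakpoints, unit_costs[:-1]):
        if flow <= bp:
            return total + (flow - prev) * c
        total += (bp - prev) * c
        prev = bp
    return total + (flow - prev) * unit_costs[-1]

# e.g. first 100 units at 1.0, next 400 at 0.8, anything beyond 500 at 0.6:
print(arc_transport_cost(600, [100, 500], [1.0, 0.8, 0.6]))  # 100 + 320 + 60 = 480.0
```

This concavity in the cost (economies of scale) is exactly what makes the flow problem non-linear and motivates the heuristic treatment.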
In order to solve the first stage problem, we compute the cost increase (or decrease) resulting from removing a location from the set of transshipment hubs, one at a time. Starting with all promising locations as transshipment hubs, we remove the location that reduces the total cost the most at each iteration. The iterative search terminates when there is no location whose removal can further reduce the total cost (a sketch follows). In the second stage, based on the given transshipment hub locations, we design the transportation flow by solving a multi-commodity network flow problem.
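The first-stage search can be summarized in a few lines. This is an illustrative sketch with our naming; total_cost(hubs) stands for the (expensive) second-stage evaluation, i.e. solving the flow problem for a fixed hub set and returning the resulting total system cost.

```python
def select_hubs(candidates, total_cost):
    """Greedy drop heuristic: start from all candidate hub locations and
    repeatedly remove the hub whose removal reduces the total system cost
    the most; stop when no single removal improves the cost."""
    hubs = set(candidates)
    best = total_cost(hubs)
    while True:
        drop, drop_cost = None, best
        for h in hubs:
            c = total_cost(hubs - {h})
            if c < drop_cost:
                drop, drop_cost = h, c
        if drop is None:               # no removal improves the incumbent
            return hubs, best
        hubs.remove(drop)
        best = drop_cost
```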
5 Implementation

This study was commissioned partly by a multi-national logistics service provider, and the solution obtained from the proposed method was applied to a network design problem that the company faces in a busy part of its Asia Pacific distribution network. The company's analysis shows that the total cost can be reduced by more than 20% by relocating transshipment hubs and considering multi-modal transportation, as obtained by the solution procedure in this study. Furthermore, the proposed heuristic approach provides solutions that redesign cargo flows in a very short time, and thus can be employed to reroute cargoes in cases of operational contingency such as the events of 9/11.
6 Conclusions

In this study, we examined a multi-modal transportation network with transshipment. Our objective is to select hub locations amongst several promising locations and design the cargo flow in the distribution network so as to minimize the total expected cost, which is the sum of transportation costs, transshipment handling costs and fixed costs. In addition, we consider a supply chain in which many companies are involved, each seeking to maximize its own profit. For a win-win contract design that maximizes the effectiveness of the supply chain network, the companies need to follow an integrative negotiation approach rather than a distributive negotiation approach. The mixed integer programming model for the distribution network design problem is hard to solve; thus, a two-stage heuristic approach is proposed. In the first stage, we obtain the hub locations, and in the second stage, transportation decisions are made, determining how many goods will be transported from each node to the other nodes and which transportation mode will be selected for each flow. The proposed two-stage heuristic approach was applied to a real-life third party logistics provider's problem and reduced the total system cost by about 20%.
References

Diks, E.B. and de Kok, A.G., "Controlling a Divergent 2-echelon Network with Transshipments Using the Consistent Appropriate Share Rationing Policy", International Journal of Production Economics 45 (1996) 369-379
Higginson, J. and Bookbinder, J., "Policy Recommendation for a Shipment Consolidation Program", Journal of Business Logistics 15 (1994) 87-112
Marin, A. and Pelegrin, B., "A Branch-and-bound Algorithm for the Transportation Problem with Location of p Transshipment Points", Computers and Operations Research 24 (1997) 659-678
Pirkul, H. and Schilling, D.A., "An Efficient Procedure for Designing Single Allocation Hub and Spoke Systems", Management Science 44 (1998) 235-242
De Rosa, B., Improta, G., Ghiani, G. and Musmanno, R., "The Arc Routing and Scheduling Problem with Transshipment", Transportation Science 36 (2002) 301-313
Rudi, N., Kapur, S. and Pyke, D.F., "A Two-Location Inventory Model with Transshipment and Local Decision Making", Management Science 47 (2001) 1668-1680
Shigeno, M., Iwata, S. and McCormick, S.T., "Relaxed Most Negative Cycle and Most Positive Cut Canceling Algorithms for Minimum Cost Flow", Mathematics of Operations Research 25 (2000) 76-83
Skorin-Kapov, D. and O'Kelly, M.E., "Tight linear programming relaxations of uncapacitated p-hub median problems", European Journal of Operational Research 94 (1996) 582-593
The Logistics Institute Asia Pacific, "Third-Party Logistics Results and Findings of the 2004 Ninth Annual Study", Singapore (2004)
Viswanadham, N., Jarvis, J.J. and Gaonkar, R.S., "Ten Mega Trends in Logistics", White Paper, The Logistics Institute Asia Pacific (2003)
Wiles, P.G. and van Brunt, B., "Optimal Location of Transshipment Depots", Transportation Research Part A 35 (2001) 745-771
A Group Search Optimizer for Neural Network Training

S. He 1, Q.H. Wu 1,*, and J.R. Saunders 2

1 Department of Electrical Engineering and Electronics
2 School of Biological Sciences,
The University of Liverpool, Liverpool, L69 3GJ, UK
[email protected]
Abstract. A novel optimization algorithm, the Group Search Optimizer (GSO) [1], has been developed, inspired by animal behavioural ecology. The algorithm is based on a Producer-Scrounger model of animal behaviour, which assumes group members search either for 'finding' (producer) or for 'joining' (scrounger) opportunities. Animal scanning mechanisms (e.g., vision) are incorporated to develop the algorithm. In this paper, we apply the GSO to Artificial Neural Network (ANN) training to further investigate its applicability to real-world problems. The parameters of a 3-layer feed-forward ANN, including connection weights and biases, are tuned by the GSO algorithm. Two real-world classification problems have been employed as benchmark problems trained by the ANN to assess the performance of the GSO-trained ANN (GSOANN). In comparison with other sophisticated machine learning techniques proposed for ANN training in recent years, including some ANN ensembles, GSOANN shows better convergence and generalization performance on the two benchmark problems.
1 Introduction
Artificial Neural Networks (ANNs) have been widely applied to a variety of problem domains such as pattern recognition [2] and control [3] since their renaissance in the mid-1980s. Various ANN architectures and training algorithms have been proposed. Among them, the most popular ANN architecture and training algorithm are feed-forward ANNs and the BP training algorithm, respectively. However, the gradient-based BP training algorithm is easily trapped by local minima, which deteriorates the performance of ANNs. On the other hand, designing a near-optimal ANN architecture to achieve good generalization performance is also a hard optimization problem. In the past two decades, Evolutionary Algorithms (EAs) have been introduced to ANNs to perform various tasks, such as connection weight training, architecture design, learning rule adaptation, input feature selection, connection weight initialization, rule extraction from ANNs, etc. [4]. The combinations of ANNs and EAs are usually referred to as Evolutionary ANNs (EANNs). The earliest attempts to combine EAs and ANNs can be traced back to the late 1980s.
* Corresponding author.
Since then, the successful marriage of ANNs and EAs has attracted more and more attention [5][6]. In [7], an improved genetic algorithm was used to tune the structure and parameters of a neural network. An improved Genetic Algorithm (GA) with new genetic operators was introduced to train the proposed ANN. Two application examples, sunspot forecasting and associative memory tuning, were solved in their study. Palmes et al. proposed a mutation-based genetic neural network (MGNN) [8]. A simple matrix encoding scheme was used to represent an ANN's architecture and weights. The neural network utilized a mutation strategy of local adaptation from evolutionary programming to evolve network structures and connection weights dynamically. Three classification problems, namely iris classification, the wine recognition problem, and the Wisconsin breast cancer diagnosis problem, were used in their paper as benchmark functions. Cantú-Paz and Kamath presented an empirical evaluation of eight combinations of EAs and ANNs on 11 well-studied real-world benchmarks and 4 synthetic problems [9]. The algorithms they used included binary-encoded and real-encoded GAs, and the BP algorithm. The tasks performed by these algorithms and their combinations included searching for weights, designing ANN architectures, and selecting feature subsets for ANN training. Having proposed a novel GSO for continuous optimization problems [1], it is quite logical to apply the GSO algorithm to ANN weight training. The ANN weight training process can be regarded as a hard continuous optimization problem, since the search space is high-dimensional, multi-modal and usually polluted by noise and missing data. The objective of the ANN weight training process is to minimize an ANN's error function. However, it has been pointed out that minimizing the error function is different from maximizing generalization [10]. Therefore, to improve the ANN's generalization performance, an early stopping scheme is introduced in this study. The error rates on validation sets are monitored during the training process; when the validation error increases for a specified number of iterations, training stops. The GSO algorithm and early stopping scheme have been applied to train an ANN on two benchmark problems - the Wisconsin breast cancer data set and the Pima Indian diabetes data set. The rest of the paper is organized as follows. We present the GSO algorithm in Section 2. In Section 3, GSOANN is introduced and implementation details are given. In Section 4, we describe the benchmark functions, experimental settings and experimental results. The paper is concluded in Section 5.
2 Group Search Optimizer
Optimization, which is a process of seeking optima in a search space, is analogous to the resource searching process of animals in nature. It is quite natural to draw inspiration from animal searching behavior, especially group searching behavior, to develop an optimization algorithm. The GSO, which was inspired by animal searching behavior and group living theory, was proposed on this basis.
The GSO algorithm employs the Producer-Scrounger (PS) model [11] as a framework. The PS model was proposed to analyze social foraging strategies of group living animals. There are two foraging strategies within groups: (1) producing, e.g., searching for food; and (2) joining (scrounging), e.g., joining resources uncovered by others. In the PS model, foragers are assumed to use producing or joining strategies exclusively. Concepts of resource searching from animal scanning mechanisms and the PS model are used to design optimum searching strategies for GSO. Basically, GSO is a population-based optimization algorithm. The population of the GSO algorithm is called a group and each individual in the population is called a member. In an n-dimensional search space, the ith member at the kth searching bout (iteration) has a current position $X_i^k \in R^n$, a head angle $\varphi_i^k = (\varphi_{i1}^k, \ldots, \varphi_{i(n-1)}^k) \in R^{n-1}$ and a head direction $D_i^k(\varphi_i^k) = (d_{i1}^k, \ldots, d_{in}^k) \in R^n$, which can be calculated from $\varphi_i^k$ via a polar to Cartesian coordinates transformation:

\[ d_{i1}^k = \prod_{p=1}^{n-1} \cos(\varphi_{ip}^k) \]
\[ d_{ij}^k = \sin(\varphi_{i(j-1)}^k) \cdot \prod_{p=j}^{n-1} \cos(\varphi_{ip}^k), \quad j = 2, \ldots, n-1 \]
\[ d_{in}^k = \sin(\varphi_{i(n-1)}^k) \tag{1} \]
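As an illustration of this transformation, the following sketch (our own Python, with our own names; not part of the original paper) computes the n-dimensional unit head direction from n−1 head angles:

```python
import numpy as np

def head_direction(phi):
    """Polar-to-Cartesian transform of equation (1).
    phi: array of n-1 head angles; returns an n-dimensional unit vector."""
    n = len(phi) + 1
    d = np.empty(n)
    # First component: product of the cosines of all angles.
    d[0] = np.prod(np.cos(phi))
    # Middle components: sine of the preceding angle times the remaining cosines.
    for j in range(1, n - 1):
        d[j] = np.sin(phi[j - 1]) * np.prod(np.cos(phi[j:]))
    # Last component: sine of the last angle.
    d[n - 1] = np.sin(phi[n - 2])
    return d
```

For n = 2, for example, this reduces to (cos φ, sin φ), which is indeed a unit vector.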
In the GSO, a group comprises three kinds of members: producers, scroungers and rangers. The behaviors of producers and scroungers are based on the PS model. We also employ 'rangers', which perform random walks to avoid entrapment in local minima. For accuracy [12] and convenience of computation, in the GSO algorithm there is only one producer at each searching bout and the remaining members are scroungers and rangers. The simplest joining policy, which assumes all scroungers will join the resource found by the producer, is used. During each search bout, a group member located in the most promising area, conferring the best fitness value, acts as the producer. It then stops and scans the environment to search for resources (optima). Scanning can be accomplished through physical contact or by visual, chemical, or auditory mechanisms [13]. Vision, as the main scanning mechanism used by many animal species, is employed by the producer in GSO. In order to handle optimization problems whose number of dimensions is usually larger than 3, the scanning field of vision is generalized to an n-dimensional space, which is characterized by a maximum pursuit angle $\theta_{max} \in R^{n-1}$ and a maximum pursuit distance $l_{max} \in R^1$, as illustrated in a 3D space in Figure 1. In the GSO algorithm, at the kth iteration the producer $X_p$ behaves as follows:

1) The producer will scan at zero degrees and then scan laterally by randomly sampling three points in the scanning field [14]: one point at zero degrees:

\[ X_z = X_p^k + r_1 l_{max} D_p^k(\varphi^k) \tag{2} \]
Fig. 1. Scanning field in 3D space [13]
one point in the right hand side hypercube:

\[ X_r = X_p^k + r_1 l_{max} D_p^k(\varphi^k + r_2 \theta_{max}/2) \tag{3} \]

and one point in the left hand side hypercube:

\[ X_l = X_p^k + r_1 l_{max} D_p^k(\varphi^k - r_2 \theta_{max}/2) \tag{4} \]
where $r_1 \in R^1$ is a normally distributed random number with mean 0 and standard deviation 1 and $r_2 \in R^{n-1}$ is a random sequence in the range (0, 1).

2) The producer will then find the best point with the best resource (fitness value). If the best point has a better resource than its current position, then it will fly to this point. Otherwise it will stay in its current position and turn its head to a new angle:

\[ \varphi^{k+1} = \varphi^k + r_2 \alpha_{max} \tag{5} \]
where $\alpha_{max}$ is the maximum turning angle.

3) If the producer cannot find a better area after $a$ iterations, it will turn its head back to zero degrees:

\[ \varphi^{k+a} = \varphi^k \tag{6} \]

where $a$ is a constant given by $\mathrm{round}(\sqrt{n+1})$.

At each iteration, a number of group members are selected as scroungers. The scroungers will keep searching for opportunities to join the resources found by the producer. The commonest scrounging behavior [11] in house sparrows (Passer domesticus), area copying, that is, moving across to search in the immediate area around the producer, is adopted. At the kth iteration, the area copying behavior of the ith scrounger can be modeled as a random walk towards the producer:

\[ X_i^{k+1} = X_i^k + r_3 (X_p^k - X_i^k) \tag{7} \]
where $r_3 \in R^n$ is a uniform random sequence in the range (0, 1). In nature, group members often have different searching and competitive abilities; subordinates, who are less efficient foragers than the dominant, will be
dispersed from the group [15]. This may result in ranging behavior. Ranging is an initial phase of a search that starts without cues leading to a specific resource [16]. The ranging animals, rangers, may explore and colonize new habitats. In our GSO algorithm, rangers are introduced to explore new search space and thereby avoid entrapment in local minima. Rangers perform search strategies which include random walks and systematic search strategies to locate resources efficiently [17]. In the GSO algorithm, random walks, which are thought to be the most efficient searching method for randomly distributed resources [18], are employed by rangers. If the ith group member is selected as a ranger, at the kth iteration it generates a random head angle $\varphi_i$:

\[ \varphi_i^{k+1} = \varphi_i^k + r_2 \alpha_{max} \tag{8} \]
where $\alpha_{max}$ is the maximum turning angle; it then chooses a random distance:

\[ l_i = a \cdot r_1 l_{max} \tag{9} \]

and moves to the new point:

\[ X_i^{k+1} = X_i^k + l_i D_i^k(\varphi_i^{k+1}) \tag{10} \]
In order to maximize their chances of finding resources, animals restrict their search to a profitable patch. One strategy is turning back into a patch when its edge is detected [19]. This strategy is employed by GSO to handle the bounded search space: when a member is outside the search space, it will turn back to its previous position inside the search space.
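Pulling the producer, scrounger and ranger rules together, the following sketch (an illustrative reimplementation with our own names, reusing head_direction from the sketch above; not the authors' code) outlines one searching bout for minimizing a function f:

```python
import numpy as np

def in_bounds(x, lb, ub):
    # A member that leaves the search space turns back to its previous
    # position, so out-of-bounds moves are simply rejected.
    return np.all(x >= lb) and np.all(x <= ub)

def gso_step(f, X, phi, lb, ub, l_max, theta_max, alpha_max, n_rangers):
    """One GSO searching bout. X: (m, n) member positions; phi: (m, n-1) angles."""
    m, n = X.shape
    fitness = np.array([f(x) for x in X])
    p = int(np.argmin(fitness))              # best member acts as producer
    # Producer samples three points: ahead, right and left (eqs 2-4).
    r1, r2 = np.random.randn(), np.random.rand(n - 1)
    offsets = (np.zeros(n - 1), r2 * theta_max / 2, -r2 * theta_max / 2)
    pts = [X[p] + r1 * l_max * head_direction(phi[p] + o) for o in offsets]
    pts = [pt for pt in pts if in_bounds(pt, lb, ub)]
    best = min(pts, key=f) if pts else None
    if best is not None and f(best) < fitness[p]:
        X[p] = best                          # fly to the better point
    else:
        phi[p] = phi[p] + np.random.rand(n - 1) * alpha_max   # turn head (eq 5)
    # Remaining members: some range, the rest scrounge.
    others = [i for i in range(m) if i != p]
    rangers = set(np.random.choice(others, size=n_rangers, replace=False))
    a = round(np.sqrt(n + 1))
    for i in others:
        if i in rangers:                     # ranger: random walk (eqs 8-10)
            phi[i] = phi[i] + np.random.rand(n - 1) * alpha_max
            cand = X[i] + a * np.random.randn() * l_max * head_direction(phi[i])
        else:                                # scrounger: area copying (eq 7)
            cand = X[i] + np.random.rand(n) * (X[p] - X[i])
        if in_bounds(cand, lb, ub):
            X[i] = cand
    return X, phi
```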
3 The GSO-Based Training Algorithm for Neural Networks
Figure 2 presents a three-layer feed-forward ANN to be tuned by our GSO algorithm. The ANN consists of three layers, namely, input, hidden, and output layers. The nodes in each layer receive input signals from the previous layer and pass the output to the subsequent layer. The nodes of the input layer supply the respective elements of the activation pattern (input vector), which constitute the input signals from the outside system applied to the nodes in the hidden layer through the weighted links. The output signals of the nodes in the output layer of the network constitute the overall response of the network to the activation pattern supplied by the source nodes in the input layer [20]. The subscripts n, h, and k denote any node in the input, hidden, and output layers, respectively. The net input u is defined as the weighted sum of the incoming signals minus a bias term. The net input of node h, $u_h$, in the hidden layer is expressed as follows:

\[ u_h = \sum_{n} w_{hn} y_n - \theta_h \]
where $y_n$ is the output of node n in the input layer, $w_{hn}$ represents the connection weight from node n in the input layer to node h in the hidden layer, and $\theta_h$ is the bias of node h in the hidden layer.

Fig. 2. A three-layer feed-forward ANN

The activation function used in the proposed ANN is the sigmoid function. Therefore, in the hidden layer, the output $y_h$ of node h can be expressed as

\[ y_h = f_h(u_h) = \frac{1}{1 + e^{u_h}} \]

The output of node k in the output layer can also be described as:

\[ y_k = f_k(u_k) = \frac{1}{1 + e^{u_k}} \tag{11} \]

where

\[ u_k = \sum_{h} w_{kh} y_h - \theta_k \]

and $\theta_k$ is the bias of node k in the output layer. The parameters (connection weights and bias terms) are tuned by the GSO algorithm. In the GSO-based training algorithm, each member of the population is a vector comprising connection weights and bias terms. Without loss of generality, we denote $W_1$ as the connection weight matrix between the input layer and the hidden layer, $\Theta_1$ as the bias terms to the hidden layer, $W_2$ as the one between the hidden layer and the output layer, and $\Theta_2$ as the bias terms to the output layer. The ith member in the population can be represented as $X_i = [W_1^i\ \Theta_1^i\ W_2^i\ \Theta_2^i]$. The fitness function assigned to the ith individual is the least-squared error function defined as follows:

\[ F_i = \frac{1}{2} \sum_{p=1}^{P} \sum_{k=1}^{K} (d_{kp} - y_{kp}^i)^2 \tag{12} \]
where $y_{kp}^i$ indicates the kth computed output in equation (11) of the ANN for the pth sample vector of the ith member; P denotes the total number of sample vectors; and $d_{kp}$ is the desired output at the kth output node.
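As an illustration of this encoding, the sketch below (our own names and layout assumptions, not the authors' code) decodes a flat member vector into the weight matrices and bias terms and evaluates the fitness of equation (12); the sigmoid is written with e^u exactly as printed above:

```python
import numpy as np

def decode(member, n_in, n_hid, n_out):
    """Split a flat member X_i = [W1 Theta1 W2 Theta2] back into
    the network's weight matrices and bias vectors."""
    a = n_hid * n_in
    W1 = member[:a].reshape(n_hid, n_in)
    th1 = member[a:a + n_hid]
    b = a + n_hid
    W2 = member[b:b + n_out * n_hid].reshape(n_out, n_hid)
    th2 = member[b + n_out * n_hid:]
    return W1, th1, W2, th2

def fitness(member, samples, targets, n_in, n_hid, n_out):
    """Least-squared error of eq. (12) over all P sample vectors."""
    W1, th1, W2, th2 = decode(member, n_in, n_hid, n_out)
    err = 0.0
    for x, d in zip(samples, targets):
        u_h = W1 @ x - th1                  # net input to hidden nodes
        y_h = 1.0 / (1.0 + np.exp(u_h))     # sigmoid, as printed in the text
        u_k = W2 @ y_h - th2                # net input to output nodes
        y_k = 1.0 / (1.0 + np.exp(u_k))
        err += 0.5 * np.sum((d - y_k) ** 2)
    return err
```

With this fitness in place, the GSO step sketched in Section 2 can be applied directly to the flat member vectors.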
4 Experimental Studies
In order to evaluate GSOANN's performance, two well-studied benchmark problems from the UCI machine learning repository were tested: the Wisconsin breast cancer data and the Pima Indian diabetes data. They are real-world problems which are solved by human experts in practice. The data sets of these problems usually contain missing attribute values and are polluted by noise. Therefore, they represent some of the most challenging problems in the machine learning field [6]. We evaluate the GSO algorithm on these two problems and compare our results with the latest results published in the literature.

4.1 Experimental Setting
The parameter setting of the GSO algorithm is as follows. The initial population of GSO is generated uniformly at random in the search space. The initial head angle $\varphi^0$ of each individual is set to $\pi/4$. The maximum pursuit angle $\theta_{max}$ is $\pi/a^2$. The maximum turning angle $\alpha_{max}$ is set to $\theta_{max}/2 = \pi/(2a^2)$. The maximum pursuit distance $l_{max}$ is calculated from:

\[ l_{max} = \| U - L \| = \sqrt{\sum_{i=1}^{n} (U_i - L_i)^2} \]

where $L_i$ and $U_i$ are the lower and upper bounds for the ith dimension. The only parameter that needs to be tuned is the percentage of rangers; our recommended percentage of rangers is 20%, which was used throughout all our experiments. The population size of the GSO algorithm was set to 50. The data sets of the classification problems were partitioned according to the guidelines of Prechelt [21]. Each set of data was divided into three sets: 50% of the patterns were used for learning, 25% of them for validation and the remaining 25% for testing the generalization of the trained ANN. The maximum number of epochs varied according to the convergence rates of the training algorithms on different problems. Several runs were executed to determine the maximum number of epochs for the benchmark problems. All experiments were repeated for 30 runs in order to obtain average results. The proposed algorithm and the other training algorithms were implemented in MATLAB 6.5 and executed on a Pentium 4, 2.0 GHz machine.
4.2 The Wisconsin Breast Cancer Data Set
The breast cancer data set was obtained by W. H. Wolberg et al. at the University of Wisconsin Hospitals, Madison. The data set currently contains 9 integer-valued attributes and 699 instances, of which 458 are benign and 241 are malignant examples. In order to train ANNs to classify a tumor as either benign
or malignant, we partitioned this data set into three sets: a training set which contains the first 349 examples, a validation set which contains the following 175 examples, and a test set which contains the final 175 examples. The comparisons between the results produced by GSOANN and those of 9 other machine learning algorithms are tabulated in Table 1. Among these algorithms, MGNN [8] and EPNet [6] evolved the ANN structure as well as the connection weights; COOP [22] is an evolutionary ANN ensemble evolved by cooperative coevolution; CNNE [23] is a constructive algorithm for training cooperative ANN ensembles. CCSS [24], OC1-best [25] and EDTs [26] are state-of-the-art decision tree classifiers, including the decision tree ensembles [24] [26] and a hybrid evolutionary decision tree [25]; GANet-best is the best result from [9], which was generated by an EANN based on a real-encoded EA [27] to evolve connection weights; SVM-best is the best result of 8 least squares SVM classifiers [28]. It is worth mentioning that the decision tree [24] [26] [9] and SVM [28] techniques used k-fold cross-validation, which generates more optimistic results. In comparison with the sophisticated classifiers mentioned above, we can see that this simple GSOANN produced the best average result.

Table 1. Comparison between GSOANN and other approaches in terms of average testing error rate (%) on the Wisconsin breast cancer data set

Algorithm            GSOANN    GANet-best [9]  COOP [22]   CNNE [23]      EPNet [6]
Test error rate (%)  0.65      1.06            1.23        1.20           1.38
Algorithm            MGNN [8]  SVM-best [28]   CCSS [24]   OC1-best [25]  EDTs [26]
Test error rate (%)  3.05      3.1             2.72        3.9            2.63
4.3 The Pima Indian Diabetes Data Set
The Pima Indian diabetes data was originally donated by Vincent Sigillito at the Johns Hopkins University. The diagnostic, binary-valued variable investigated is whether a patient shows signs of diabetes according to World Health Organization criteria. There are 8 numeric-valued attributes and 768 instances. The data set contains 500 instances of patients with signs of diabetes and 268 instances of patients without. The data set was partitioned as follows: the first 384 instances were used as the training set, the following 192 instances as the validation set, and the final 192 instances as the test set. This problem is one of the most difficult machine learning problems, since the data set is relatively small and heavily polluted by noise. Results from other state-of-the-art classifiers are tabulated in Table 2. COVNET [29] is a cooperative coevolutionary model for evolving artificial neural networks. EENCL is the evolutionary ensemble with negative correlation learning presented in [30]; twelve-fold cross-validation was used by EENCL. GANet-best is the best result produced by an ANN trained on a subset of features selected by a binary-encoded GA [9]. From Table 2, it can be seen that GSOANN is outperformed by COOP [22] and CNNE [23], which are both ANN ensembles. However, GSOANN produced
Table 2. Comparison between GSOANN and other approaches in terms of average testing error rate (%) on the Pima diabetes disease data set

Algorithm            GSOANN      GANet-best [9]  COOP [22]      CNNE [23]  COVNET [29]
Test error rate (%)  19.79       24.70           19.69          19.60      19.90
Algorithm            EENCL [30]  EPNet [6]       SVM-best [28]  CCSS [24]  OC1-best [25]
Test error rate (%)  22.1        22.38           22.7           24.02      26.0
better results than those of the remaining classifiers, including the evolutionary ANN ensembles COVNET [29] and EENCL [30].
5 Conclusion
In this paper, our GSO algorithm has been applied to train ANN connection weights. Our initial goal was not to propose a sophisticated ANN that achieves the best generalization performance. Instead, we aimed to assess GSO's global search performance on real-world problems by applying it to ANN training, since the training process can be regarded as a hard continuous optimization problem. Strictly speaking, it is not entirely fair to compare the results of GSOANN with those of other sophisticated ANNs, such as ANN ensembles, on real-world classification problems. However, even though GSOANN is relatively simple and concerns only the ANN weight training aspect, this GSO-based ANN training algorithm provides comparatively good results on the two benchmark problems we tested.
References
1. He, S., Wu, Q.H., Saunders, J.R.: Group search optimizer - an optimization algorithm inspired by animal behavioral ecology. (Submitted to IEEE Trans. on Evolutionary Computation)
2. Thrun, S.B., et al.: The MONK's problems: A performance comparison of different learning algorithms. Technical Report CS-91-197, Pittsburgh, PA (1991)
3. Wu, Q.H., Hogg, B.W., Irwin, G.W.: A neural network regulator for turbogenerators. IEEE Trans. on Neural Networks 3(1) (1992) 95-100
4. Yao, X.: Evolving artificial neural networks. Proceedings of the IEEE 87(9) (1999) 1423-1447
5. Fogel, D.B., Fogel, L.J., Porto, V.W.: Evolving neural networks. Biol. Cybern. 63 (1990) 487-493
6. Yao, X., Liu, Y.: A new evolutionary system for evolving artificial neural networks. IEEE Trans. on Neural Networks 8(3) (1997) 694-713
7. Leung, F.H.F., Lam, H.K., Ling, S.H., Tam, P.K.S.: Tuning of the structure and parameters of a neural network using an improved genetic algorithm. IEEE Trans. on Neural Networks 14(1) (2003) 79-88
8. Palmes, P.P., Hayasaka, T., Usui, S.: Mutation-based genetic neural network. IEEE Trans. on Neural Networks 16(3) (2005) 587-600
9. Cantú-Paz, E., Kamath, C.: An empirical comparison of combinations of evolutionary algorithms and neural networks for classification problems. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics 35(5) (2005) 915-927
10. Wolpert, D.H.: A mathematical theory of generalization. Complex Systems 4(2) (1990) 151-249
11. Barnard, C.J., Sibly, R.M.: Producers and scroungers: a general model and its application to captive flocks of house sparrows. Animal Behaviour 29 (1981) 543-550
12. Couzin, I., Krause, J., Franks, N., Levin, S.: Effective leadership and decision-making in animal groups on the move. Nature 434 (2005) 513-516
13. Bell, J.W.: Searching Behaviour - The Behavioural Ecology of Finding Resources. Chapman and Hall Animal Behaviour Series. Chapman and Hall (1990)
14. O'Brien, W.J., Evans, B.I., Howick, G.L.: A new view of the predation cycle of a planktivorous fish, white crappie (Pomoxis annularis). Can. J. Fish. Aquat. Sci. 43 (1986) 1894-1899
15. Harper, D.G.C.: Competitive foraging in mallards: 'ideal free' ducks. Animal Behaviour 30 (1988) 575-584
16. Dusenbery, D.B.: Ranging strategies. Journal of Theoretical Biology 136 (1989) 309-316
17. Higgins, C.L., Strauss, R.E.: Discrimination and classification of foraging paths produced by search-tactic models. Behavioral Ecology 15(2) (2003) 248-254
18. Viswanathan, G.M., Buldyrev, S.V., Havlin, S., da Luz, M.G., Raposo, E., Stanley, H.E.: Optimizing the success of random searches. Nature 401 (1999) 911-914
19. Dixon, A.F.G.: An experimental study of the searching behaviour of the predatory coccinellid beetle Adalia decempunctata. J. Anim. Ecol. 28 (1959) 259-281
20. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall, New Jersey, USA (1999)
21. Prechelt, L.: Proben1 - a set of neural network benchmark problems and benchmarking rules. Technical report, Fakultät für Informatik, Universität Karlsruhe, 76128 Karlsruhe, Germany (1995)
22. Garcia-Pedrajas, N., Hervas-Martinez, C., Ortiz-Boyer, D.: Cooperative coevolution of artificial neural network ensembles for pattern classification. IEEE Trans. on Evolutionary Computation 9(3) (2005) 271-302
23. Islam, M., Yao, X., Murase, K.: A constructive algorithm for training cooperative neural network ensembles. IEEE Trans. on Neural Networks 14(4) (2003) 820-834
24. Dzeroski, S., Zenko, B.: Is combining classifiers with stacking better than selecting the best one? Machine Learning 54(3) (2004) 255-273
25. Cantú-Paz, E., Kamath, C.: Inducing oblique decision trees with evolutionary algorithms. IEEE Trans. on Evolutionary Computation 7(1) (2003) 54-68
26. Dietterich, T.G.: An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning 40(2) (2000) 139-157
27. Deb, K., Anand, A., Joshi, D.: A computationally efficient evolutionary algorithm for real-parameter optimization. Evolutionary Computation 10(4) (2002) 371-395
28. Gestel, T.V., et al.: Benchmarking least squares support vector machine classifiers. Machine Learning 54(1) (2004) 5-32
29. Garcia-Pedrajas, N., Hervas-Martinez, C., Munoz-Perez, J.: COVNET: a cooperative coevolutionary model for evolving artificial neural networks. IEEE Trans. on Neural Networks 14(3) (2003) 575-596
30. Liu, Y., Yao, X.: Evolutionary ensembles with negative correlation learning. IEEE Trans. on Evolutionary Computation 4(4) (2000) 380-387
Application of Two-Stage Stochastic Linear Program for Portfolio Selection Problem

Kuo-Hwa Chang, Huifen Chen, and Ching-Fen Lin

Department of Industrial Engineering, Chung Yuan Christian University, Chung-Li, 320, Taiwan
Abstract. We consider a portfolio selection problem under the consideration of a dynamic closing time of the portfolio. The selection strategy is to take the long position on the stocks and the short position on an index future. We will close our portfolio whenever our profit exceeds the predetermined target during the investment period; otherwise, we will hold the portfolio till the maturity date of the future. Our purpose is to have a profitable portfolio with a steady return which is higher than the interest rate of savings and independent of the market. To deal with the stock selection problem and, at the same time, the uncertainty of the closing time due to market fluctuation, we define the corresponding optimization problem as a two-stage stochastic linear program (two-stage SLP). Our models are tested on real-world data and the results are consistent with what we expected. Keywords: Portfolio selection, Futures, Two-stage stochastic linear program.
1 Introduction
Portfolio theory deals with the problem of how to allocate wealth among several assets. The basic theory of portfolio optimization was presented by Markowitz [11]. By employing the standard deviation and expected value of the asset as the parameters, Markowitz introduced the famous mean-variance (MV) model. There has been a tremendous amount of research on improving this basic model. Kane [6] additionally considers skewness in the model, giving the MVS (mean-variance-skewness) model. The single index model was another improved model. By assuming a linear relation between the return of assets and the return of the market index, the single-index model reduces the amount of input data. Safety-first models consider limiting the risk of bad outcomes. Telser [12] proposes a safety-first model to maximize expected return, subject to the constraint that the probability of a return less than, or equal to, some predetermined limit is not greater than some predetermined number. Because of the computational difficulty in solving a large-scale MV model, Konno and Yamazaki [8] define the mean-absolute deviation (MAD) to replace covariance as the measure of risk. The major advantage of the MAD model is that it can be solved as a linear program instead of a quadratic program for
an MV model. Konno and Kobayashi [7] construct an integrated stock-bond portfolio optimization model by minimizing the absolute deviation of the return rate of the portfolio with a given expected return rate. Cai et al. [1] provide a portfolio selection rule whose objective is to minimize the maximum individual risk, with MAD used as the risk measure. Konno et al. [9] further solve an MVS model as a linear program. Some risk-averse investors may want a portfolio whose return rate is relatively stable compared with the market but higher than the saving interest rate. Financial derivatives such as futures and options help investors to divert and further to hedge their investment risks. An intuitive portfolio would be a bucket of stocks taken in long position and an index future in short position. Chang [3] studies this kind of portfolio; the holding time of this portfolio starts from the first date when the future is issued and ends on the maturity date of this future. In this paper, we further add a more flexible investment strategy in which we can close all positions during the investment period once the profit exceeds the predetermined target. However, under the uncertainty of the closing time, we should be able to find an alternative optimization model to deal with the uncertainty. In this paper, we model the proposed portfolio problem as a two-stage stochastic linear programming (SLP) problem. The stochastic program is one of the most widely applicable models incorporating uncertainty within the optimization objective. It is used in many fields, such as production planning. It is also used for solving some financial planning models, especially asset liability management (ALM). For example, Kouwenberg [10] presents a scenario generation method for solving the stochastic program associated with an ALM model. Two-stage SLP provides a suitable framework for modeling decision problems that incorporate uncertainty and, most importantly, it can be solved by using the simplex method. Chang et al. [2] extend the MAD model to a two-stage SLP model for the portfolio selection problem with transaction costs. In their model, the investment decision is made in the first stage and the uncertainty from the return rates is handled by the second stage. Our purpose is, under the consideration that we can close our portfolio once our profit exceeds our profit target or else close it on the maturity date of the future, to determine an optimal portfolio with stocks in long positions and an index future in short position. The uncertainty comes from the closing time. We model this problem as a two-stage SLP and solve it by the corresponding decomposition procedure. Let x be the column vector representing the decision variables for the main problem (stage 1) and y be the one for stage 2. The general framework of a two-stage SLP can be formulated as follows:

\[ \min\ cx + E[h(x, \omega)] \]
\[ \text{s.t. } Ax = b, \quad x \geq 0 \tag{Stage 1} \]

where

\[ h(x, \omega) = \min\ dy \tag{Stage 2} \]
\[ \text{s.t. } Wy = r(\omega) - T(\omega)x, \quad y \geq 0 \]

and the ω's are defined on a probability space (Ω, A, P). The function E[h(x, ω)] is often referred to as the recourse function. In the above formulation, Ax = b represents the linear constraints in stage 1 and Wy = r(ω) − T(ω)x represents those in stage 2, in which T(ω)x accounts for the uncertainty associated with the feasible solution x. Given x and ω, the dual problem of stage 2 is denoted by h(x, ω) = max{g | πW ≤ d}, where g = π(r(ω) − T(ω)x) is the objective function. Let X = {x | Ax = b, x ≥ 0} denote the set of feasible solutions of the first stage and let Π = {π | πW ≤ d} denote the set of dual feasible solutions of the second stage problem. The two-stage SLP is solved by the stochastic decomposition procedure (Higle and Sen [5]). The purpose of the decomposition procedure is to approximate E[h(x, ω)] by a sequence of cutting planes, which are generated by optimal dual solutions whose average values can be used as bounds for E[h(x, ω)]. Let V_k denote the set of solutions for the dual of the second stage for the first k realizations. The basic stochastic decomposition algorithm may be stated as follows; see Higle and Sen [5] for more details.

(0) k ← 0. Let V_0 be an empty set and η_0(x) = −∞. Choose an initial solution x^1 from X for the first stage, and a lower bound L.
(1) k ← k + 1. Randomly generate an observation ω^k, the kth realization, independent of any previously generated observations.
(2) Determine η_k(x), a piecewise linear approximation of E[h(x, ω)], by the following steps:
  a) Solve the dual problem of the second stage for the kth realization with the objective function g(π, ω^k, x^k). Then obtain the optimal solution π^k = argmax{g(π, ω^k, x^k) | π ∈ Π}. Update V_k = V_{k−1} ∪ {π^k}.
  b) Determine the coefficients of the kth cutting plane: α_k^k + β_k^k x = (1/k) Σ_{i=1}^{k} g(π_i^k, ω^i, x), where π_i^k = argmax{g(π, ω^i, x^k) | π ∈ V_k} for i = 1, . . . , k.
  c) Update the coefficients of all previously generated cuts for i = 1, . . . , k − 1: α_i^k = ((k−1)/k) α_i^{k−1} + (1/k) L and β_i^k = ((k−1)/k) β_i^{k−1}.
  d) Let η_k(x) = max{α_i^k + β_i^k x | i = 1, . . . , k}.
(3) Let x^{k+1} be the optimal solution of the following linear program:
  min cx + η_k
  s.t. Ax = b, α_i^k + β_i^k x ≤ η_k, x ≥ 0.
Repeat from step (1) until the sequence x^1, x^2, . . . converges.
(4) The limiting x^k is the optimal solution.
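The decomposition loop can be rendered schematically as follows. All four function arguments are problem-specific hooks that we assume the reader supplies (they are not library calls), and g is assumed affine in x, so each dual solution yields a cut (a, b) with g = a + b·x:

```python
import numpy as np

def stochastic_decomposition(argmax_dual, affine_g, solve_master,
                             sample_omega, x1, L, max_iter=200):
    """Schematic basic SD loop following steps (0)-(4) above.
    argmax_dual(omega, x) returns the stage-2 dual optimum pi;
    affine_g(pi, omega) returns (a, b) so that g(pi, omega, x) = a + b @ x;
    solve_master(cuts) solves the stage-1 LP with cuts a + b @ x <= eta."""
    x = np.asarray(x1, float)
    V, omegas, cuts = [], [], []
    for k in range(1, max_iter + 1):
        omegas.append(sample_omega())                      # step (1)
        V.append(argmax_dual(omegas[-1], x))               # step (2a)
        # Step (2b): for each past realization, take the best dual
        # solution in V at the current x, then average the cuts.
        pieces = []
        for w in omegas:
            ab = [affine_g(pi, w) for pi in V]
            pieces.append(max(ab, key=lambda t: t[0] + t[1] @ x))
        alpha = sum(a for a, _ in pieces) / k
        beta = sum(b for _, b in pieces) / k
        # Step (2c): relax all previously generated cuts toward L.
        cuts = [((k - 1) / k * a + L / k, (k - 1) / k * b) for a, b in cuts]
        cuts.append((alpha, beta))
        x = solve_master(cuts)                             # step (3)
    return x                                               # step (4): limit point
```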
The rest of this paper is organized as follows. In Section 2, we model our problem as a two-stage SLP. There are two models: one with a fixed target and the other with a dynamic target. The dual problems of both second stages are presented. In Section 3, we test our models by considering stocks from the Taiwan Stock Exchange and an index future from the Taiwan Futures Exchange. We conclude our study in Section 4.
2 Portfolio Optimization Model
Assume there are n assets we can choose from. Let $R_i$ denote the random variable representing the return rate of asset i with mean $r_i$, and let $\sigma_{ij}$ be its covariance with asset j. Let $R_L$ denote the lowest return rate we can tolerate. In the primary model, let $x_i$ be the decision variable representing the fraction of the fund invested in asset i. By adopting Telser's safety-first model and assuming the normality of the returns as usual, we have the following primary optimization problem:

\[ \max\ \sum_{i=1}^{n} r_i x_i \]
\[ \text{s.t. } \sum_{i=1}^{n} r_i x_i \geq R_L + z_\alpha \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} \sigma_{ij} x_i x_j} \]
\[ \sum_{i=1}^{n} x_i = 1, \quad x_i \geq 0, \quad i = 1, \ldots, n, \]
where $z_\alpha$ is the critical point of the standard normal distribution such that $P(X \geq z_\alpha) = \alpha$. The first constraint represents the requirement that the probability of a return greater than or equal to $R_L$ should not be less than $1 - \alpha$. Due to the computational difficulty associated with solving such a problem with a dense covariance matrix, we use the mean-absolute deviation (MAD) to replace the covariances of the return rates. MAD transforms the portfolio selection problem from a quadratic program into a linear program. Assume that we have M samples of $R_i$. Let $r_{im}$ be the mth sample of $R_i$ and let $\bar{r}_i$ represent the corresponding average return, $\bar{r}_i = (1/M) \sum_{m=1}^{M} r_{im}$. Then the mean absolute deviation is defined as $(1/M) \sum_{m=1}^{M} |\sum_{i=1}^{n} (r_{im} - \bar{r}_i) x_i|$, and it is used to replace the term $\sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} \sigma_{ij} x_i x_j}$. Our investment strategy is to take the long position on stocks and the short position on the index future, which is used to reduce the investment risk. There are n stocks, and g is the price of one index point of the index future. Assume that we issue the portfolio at time zero (t = 0). Let $x_i$ be the number of units of stock i purchased at price $S_{i0}$ at time t = 0. Let $F_0$ be the index level of the future at time t = 0. Let $r_{im}$ and $r_{Fm}$ be the respective return rates of stock i and the future sampled from past period m, m = 1, . . . , M, and let $\bar{r}_i$ and $\bar{r}_F$ be the corresponding average returns. The portfolio selection problem above with the MAD approximation can be equivalently formulated as
\[ \max\ \sum_{i=1}^{n} \bar{r}_i x_i S_{i0} - g \bar{r}_F F_0 \]
\[ \text{s.t. } \sum_{i=1}^{n} \bar{r}_i x_i S_{i0} - g \bar{r}_F F_0 \geq C + z_\alpha \frac{1}{M} \sum_{m=1}^{M} \left| \sum_{i=1}^{n} (r_{im} - \bar{r}_i) x_i S_{i0} - g (r_{Fm} - \bar{r}_F) F_0 \right| \]
\[ \sum_{i=1}^{n} x_i S_{i0} \leq B, \quad x_i \geq 0, \quad i = 1, \ldots, n, \]
where C is the minimum profit we would like to obtain and B is the total budget for our investment. As mentioned in the previous section, we further consider that we will close our portfolio once our profit exceeds the predetermined target l during the investment period; otherwise, we will hold the portfolio till the maturity date of the future. To deal with the uncertainty of the closing time due to the uncertain price of each stock in the coming period, we model it as a two-stage stochastic linear problem as follows.

\[ \min\ -\sum_{i=1}^{n} \bar{r}_i x_i S_{i0} + g \bar{r}_F F_0 + E[h(x, \omega)] \]
\[ \text{s.t. } \sum_{i=1}^{n} \bar{r}_i x_i S_{i0} - g \bar{r}_F F_0 \geq C + z_\alpha \frac{1}{M} \sum_{m=1}^{M} \left| \sum_{i=1}^{n} (r_{im} - \bar{r}_i) x_i S_{i0} - g (r_{Fm} - \bar{r}_F) F_0 \right| \]
\[ \sum_{i=1}^{n} x_i S_{i0} \leq B, \quad x_i \geq 0, \quad i = 1, \ldots, n, \]

where

\[ h(x, \omega) = \min\ -\sum_{t=1}^{T-1} q_t y_t + (1 - \bar{y}_T) \left( \sum_{i=1}^{n} \bar{r}_i x_i S_{i0} - g \bar{r}_F F_0 - l \right) \]
\[ \text{s.t. } q_k \sum_{t=1}^{k} y_t \geq q_k, \quad k = 1, \ldots, T-1 \]
\[ q_t y_t \geq 0, \quad t = 1, \ldots, T-1 \tag{P1} \]
\[ \sum_{t=1}^{T-1} y_t + \bar{y}_T = 1 \]
\[ y_t \geq 0,\ t = 1, \ldots, T-1, \quad \bar{y}_T \geq 0, \quad y_t \text{ and } \bar{y}_T \text{ are integers,} \]
where $q_t$ represents the surplus profit over the target at time t and T is the length of the investment period. In the first stage of this SLP, a portfolio is obtained under the safety-first criterion; in the second stage, the surplus profit of this given portfolio, if it is closed during the investment period, is obtained. For solving the problem in the second stage, we simulate the stock prices and the index future by using the single index
model (see Elton and Gruber [4]). Let the simulated $S_{it}(\omega)$ and $F_t(\omega)$ denote the price of stock i and the price of the index future at time t during the investment period; then, for t = 1, . . . , T − 1, $q_t = \sum_{i=1}^{n} x_i (S_{it}(\omega) - S_{i0}) - g (F_t(\omega) - F_0) - l$. In the second stage, $y_t$ is a binary variable such that $y_t = 1$ if and only if we close our portfolio at period t (once $q_t$ is greater than zero). If there is no $q_t$ greater than zero, the variable $\bar{y}_T$ will be one, meaning that all positions are closed at the maturity date. The constraints in the second stage set $y_t = 1$ for the first $q_t > 0$, if there is any, where t runs from 1 to T − 1, and set $\bar{y}_T = 1$ otherwise. Note that the first constraint is the key one in that it assigns 1 to the $y_t$ corresponding to the first $q_t > 0$. Also note that $\bar{y}_T = 1$ means that we will close our portfolio on the last day (the maturity date), and it implies $E[h(x, \omega)] = 0$. We define $d_m^+ - d_m^- = \sum_{i=1}^{n} (r_{im} - \bar{r}_i)\alpha_i - g (r_{Fm} - \bar{r}_F) F_0$ for each m to replace the absolute-value terms in the first constraint.
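For one simulated scenario, the surplus-profit path and the first-exceedance closing rule could be computed as in this sketch (names are ours; not the authors' code):

```python
import numpy as np

def surplus_and_close(x, S0, S, g, F0, F, l):
    """Surplus profit q_t over the target l for one simulated scenario,
    and the closing period implied by the first-exceedance rule.
    x, S0: (n,) holdings and initial prices; S: (T-1, n) simulated stock
    prices; F: (T-1,) simulated future index levels."""
    q = S @ x - S0 @ x - g * (F - F0) - l     # q_t for t = 1 .. T-1
    hit = np.flatnonzero(q > 0)
    if hit.size:                              # y_t = 1 at the first q_t > 0
        return q, int(hit[0]) + 1
    return q, None                            # y_bar_T = 1: hold to maturity
```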
T −1
qt πt
n + πT− + ( r¯i xi Si0 − g¯ rF F0 − l)
t=1
s.t.
T −1
i=1
qt πt + qk πk + π ¯T ≤ −qk
k = 1, . . . , T − 1
t=k n π ¯T ≤ −( r¯i xi Si0 − g¯ rF F0 − l)
πt ≥ 0,
i=1 πt
(D1 )
≥ 0 t = 1, . . . , T − 1, π ¯T ∈ R,
where $\pi = (\pi_1, \pi_2, \ldots, \pi_{T-1}, \pi'_1, \pi'_2, \ldots, \pi'_{T-1}, \bar{\pi}_T)$ is the row vector representing the dual variables. Note that the target l in the above model is fixed all the time. We can instead choose l as a dynamic target depending on the expected performance of the coming period. Here we choose the expected revenue of the portfolio estimated in the first stage as our dynamic target, that is, $l = \sum_{i=1}^{n} \bar{r}_i x_i S_{i0} - g \bar{r}_F F_0$. The corresponding second stage with dynamic l is

\[ h(x, \omega) = \min\ -\sum_{t=1}^{T-1} q_t y_t \]
\[ \text{s.t. } q_k \sum_{t=1}^{k} y_t \geq q_k, \quad k = 1, \ldots, T-1 \]
\[ \sum_{t=1}^{T-1} y_t + \bar{y}_T = 1 \tag{P2} \]
\[ y_t \geq 0,\ t = 1, \ldots, T-1, \quad \bar{y}_T \geq 0, \quad y_t \text{ and } \bar{y}_T \text{ are integers.} \]
In this model, $q_t = \sum_{i=1}^{n} x_i (S_{it}(\omega) - S_{i0}) - g (F_t(\omega) - F_0) - (\sum_{i=1}^{n} \bar{r}_i x_i S_{i0} - g \bar{r}_F F_0)$. The corresponding dual problem of (P2) is
\[ \max\ \sum_{t=1}^{T-1} q_t \pi_t + \bar{\pi}_T \]
\[ \text{s.t. } \sum_{t=k}^{T-1} q_t \pi_t + \bar{\pi}_T \leq -q_k, \quad k = 1, \ldots, T-1 \tag{D2} \]
\[ \bar{\pi}_T \leq 0, \quad \pi_t \geq 0, \quad t = 1, \ldots, T-1, \quad \bar{\pi}_T \in \mathbb{R}, \]

where $\pi = (\pi_1, \pi_2, \ldots, \pi_{T-1}, \bar{\pi}_T)$ is the row vector representing the dual variables.
3 Empirical Results
In this section, we test our models by considering stocks selected from the Taiwan Stock Exchange (TAIEX) and two index futures, TAIEX Futures (TX) and Mini TAIEX Futures (MTX), from the Taiwan Futures Exchange. The price for one unit index point is NT$200 (200 New Taiwan Dollars) for TX and NT$50 for MTX. Historical data starting from 1988 up to the current month are used to estimate the parameters in our model for the coming month, based on which the returns of the assets in the coming month in the second stage are then simulated by using the single index model.
Table 1. Return rates of Model-1 and Model-2 with TX

Month    Model-1/TX  Model-2/TX (target l)  market
Nov-03   0.0295      -0.0046 (85159)        -0.0099
Dec-03   0.0210      0.0212 (83468)         -0.0194
Jan-04   0.0097      0.0562 (75808)         0.1100
Feb-04   0.0341      -0.0114 (82780)        0.0346
Mar-04   0.0118      0.0259 (54022)         -0.0042
Apr-04   0.0328      0.0391 (42551)         0.0353
May-04   0.0441      0.0533 (44077)         -0.1394
Jun-04   0.0373      0.0482 (54973)         -0.0513
Jul-04   0.0415      0.0551 (52566)         -0.0272
Aug-04   0.0156      0.0476 (51215)         0.0034
Sep-04   0.0313      0.0401 (68083)         0.0817
Oct-04   0.0281      0.0288 (62914)         -0.0141
Nov-04   -0.0296     -0.0250 (41985)        0.0415
Dec-04   -0.0161     -0.0038 (40622)        -0.0043
Jan-05   0.0115      0.0187 (48607)         -0.0179
Feb-05   -0.0481     -0.0408 (48855)        0.0421
Mar-05   -0.0104     -0.0091 (34933)        -0.0116
Apr-05   -0.0432     -0.0412 (32330)        -0.0625
May-05   0.0303      0.0266 (28559)         0.0347
Jun-05   -0.0293     -0.0295 (60560)        0.0613
Jul-05   0.0353      0.0277 (31758)         0.0275
Aug-05   0.0406      0.0284 (33740)         -0.0283
Sep-05   0.0326      0.0219 (39964)         -0.0280
Oct-05   -0.0013     0.0248 (47982)         -0.0615
Nov-05   -0.00045    0.0391 (74970)         0.0618
Mean return              0.0123    0.0175    0.0022
Standard deviation       0.0276    0.0299    0.0530
Correlation with market  -0.209    -0.097
Pearson P-Value          0.316     0.646
Table 2. Return rates of Model-1 and Model-2 with MTX

Month    Model-1/MTX  Model-2/MTX (target l)  market
Nov-03   -0.0009      0.0090 (18173)          -0.0099
Dec-03   0.0118       0.0129 (17923)          -0.0194
Jan-04   0.0134       0.0543 (18095)          0.1100
Feb-04   0.0449       0.0034 (18919)          0.0346
Mar-04   0.0158       0.0121 (12781)          -0.0042
Apr-04   0.0134       0.0290 (7543)           0.0353
May-04   0.0390       0.0374 (9155)           -0.1394
Jun-04   0.0350       0.0344 (6175)           -0.0513
Jul-04   0.0350       0.0346 (8596)           -0.0272
Aug-04   0.0352       0.0304 (15484)          0.0034
Sep-04   0.0436       0.0569 (16906)          0.0817
Oct-04   0.0122       0.0567 (16346)          -0.0141
Nov-04   -0.0398      -0.0284 (9475)          0.0415
Dec-04   0.0053       0.0027 (10114)          -0.0043
Jan-05   0.0344       0.0316 (12071)          -0.0179
Feb-05   -0.0425      -0.0425 (11652)         0.0421
Mar-05   -0.0113      0.0088 (8183)           -0.0116
Apr-05   -0.0447      -0.0247 (7897)          -0.0625
May-05   -0.0112      0.0994 (13312)          0.0347
Jun-05   -0.0253      -0.0380 (7579)          0.0613
Jul-05   0.0447       0.0247 (5105)           0.0275
Aug-05   0.0370       0.0337 (8193)           -0.0283
Sep-05   0.0325       -0.0011 (7523)          -0.0280
Oct-05   -0.0048      0.0186 (7659)           -0.0615
Nov-05   0.03137      0.0539 (21563)          0.0618
Mean return              0.0122    0.0204    0.0022
Standard deviation       0.0283    0.0326    0.0530
Correlation with market  -0.103    0.063
Pearson P-Value          0.624     0.765
The single index model describes the relation of the return of each asset with the market in a simple linear regression model. We only need to simulate the return of the market as a Brownian motion and the corresponding error term in each regression model. We call the model with the fixed target Model-1 and the one with the dynamic target Model-2. In each model, we further consider one of the two index futures (TX or MTX). Therefore, four models, Model-1 with TX, Model-2 with TX, Model-1 with MTX and Model-2 with MTX, are tested by using the MPL/Cplex software. We assume our budget is NT$1,200,000 (around US$40,000) for the model with TX and NT$300,000 (around US$10,000) with MTX. For Model-1, the corresponding fixed targets for the TX model and the MTX model are NT$36,000 and NT$9,000, respectively. We let C be zero and α be 0.15. Our SLP models have been tested for each of the 25 consecutive months starting from Nov. 2003. During each month, we may close the portfolio on any day once the criterion is satisfied. The numerical performance results of Model-1 and Model-2 with TX and MTX are presented in Table 1 and Table 2, respectively, and the corresponding graphical comparison diagrams are shown in Figure 1 and Figure 2.
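A minimal sketch of this scenario generation, assuming per-asset regression parameters α, β and σε estimated from the historical data (all names ours, not the authors' code):

```python
import numpy as np

def simulate_returns(alpha, beta, sigma_eps, mu_m, sigma_m, T, rng=None):
    """Single-index-model scenario: the market return gets a Brownian
    motion increment each period, and each asset i follows
    R_i = alpha_i + beta_i * R_m + eps_i.
    alpha, beta, sigma_eps: (n,) regression parameters per asset."""
    rng = rng or np.random.default_rng()
    r_m = mu_m + sigma_m * rng.standard_normal(T)            # market path
    eps = sigma_eps * rng.standard_normal((T, len(alpha)))   # error terms
    return alpha + np.outer(r_m, beta) + eps                 # (T, n) returns
```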
Fig. 1. Comparisons of the return rates between Model-1/TX, Model-2/TX and market
Fig. 2. Comparisons of the return rates between Model-1/MTX, Model-2/MTX and market
The mean monthly return rates of Model-1 with TX, Model-1 with MTX, Model-2 with TX and Model-2 with MTX are 1.23%, 1.22%, 1.75% and 2.04%, respectively. They are all far higher than the local monthly interest rate of savings, 0.1071%. Although beating the market is not our concern, the results show that our returns are also better than the market return, 0.22%. The standard deviations of the return rates of Model-1 with TX, Model-1 with MTX, Model-2 with TX and Model-2 with MTX are 2.76%, 2.83%, 2.99% and 3.26%, respectively. They are all significantly smaller than the standard deviation of the return rate of the market, 5.30%. Also, by observing Figure 1 and Figure 2, the monthly returns of our portfolios fluctuate on a smaller scale than the market does. This indicates that the returns of our portfolios are steadier than the market's. Furthermore, the correlations between each of these four models and the market are all small and, according to the Pearson test, the large P-values indicate that the return rates from our models are uncorrelated with the market.
4 Conclusion
This research provides two-stage stochastic program models for selecting a portfolio along with one index future. These models are also incorporated with other statistical models such as the single index model and the MAD model. Empirical results show that the returns of the portfolio under our investment strategy are stable, independent of the market and higher than the interest rate of savings. These are the important investment criteria for risk-averse investors. Two-stage SLP shows itself to be an effective way of solving the portfolio selection problem, and it works well. Retaining the framework of two-stage SLP, the multiple index model and the MAD model for skewness can further be considered in future study.
References
1. Cai, X., Teo, K.L., Yang, X. and Zhou, X.Y. (2000). Portfolio Optimization under a Minimax Rule, Management Science, Vol. 46, No. 7, 957-972.
2. Chang, K.-H., Chen, H.-J. and Liu, C.-Y. (2002). A Stochastic Programming Model for Portfolio Selection, Journal of the Chinese Institute of Industrial Engineers, Vol. 19, No. 3, 31-41.
3. Chang, K.-H. (2004). Safety-First Portfolio Selection Problem with Index Future, Technical report, Dept. of Industrial Engineering, Chung Yuan Christian University.
4. Elton, E.J. and Gruber, M.J. (1995). Modern Portfolio Theory and Investment Analysis, John Wiley and Sons, New York.
5. Higle, J.L. and Sen, S. (1996). Stochastic Decomposition, Kluwer Academic Publishers, Dordrecht.
6. Kane, A. (1982). Skewness Preference and Portfolio Choice, Journal of Financial and Quantitative Analysis, Vol. 17, 15-25.
7. Konno, H. and Kobayashi, K. (1997). An Integrated Stock-Bond Portfolio Optimization Model, Journal of Economic Dynamics and Control, Vol. 21, 1427-1444.
8. Konno, H. and Yamazaki, H. (1991). Mean-Absolute Deviation Portfolio Optimization Model and Its Applications to Tokyo Stock Market, Management Science, Vol. 37, No. 5, 519-531.
9. Konno, H., Shirakawa, H. and Yamazaki, H. (1993). A Mean-Absolute Deviation-Skewness Portfolio Optimization Model, Annals of Operations Research, Vol. 45, 205-220.
10. Kouwenberg, R. (2001). Scenario Generation and Stochastic Programming Models for Asset Liability Management, European Journal of Operational Research, Vol. 134, 279-292.
11. Markowitz, H.M. (1952). Portfolio Selection, Journal of Finance, Vol. 7, 77-91.
12. Telser, L.G. (1955). Safety First and Hedging, Review of Economic Studies, Vol. 23, 1-16.
Hierarchical Clustering Algorithm Based on Mobility in Mobile Ad Hoc Networks

Sulyun Sung, Yuhwa Seo, and Yongtae Shin

Dept. of Computer Science, Soongsil University, Sangdo-Dong, Dongjak-Gu, Seoul 156-764, Korea
{ssl, zzarara, shin}@cherry.ssu.ac.kr
Abstract. This paper proposes a hierarchical clustering method based on the relative mobility pattern in a mobile ad hoc environment. The relative mobility pattern between two nodes is evaluated by using received messages, and a cluster is created by grouping nodes with relative mobility below a specific threshold. Also, we create a hierarchical clustering structure by allowing merges among clusters based on the mobility pattern. In this way, the proposed mechanism can increase the continuity of the overall cluster structure. Since we allow the combination of clusters, we can reduce the number of clusters and the messages required for routing. To evaluate the performance of our mechanism, we compared it with the existing LCC and WCA using GloMoSim. The simulation results show that our scheme can provide higher stability and efficiency than the existing schemes.
1 Introduction Ad hoc network is a dynamically reconfigurable wireless network with no fixed infrastructure or central administration. Each host acts as a router and supports a network function such as a traffic routing. For two nodes to communicate in ad hoc network, the routing on the wireless path of multi-hop is required. But, since ad-hoc environment doesn’t have a fixed infrastructure and the wireless link between two nodes cannot be moved, many problems may happen. The node mobility results to a frequent communication disconnection and the path re-setup by a node movement causes the network congestion. The efficiency of an adaptive routing algorithm depends on a timeliness and topology information. But the most important factor is the number of information exchange. Since the ad hoc environment has a frequent topology change, the exchange of updated routing information can make the network be saturated easily. Since the rate of link failure depends on a mobility of node, the high mobility will increase a traffic consumed to maintain a path. Therefore, the solution for an efficient routing is to minimize a response to mobility. To provide multi-hop communication in ad hoc networks, numerous routing protocols have been developed. Many of these protocols can be placed into one of two classes: proactive (table-driven) approaches, and reactive (on-demand) approaches. Proactive protocols are derived from the traditional distance vector and link state protocols commonly used in wired networks. These protocols maintain routes between each pair of nodes throughout the lifetime of the network. While this approach has the benefit that a M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 954 – 963, 2006. © Springer-Verlag Berlin Heidelberg 2006
route is generally available the moment it is needed, proactive protocols have poor scaling properties due to their O(n²) overhead. Additionally, previous work has shown that these protocols do not perform as well as reactive routing protocols in most scenarios. Reactive protocols, on the other hand, only establish routes on demand, or when needed. These protocols thereby only incur overhead for route construction and maintenance when those routes are actually needed, since they do not maintain routes that are not utilized. The drawback of these protocols is that they introduce a route acquisition latency, a period of waiting to acquire a route after the route is needed. These protocols have also been shown to have limited scalability, due to their route discovery and maintenance procedures. One alternative to these protocols for improving scalability is clustering, or hierarchical, routing protocols. Hierarchical protocols place nodes into groups, often called clusters. These groups may have some sort of cluster leader that is responsible for route maintenance within its cluster and between other clusters. Also, the effect of a dynamic topology change is limited to the inside of a cluster. Reactive routing is used within the cluster, and proactive routing is used to deliver data between clusters. The most important thing in these routing protocols is to reduce the overhead of creating and maintaining clusters. If the clustering algorithm is very complex and cannot guarantee the stability of clusters, it cannot produce efficient results. Also, when the mobility is very high, the overhead of cluster re-generation will increase. Therefore, node mobility should be an important consideration when clusters are created. This paper proposes a hierarchical clustering method based on the relative mobility pattern of the neighboring nodes in a mobile ad hoc environment. The relative mobility pattern between two nodes is evaluated by using received messages, and a cluster is created by grouping nodes whose mobility pattern is below a specific threshold. Also, we create a hierarchical clustering structure by allowing merges among clusters based on the mobility pattern. In this way, the proposed mechanism can increase the continuity of clusters, and since we allow the combination of clusters, we can decrease the number of clusters and the messages required for routing. The rest of the paper is organized as follows. Section 2 presents a new hierarchical clustering method based on the mobility pattern. Section 3 demonstrates the performance improvement of our scheme over the existing schemes using GloMoSim. Finally, we conclude in Section 4.
2 Clustering Algorithm Based on Mobility Pattern

The proposed clustering algorithm uses the mobility pattern of mobile nodes as the major metric for cluster formation. To improve the durability of the created clusters, the proposed algorithm uses the relative mobility between neighboring nodes, based on the mobility pattern of each node. We can say that two nodes which have relatively similar velocity and direction have a smaller relative mobility difference. One cluster is formed by nodes which have a small relative mobility difference, and two or more adjacent clusters can be combined if they have a similar relative mobility difference. As a result, since mobile nodes with similar mobility are grouped, a cluster formed by our scheme can persist longer than one formed by existing clustering techniques. Also, as nodes tend to move as a group in ad hoc networks that support multiplex connection communication, the proposed clustering
algorithm can be applied more effectively [6],[7]. Because clusters with a small relative mobility difference can be combined, the proposed algorithm can efficiently reduce the number of clusters in the whole ad hoc network. This paper makes the following assumptions. Two nodes are connected by a full-duplex link and each node can measure the signal strength. Also, the network topology can be defined by a graph G = (V, E), where V is the set of nodes and E is the set of full-duplex links that act independently in each direction. The logical distance d(x, y) between two nodes belonging to graph G is defined by the minimum number of hops, and d(x, y) ≤ L for any two nodes selected randomly in one cluster. Also, the real distance between two adjacent nodes x, y can be measured from the received signal strength of the hello or beacon messages that are exchanged periodically. The received electric power can be calculated by formula (1), using Friis's free-space path loss formula.
\[ P_r = P_t \times G_t \times G_r \times \frac{\lambda^2}{(4 \pi d)^2} \tag{1} \]
$P_r$ in (1) is the received electric power, $P_t$ is the transmitted electric power, $G_t$ is the electric power gain of the sending antenna (dB), $G_r$ is the electric power gain of the receiving antenna (dB), $\lambda$ is the wavelength used, and d is the distance. Generally, since the received electric power is inversely proportional to the square of the distance, the real distance between two nodes can be calculated from $P_r$ in (1). As it is impossible to calculate the exact distance by using the received electric power, the produced distance is a distance estimate.

2.1 Relative Mobility Calculation

The proposed algorithm utilizes the mobility model of [5] to calculate the relative mobility. The most general mobility model is the Random Walk Mobility Model. We basically assume that each mobile node follows a Random Walk Mobility Model to increase generality. Each mobile node identifies the existence of neighboring nodes through hello messages. When a mobile node receives a hello message, it knows only the location information at time t. Therefore, to measure the relative mobility with neighboring nodes, the mobile node uses location information collected over a specific time. If the movement pattern of neighboring nodes becomes known by some other additional method, each mobile node can calculate the relative mobility by the estimate vector defined in [5] without collecting location information over a specific time. First, this paper supposes that it is impossible to predict the next location of each node. Therefore, this section describes a method to calculate the relative mobility under the Random Walk Mobility Model. Since a mobile node which follows a random walk mobility model selects its direction and speed randomly, an ad hoc node cannot predict the direction vector after time t. Each node selects its new speed and direction in the predefined ranges [minimum speed, maximum speed] and [0, 2π]. The mobile node exchanges hello messages periodically with neighboring nodes and can calculate the received electric power using (1) from a hello message. Node A can calculate the distance $E[D_{AB}]_t$ at time t by formula (2) from the received electric power.
\[ E[D_{AB}]_t = \sqrt{\frac{k}{P_r}}, \quad \text{where } k = P_t \times G_t \times G_r \times \frac{\lambda^2}{(4 \pi)^2} \tag{2} \]
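A small sketch of this distance estimate (our names, not the authors' code), inverting the free-space formula (1) as reconstructed above:

```python
import math

def distance_estimate(p_r, p_t, g_t, g_r, lam):
    """Estimate the distance between two nodes from the received power p_r,
    as in eq. (2): d = sqrt(k / p_r), k = p_t*g_t*g_r*lam^2 / (4*pi)^2."""
    k = p_t * g_t * g_r * lam ** 2 / (4 * math.pi) ** 2
    return math.sqrt(k / p_r)
```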
The relative mobility $RM_{AB}$ between A and B indicates the distance difference, i.e., to what degree A and B have moved relative to each other. When the $RM_{AB}$ value is smaller, we can conclude that the two nodes A and B are moving in more similar directions with more similar speeds. If the distance difference between nodes A and B during $t_1 - t_0$ is calculated, the relative mobility of the two nodes for $t_1 - t_0$, $RM_{AB}$, can be produced.
\[ RM_{AB} = E[D_{AB}]_t - E[D_{AB}]_{t-1} \tag{3} \]
Also, the average relative mobility $RM_{AB\_T}$ during a specific time T between nodes A and B can be calculated as in formula (4). Each node updates its neighboring node management table by using (4).
\[ RM_{AB\_T} = \frac{1}{N} \sum_{i=1}^{N} \left( E[D_{AB}]_i - E[D_{AB}]_{i-1} \right) \tag{4} \]
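Implementing (3) and (4) literally as printed, a node could maintain the average relative mobility from successive distance estimates as follows (a sketch, our names):

```python
def relative_mobility(distances):
    """Average relative mobility RM_AB_T of eq. (4) from a list of N+1
    successive distance estimates E[D_AB]_0 .. E[D_AB]_N.
    (As printed, the sum telescopes to (d_N - d_0) / N.)"""
    n = len(distances) - 1
    return sum(distances[i] - distances[i - 1] for i in range(1, n + 1)) / n
```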
2.2 Clustering Algorithm

Each node transmits a hello message periodically. Each node can identify the existence of neighboring nodes by the received hello messages and decide its own role in the cluster. The hello message each node transmits includes <node ID, Mode, Cluster Head ID, Seq_num, Level_info>. Here, node ID is the unique identifier each node is given, and Mode has one of four values (cluster head, general node, gateway, relay gateway). If one node belongs to two or more cluster heads, the mode of this node becomes gateway. Relay gateways are nodes that lie in separate clusters but are within transmission range of one another; such pairs of nodes can also be used to route between clusters. Cluster Head ID is the ID of the node's own cluster head, and Seq_num is used to identify a hello message and is incremented by 1 whenever one is transmitted. Level_info is a value for organizing clusters hierarchically and will be explained in detail in Section 2.2.1. A node cannot decide its own mode when initialized. Therefore, each node sets its mode to general node and announces its existence to neighboring nodes by a hello message. After each node transmits a hello message, it sets a timer to receive hello messages from other nodes. After each node receives hello messages from neighboring nodes for a specific time, it creates its neighboring node management table from these messages. The information in a neighboring node management table is the ID of the node that transmitted the hello message, the electric power $P_r$ produced by (1), the relative mobility at time t produced by (3), and the relative mobility value during the limited time T produced by (4). After each node receives a hello message, it measures the signal strength of the received message and creates a new entry in the neighboring node management table using the node id in the message. After each node adds the signal strength to the neighboring node management table, it stores the distance produced by (2) together with the time t in the neighboring node management table. If the neighboring node management table
already has the id of the corresponding node, the node produces the relative mobility by (3) using the distance value of time t−1 and stores this in the neighboring node management table. Each node runs this procedure repeatedly until the predefined timer expires. Finally, when the timer expires, the mobile node has obtained the average relative mobility with each neighboring node over the time T. If the set of nodes on the neighboring node management table is $S_m$, each node selects the node B which has the smallest $RM_{AB\_T}$ with $RM_{AB\_T} < Threshold_{mob}$, $B \in S_m$, and sends a Head_request message to node B. In other words, the cluster head is decided by (5). Here, the mobility threshold value $Threshold_{mob}$ is a design parameter of the algorithm and is determined by experiment, since it is a variable that controls the stability of the cluster.

\[ \text{Cluster\_Head} = \text{Least}_{i \in S_m} \{ ID \mid RM_{AB\_T} < Threshold_{mob} \} \tag{5} \]
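One reading of (5) combined with the prose above (smallest relative mobility below the threshold, ties broken by the smaller node id) could look like this sketch (our names):

```python
def choose_cluster_head(neighbor_rm, threshold_mob):
    """Pick the cluster head per eq. (5): among neighbors whose average
    relative mobility RM is below Threshold_mob, choose the smallest RM,
    ties broken by the smaller node id. neighbor_rm: {node_id: RM}."""
    eligible = [(rm, nid) for nid, rm in neighbor_rm.items()
                if rm < threshold_mob]
    if not eligible:
        return None   # no candidate: the node declares itself cluster head
    return min(eligible)[1]
```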
The node that receives a hello message updates its neighboring node table first. If the mode in the received hello message is cluster_head and its own cluster_head_id is undefined, it checks whether $RM_{AB\_t} < Threshold_{mob}$. If so, it transmits a cluster_ack message to the node that sent the hello message and updates its own neighboring node table. But if the mode of the received hello message is not cluster_head, or its own cluster_head_id is not set up yet, the node stores the hello message in a cache and updates its neighboring node management table after calculating the mobility. Each node repeats this work during time T. If an RM value satisfies $\text{Least}_{i \in S_m}\{ID \mid RM_{AB\_T} < Threshold_{mob}\}$ after time T, the node sends a Head_request message to the relevant node. After the node that receives a Head_request message changes its Mode to cluster head, it transmits a hello message to the nodes in its neighboring node table and cluster member table. At this time, the hello message is transmitted with a short interval to reduce the time required to form the cluster. If a node does not find a node that has a smaller $RM_{AB\_T}$ than the $Threshold_{mob}$ value during the limited time, it sets itself up as a cluster head and transmits a hello message. Also, a node that receives hello messages from two or more cluster heads sets its mode to gateway.

Algorithm. 1-level cluster formation

Variables
  Neighboring_node_list_i = node i's neighboring node management table
  Cluster_member_list_i = cluster head node i's cluster member table
  RM_ij_T = relative mobility between node i and node j
  node_h = node h that has the smallest RM_ih_t value below Threshold_mob among Neighboring_node_list_i of node i
Initially Neighboring_node_list_i = nil, Cluster_member_list_i = nil

upon timer is expired:
  Send Hello message to ∀nodes ∈ Neighboring_node_list_i
upon receiving Hello message from node_j:
  update Neighboring_node_list_i
  if (hello_j->mode == cluster_head && cluster_head_id_i == undefined)
    if (RM_ij_t < Threshold_mob)
      Send cluster_ack message to node_j
      Update Neighboring_node_list_i
  else
    while (timer of T is not expired)
      Save Hello message in cache
      Measure power signal and calculate and save RM_ij_t
    if (node_j == Least_{i ∈ Sm}{ID | RM_AB_T < Threshold_mob})
      Send Head_request message to node_j
upon receiving cluster_ack message from node_j:
  update Cluster_member_list_i
upon receiving Head_request message from node_j:
  update mode_i to cluster_head
  update Cluster_member_list_i
  Send Hello message to ∀nodes ∈ Neighboring_node_list_i && Cluster_member_list_i
2.2.1 Hierarchical Cluster Formation

The hello message each node transmits includes a Level_info value; this value is incremented by 1 as clusters join hierarchically, as described below. Nodes that have not been assigned a Level_info carry the value undef. All nodes except cluster heads have a Level_info value of 0. A Level-1 cluster head has Level_info 1, and a Level-2 cluster head has Level_info 2; the Level_info value equals the hierarchical level of the corresponding cluster head. Cluster gateways and relay gateways periodically receive hello messages from nodes in neighboring clusters. Whenever both a gateway and a pair of joint gateways exist to connect two clusters, the single gateway is favored because it is one hop fewer. Since each hello message includes the cluster head ID and the Level_info value, each node can track the hierarchical level of each cluster. After a gateway receives hello messages from adjacent nodes, it creates entries in its neighboring node management table; thus gateways and relay gateways may hold entries for nodes in other clusters. If the relative mobility with a node in another cluster is smaller than Threshold_mob, the two adjacent clusters can organize a hierarchical structure. Based on the hello message, the node with the smaller cluster head ID transmits a Cluster_Join message to the neighboring gateway and to its own cluster head to form the hierarchical structure; a relay gateway transmits the Cluster_Join message to the neighboring relay gateway and its cluster head. The Cluster_Join message includes the hierarchical information of the sending node's cluster. A node that receives a Cluster_Join message forwards it to its cluster head. After receiving a Cluster_Join message, a cluster head increases its Level_info by 1 and sends a hello message. Before a hierarchical structure is formed, all cluster heads have a Level_info value of 1. A node with a Level_info value of 1 always becomes a parent cluster, and a cluster whose Level_info is 2 or more becomes a child cluster of the parent node. In this way the hierarchical clustering structure is formed. Two cluster heads located within direct transmission range of each other can also form a hierarchical structure. Although one of the two could give up its head role so that the two clusters merge, this causes a ripple effect across the whole ad hoc network. Therefore, this paper assumes that if two cluster heads are in direct transmission range and their relative mobility is lower than Threshold_mob, the hierarchical clustering mechanism is used. If the two cluster heads have the same Level_info value, they decide the hierarchy using their node IDs; if they have different Level_info values, the node with the larger ID adds 1 to the Level_info of the node with the smaller ID and sets the result as its own Level_info. After that, the node with the larger ID announces the Level_info change by sending a hello message.
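The head-to-head rule just described can be summarized in a few lines of Python. This is a hedged sketch: the head objects, their attributes, and send_hello() are assumed interfaces, and relative_mobility stands in for the paper's RM value.

def merge_cluster_heads(head_a, head_b, threshold_mob, relative_mobility):
    # Hierarchy decision between two cluster heads in direct range.
    # head_a/head_b are assumed to expose .node_id, .level_info and
    # .send_hello(); relative_mobility stands in for RMAB_T.
    if relative_mobility(head_a, head_b) >= threshold_mob:
        return None  # too mobile: keep the clusters independent

    small, big = sorted([head_a, head_b], key=lambda h: h.node_id)
    # Whether the levels are equal or not, the larger-ID head derives its
    # new level from the smaller-ID head and announces the change.
    big.level_info = small.level_info + 1
    big.send_hello()
    return big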
Fig. 1. Hierarchical cluster formation
When CH1 and CH2 move as in example (c) of Figure 2 and come within each other's transmission range, CH2, which has the bigger ID, transmits a Cluster_head_join message to CH1 and sets its own Level_info to CH1's Level_info + 1. CH2 then transmits a hello message to the nodes in its neighboring node management table to announce the Level_info change. But when CH1's cluster comes to include CH2's cluster, as in example (d) of Figure 2, CH2 gives up cluster head mode and returns to being a general node.
Fig. 2. Hierarchical clustering structure
Algorithm. Hierarchical cluster formation

Initially
  Neighboring_node_list_i != nil, Cluster_member_list_i != nil

upon receiving Hello message from node_j:
  if (hello_j->cluster_head_id != cluster_head_id_i && hello_j->mode != cluster_head)
    while (timer of T is not expired)
      Save Hello message in cache
      Measure power signal and calculate and save RMij_t
    if (RMij_t < Threshold_mob && node_id_i < node_id_j)
      Send cluster_join message to node_j
  else if (hello_j->cluster_head_id != cluster_head_id_i && hello_j->mode = cluster_head)
    if (Cluster_member_list_i ⊆ Cluster_member_list_j)
      while (timer of T is not expired)
        Save Hello message in cache
        Measure power signal and calculate and save RMij_t
      if (RMij_t < Threshold_mob && node_id_i < node_id_j)
        Send cluster_head_join message to node_j

upon receiving cluster_join message from node_j:
  if (mode_i = gateway || mode_i = joint_gateway)
    Forward cluster_join message to cluster head
  else if (mode_i = cluster_head)
    set Level_info_i = Level_info_i + 1
    Send Hello message to ∀ nodes ∈ Neighboring_node_list_i && Cluster_member_list_i

upon receiving cluster_head_join message from node_j:
  if (node_id_i > node_id_j)
    set Level_info_i = Level_info_j + 1
    Send Hello message to ∀ nodes ∈ Neighboring_node_list_i && Cluster_member_list_i
3 Performance Evaluation

Each of the experiments was performed using the GloMoSim network simulator [9], developed at UCLA using PARSEC [10]. The IEEE 802.11 MAC layer protocol is used for channel access in each simulation. For the performance evaluation of the proposed mechanism, we compare ours with LCC and WCA as implemented by Jorge Nuevo [11]. The simulation is executed for 500 seconds. The node density is fixed at 80 and the transmission range of each node is limited to 350. The mobility model is the random direction model. The maximum speed of a node is varied between 0 m/s and 10 m/s. Threshold_mob is set to M_mob + k·δ_mob (k = 1.5). The simulation evaluates stability and efficiency: we measured the number of cluster changes versus node speed to evaluate the stability of our mechanism, and the number of control messages versus node speed to evaluate its efficiency. To assess the stability of our mechanism, we compared ours with the existing schemes LCC and WCA. [Figure 3] shows the comparison of cluster changes. In this simulation the number of nodes is fixed at 50. Since a cluster change results from a cluster head change, we counted the nodes that changed from general node to cluster head and vice versa. As the average node speed increases, the number of such changes rises, because at higher speeds nodes change neighbors more rapidly. LCC results in the greatest number of leadership changes during the simulation, while our mechanism results in the fewest cluster changes. This result shows that our mechanism provides higher stability than the existing schemes. The stability of the cluster topologies is more closely reflected by the second plot of [Figure 3], which shows the number of cluster heads at each second during the first 170 seconds of the simulations. Our mechanism yields a fairly stable number of cluster heads, fluctuating between 11 and 14, whereas LCC and WCA show much wider variance. A high number of cluster head changes results in significant communication overhead for routing updates and cluster reconfigurations, and leads to an unstable topology.
[Figure 3: left, number of cluster head changes vs. speed (m/s) for ours, LCC, and WCA; right, number of cluster heads vs. time (s) over the first 170 seconds.]
Fig. 3. Number of cluster changes and cluster heads
[Figure 4] shows the simulation results for the efficiency of our mechanism. The first plot of [Figure 4] represents the number of packets that can be delivered by our scheme running over AODV, where AODV-C denotes AODV with our clustering scheme. Our scheme outperforms AODV without clustering. Since our clustering scheme provides a stable backbone, node speed has relatively little effect on our scheme. The second plot of [Figure 4] shows the number of RREQs.
[Figure 4: left, total data packets delivered vs. speed (m/s) for AODV and AODV-C with n=50 and n=250; right, number of RREQs vs. speed (m/s) for AODV and AODV-C.]
Fig. 4. Total data packet and RREQ comparison result
In our scheme, all control messages are processed by a cluster head. Therefore, our scheme generates far fewer control messages than AODV without it.
4 Conclusion

This paper proposes a hierarchical clustering method based on the relative mobility patterns of neighboring nodes in a mobile ad hoc environment. The relative mobility between two nodes is evaluated from received messages, and a cluster is created by grouping nodes whose mobility is below a specific threshold. We also create a hierarchical clustering structure by allowing clusters to merge based on mobility; in this way, the proposed mechanism increases the longevity of clusters. Since we allow the combination of clusters, we reduce both the number of clusters and the messages required for routing. To evaluate the performance of our mechanism, we compared it with the existing LCC and WCA schemes using GloMoSim. The simulation results show that our scheme provides higher stability and efficiency than the existing schemes.
References

[1] S. Basagni, Distributed clustering for ad hoc networks, in: Proceedings of the 1999 International Symposium on Parallel Architectures, Algorithms, and Networks, Australia (June 1999) pp. 310-315
[2] E.M. Belding-Royer, Hierarchical routing in ad hoc mobile networks, Wireless Communications and Mobile Computing (2002)
[3] C.-C. Chiang, H.-K. Wu, W. Liu and M. Gerla, Routing in clustered multihop, mobile wireless networks with fading channel, in: Proceedings of the IEEE Singapore International Conference on Networks (SICON) (April 1997) pp. 197-211
[4] X. Hong, M. Gerla, G. Pei, and C. Chiang, A group mobility model for ad hoc wireless networks, in: Proceedings of ACM/IEEE MSWiM, Seattle, WA (August 1999)
[5] R. Ramanathan and M. Steenstrup, Hierarchically-organized, multihop mobile wireless networks for quality-of-service support, ACM/Baltzer Mobile Networks and Applications 3(1) (1998) 101-118
[6] Z.J. Haas, A new routing protocol for the reconfigurable wireless networks, in: Proceedings of ICUPC '97 (1997)
[7] S.-J. Lee, W. Su, J. Hsu, M. Gerla, and R. Bagrodia, A performance comparison study of ad hoc wireless multicast protocols, in: Proceedings of IEEE INFOCOM (2000)
[8] B. Mitelman and A. Zaslavsky, Link state routing protocol with cluster based flooding for mobile ad-hoc computer networks, in: Proceedings of the Workshop on Computer Science and Information Technologies (CSIT), Moscow, Russia (1999)
[9] GloMoSim: Global Mobile Information Systems Simulation, http://pcl.cs.ucla.edu/projects/glomosim/
[10] PARSEC: Parallel Simulation Environment for Complex Systems, http://pcl.cs.ucla.edu/projects/parsec/
[11] C.R. Lin and M. Gerla, Adaptive clustering for mobile wireless networks, IEEE Journal on Selected Areas in Communications 15(7) (1997) 1265-1275
[12] A.D. Amis, R. Prakash, and T.H.P. Vuong, Max-min d-cluster formation in wireless ad hoc networks, in: Proceedings of IEEE INFOCOM '00, Vol. 1 (March 2000) pp. 32-41
An Alternative Approach to the Standard Enterprise Resource Planning Life Cycle: Enterprise Reference Metamodeling

Miguel Gutiérrez 1, Alfonso Durán 1, and Pedro Cocho 2

1 Departamento de Ingeniería Mecánica, Universidad Carlos III de Madrid, Av. de la Universidad 30, 28911 Leganés (Madrid), Spain
{miguel.gutierrez, alfonso.duran}@uc3m.es
http://www.uc3m.es/uc3m/dpto/dpcleg1.html
2 Adalid MyO, c/Berlín 3F, of. 1.16, 30395 Cartagena (Murcia), Spain
[email protected]
http://www.adalidmyo.com
Abstract. Enterprise Resource Planning (ERP) systems development is based on the initial definition of an enterprise reference model, whose richness and generality largely determine the flexibility of the resulting software package to accommodate specific requirements. The desired accommodation is rarely accomplished without costly customization processes, which are frequently unaffordable for small and medium enterprises. In this paper, an alternative ERP development approach, stemming from the implementation of an enterprise reference metamodel, is proposed. We present an analysis of the resulting alternative ERP life cycle, focusing particularly on the metamodel that constitutes the core of the proposal. The results obtained in a complete enterprise software development project, encompassing software package development, tailored implementation and some post-implementation customization, show that the proposed approach facilitates the identification and fulfillment of the customer's specific requests at a reduced cost, even when they arise after the implementation.
1 Introduction

Enterprise Resource Planning (ERP) systems are commercial software packages that provide an integrated solution to support all the areas of a company (financial, human resources, production, ...) [1]. The solution starts from a generic reference model of the business entities of a company and their functional relationships [2]; a well-known example is the set of models proposed by Scheer [3]. Flexibility at the implementation stage is basically contingent on the richness of the initial model [4]. Where the accommodation to the enterprise requirements is not satisfactory, as is normally the case, a code customization process is required [5], [6]. This is particularly onerous for small and medium enterprises (SMEs), which frequently cannot afford it [5], [7]. Even when that customization is not required, new requirements will eventually appear after the software has been in production for some time. The post-implementation flexibility of ERP systems, that is, their ability to accommodate these new requirements, is, however, very limited. At execution time, no new entities can be added to the initial model, no new attributes assigned to the existing entities, and no new functional relationships established. This paper is an outcome of a research project undertaken to tackle this lack of flexibility of enterprise software by allowing, at execution time, not only the creation of new instances of previously modeled entities, but also the definition of new entities, the definition and assignment of new attributes, and the creation of new relationships. This approach raises the abstraction level of the enterprise model by one layer, i.e., it shifts the codification boundary to the metamodel layer [8], which in turn leads to an alternative ERP life cycle. In the following section, a schematic representation of the basic phases of the prevailing ERP life cycle is depicted as a reference, and the related research literature is briefly reviewed. In the third section, the metamodel that constitutes the core of the proposal is described and positioned in the four-layer metamodel architecture. The fourth section presents the proposed alternative software life cycle based on this metamodel, highlighting its conceptual and practical differences. The fifth section presents the experimentation carried out in a real development project, and the conclusions section summarizes the major advantages and implications of the proposed approach.
2 Enterprise Software Systems Development Cycle: Current Status

ERP systems do not adhere to a fixed life cycle; however, a generic sequence of stages can be identified (Fig. 1) [1], [4], [5], [6], [7]:

• Reference enterprise model implementation. The starting point is a reference model of an enterprise (a superset of the models of the spectrum of target organizations), depicted in Figure 1 as a network of interrelated elements.
• Industry-specific solution. To facilitate implementations, the major ERP vendors have adopted two strategies: a modular design that allows building a first cut of a specific solution by combining basic modules (like the Solution Composer of the ERP market leader SAP [9]), and prepackaged industry-specific solutions (pharmaceutical, chemical, automotive, insurance, ...) [10], [11], often used as the implementation starting point.
• Company-specific submodel (parametrization). Whether the starting point is the generic or the industry-specific model, the next implementation stage involves distilling the submodel that corresponds to the specific requirements of the company implementing the system. This normally involves a substantial and costly consulting engagement aimed at establishing the set of parameters (more than 5000 in SAP R/3 [7]) that particularize the initial model.
• Code customization. Even after parametrization, the model normally does not address all the specific requirements, forcing the company either to modify or supplement the software, or to change its business processes and procedures to conform to those embedded in the software. This requires expensive ad-hoc development, depicted in Figure 1 through the addition of new elements and the modification (represented through a color/pattern change) of existing ones.
• Post-implementation. Some companies embark on successive ad-hoc developments in the post-implementation stage, also involving substantial outlays. This is depicted in Figure 1 through an additional color/pattern change in one element.
Fig. 1. Generic ERP life cycle
These stages encompass the basic foundations of the current ERP life cycle. Further refinements proposed by some authors provide useful insights. Brehm et al. propose a typology of what they call ERP tailoring types, including the role played by third-party packages (bolt-ons) and legacy systems [4]. Luo and Strong combine the standard stages with three options in process customization (no change, incremental change, radical change) [5]. Scheer and Habermann highlight the problem of implementation costs, stating that the major ERP vendors estimate those costs (process alignment plus software customization) at three to seven times the cost of the software license [7]. In summary, what stands out from the literature review is the troublesome nature of ERP implementations, the criticality of the alignment between the systems and the enterprise processes [6], and the generalized need to carry out customization processes to fill the gap between the required model and the ERP reference model. To ease the implementation process, current efforts in ERP systems are primarily directed towards improving enterprise modeling languages. Soffer et al. analyze the desirable characteristics of such languages [12]. Rosemann and van der Aalst discuss the limitations of current enterprise modeling languages and propose an improved version of the Event-Driven Process Chains (EPCs) used by SAP [2]. A valuable reference is the research carried out by the UEML (Unified Enterprise Modeling Language) group; UEML combines the intention of standardizing enterprise modeling languages with that of providing a framework for information sharing between enterprises [13]. The latter is aligned with the current research focus on applying metamodeling to enterprise software, namely providing a common repository of information to be shared by enterprises or used for data warehousing purposes [14].
3 Enterprise Metamodeling

This paper takes an approach that differs from those mentioned above: to base the software development on the implementation of an enterprise reference metamodel. To facilitate the positioning of the proposed approach, we first describe the conventional four-layer metamodeling architecture and then the enterprise metamodel, including an example of its application in the reference architecture.

3.1 Four-Layer Metamodeling Architecture

The conventional four-layer metamodeling architecture establishes the basic metamodeling concepts through a hierarchy of modeling levels or (meta)layers (Mn, n = 0, 1, 2, 3), in which a model of a given layer is an instance of a model of the following layer, in the sense that each element of the former is an instance of an element of the latter [15]. In the current specification of the metamodeling language MOF (Meta Object Facility) [8], the OMG describes the four layers as follows:

• The information layer (M0) comprises the data that we wish to describe.
• The model layer (M1) comprises the metadata that describes data in the information layer. Metadata is informally aggregated as models.
• The metamodel layer (M2) comprises the descriptions (i.e., meta-metadata) that define the structure and semantics of metadata. Meta-metadata is informally aggregated as metamodels.
• The meta-metamodel layer (M3) comprises the description of the structure and semantics of meta-metadata.

In terms of this architecture, the standard ERP development starts by creating a reference model corresponding to the model layer (M1). The alternative approach proposed in this paper involves implementing a reference metamodel corresponding to the M2 layer, named the Enterprise Metamodel, which is defined as an instance of MOF (just as the OMG defines UML). The Enterprise Metamodel encompasses an Entity Metamodel and a Relationship Metamodel, which are described in the next subsections.

3.2 Entity Metamodel

The Entity Metamodel (Figure 2) comprises a hierarchy of Enterprise_Entity, each of which is characterized through the assignment of a set of Enterprise_Feature, and the corresponding Enterprise_Object (for the sake of legibility, from now on we omit the "Enterprise_" prefix preceding the elements of the metamodel). The hierarchy is established by the Generalization class and allows feature inheritance from parent entities (general) to child entities (specific). For each instance of Entity there will be multiple instances of Object, which are characterized by assigning specific Values to the corresponding instances of Characterization. Features can therefore conceptually exist independently of the entities; however, as shown in Figure 2, feature instances only exist in connection with a specific instance of an Object, never autonomously. Values, on the other hand, have their own identity. Since, in the proposed approach, it is the metamodel that gets implemented in code, it is possible to assign data types defined at runtime to the features. A relevant aspect, quite common in object orientation, is the possibility that one of the features in a class has a predefined value (Specific_Characterization), affecting all objects in that class and those in the class tree inheriting from it. From the standpoint of the proposed model, this implies that there are two types of Characterization, namely generic and specific. A small illustrative sketch follows Fig. 2.
Fig. 2. Entity Metamodel package
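As a rough illustration of how the Entity Metamodel can be realized in code, the following Python sketch defines entities, features, and objects as runtime data rather than as compiled classes. All class and attribute names here are our own illustrative choices; the paper's system implements the metamodel in a relational database rather than in an object language.

class Entity:
    # An Enterprise_Entity: defined at execution time, not compiled.
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent                  # Generalization link
        self.features = {}                    # feature name -> data type
        self.specific = {}                    # Specific_Characterization values

    def add_feature(self, name, datatype, predefined=None):
        self.features[name] = datatype
        if predefined is not None:
            self.specific[name] = predefined  # affects the whole subtree

    def all_features(self):
        # Features are inherited from parent (general) to child (specific).
        inherited = self.parent.all_features() if self.parent else {}
        return {**inherited, **self.features}

    def all_specific(self):
        inherited = self.parent.all_specific() if self.parent else {}
        return {**inherited, **self.specific}

class Object:
    # An Enterprise_Object: an instance of an Entity, characterized by
    # assigning Values to the Characterizations of its features.
    def __init__(self, entity, **values):
        self.entity = entity
        self.values = entity.all_specific()   # predefined values first
        for name, value in values.items():
            if name not in entity.all_features():
                raise KeyError(f"{entity.name} has no feature {name}")
            self.values[name] = value

# Usage: entities and features are created at execution time.
product = Entity("Product")
product.add_feature("price", float)
drink = Entity("Drink", parent=product)       # inherits "price"
drink.add_feature("volume_l", float)
cola = Object(drink, price=1.2, volume_l=0.5)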
3.3 Relationship Metamodel

The other essential elements of an enterprise model are the relationships between entities, established for commercial or management purposes. As Fig. 3 shows, the relationship package is modeled following the same approach as the Entity Metamodel. The Participation element links entities to relationships, thus playing a role similar to that played by the Characterization element in linking features to entities. There is also a complete classification into generic and specific participations; in the Generic_Participation it is possible to define cardinality. The proposed metamodel conceives an enterprise relationship as a group of entities with a purpose, so it does not require the details of the internal relationships between those entities. The Relationship inheritance tree, set through the Generalization element, allows the gradual and intuitive definition of the enterprise activity. For example, it is possible to create a "Sell" Relationship involving the basic entities ("Customer", "Product", "Sales rep"), and then to create the subtypes "Sell_Perishable" (in which the entity "Product" is replaced by its subtype "Perishable"), "Sell_Drink" (replacing "Product" by "Drink"), and so on.
Fig. 3. Relationship Metamodel package
There is also a functional hierarchy of Relationships, with an associated sequence, so that the software can interpret the logic of day-to-day processes. For example, the relationship "Sell" might consist of the sequence of relationships "Customer Order" -> "Invoice" -> "Ship". The logic of a relationship also resides in its Methods, which result from the need to perform arithmetical operations involving the values of the features of the entities participating in the relationship; for example, the calculation of the tax (VAT) of a sale as a fixed percentage of the sale amount. Finally, the activity of the enterprise materializes in successive instances of Function (for example, "Sell_Drink number ######").

3.4 Enterprise Metamodel

The enterprise metamodel derives from the merging of the entity and relationship packages, together with the fundamental linking of both through the central Datum association (Fig. 4). Datum becomes an essential element of the metamodel: it embodies the transactional activity of the enterprise; when functions are executed, the characterization of the objects varies accordingly. A small illustrative sketch of these relationship concepts follows Fig. 4.
Fig. 4. Enterprise Metamodel
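Continuing the illustrative Python sketch (and reusing its Entity and Object classes, plus the product and drink entities), relationships can likewise be represented as runtime data: a Relationship groups its participating entities, and executing it creates a Function instance whose methods operate on the participants' feature values. Every name below is hypothetical, and the 16% VAT rate is an arbitrary example of the fixed-percentage method named in the text.

class Relationship:
    # An enterprise relationship: a purposeful grouping of entities.
    def __init__(self, name, participants, parent=None, methods=None):
        self.name = name
        self.participants = participants      # Participation: list of Entity
        self.parent = parent                  # Relationship generalization
        self.methods = methods or {}          # method name -> callable

    def execute(self, *objects):
        # Executing the relationship creates one Function instance
        # (one transaction) and runs the associated methods.
        results = {m: f(*objects) for m, f in self.methods.items()}
        return {"relationship": self.name,
                "objects": [o.entity.name for o in objects],
                "results": results}

# "Sell" groups Customer, Product and Sales rep; "Sell_Drink" narrows
# Product to its subtype Drink.
customer = Entity("Customer")
sales_rep = Entity("Sales rep")
vat = {"vat": lambda cust, prod, rep: prod.values["price"] * 0.16}
sell = Relationship("Sell", [customer, product, sales_rep], methods=vat)
sell_drink = Relationship("Sell_Drink", [customer, drink, sales_rep],
                          parent=sell, methods=vat)
function = sell_drink.execute(Object(customer), cola, Object(sales_rep))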
3.5 Metamodel Hierarchy Example

Figure 5 shows a simplified example illustrating the positioning of the proposed metamodel within the four-layer metamodel architecture. As mentioned before, the metamodel is an instance of MOF. The relationships between layers are represented by "instance of" links, while the intra-layer relationships between object instances and entity instances are represented by "snapshot" links, which cross the logical boundary delimiting the "model zone" and the "data zone". This resembles the equivalent diagram in the UML Infrastructure specification [15]. Note that not all links and labels are included and that the metamodel shown is incomplete; otherwise the picture would become unclear.
[Figure 5 residue: the diagram shows the M3 layer (MOF: Class, Association), the M2 layer (the Enterprise Metamodel: Enterprise_Entity, Enterprise_Feature, Enterprise_Relationship, Participation, Characterization, Enterprise_Object, Enterprise_Function), and the M1 layer (an example model with Customer, Product, Drink, Sales Rep, Employee, the Sell and Sell_Drink relationships, and object-level snapshots such as Drink 1 : Drink and S_D-1 : Sell_Drink), separated by the metamodel, implementation, and logical boundaries.]
Fig. 5. Positioning of the metamodel example
In order to clarify the semantics of the metamodel, a simple model, which is an instance of the relationship package, is depicted in layer M1. Some conventions, taking UML as a reference, are adopted for the representation of enterprise models: instances of Entity are represented as double-lined boxes; the Generalization association is represented by a line ending in a double-lined triangle; Relationships are represented as double-lined diamonds; Participation instances are circle-ended lines linking entities to relationships, with the expression "participates in" above the line; finally, all object-level instances are represented, as in UML, by underlining the name and referring to the respective entity.
4 Alternative Development Cycle

Based on the concepts developed throughout the paper, this section summarizes the alternative enterprise software development cycle resulting from the implementation of the metamodel presented in the third section, represented by the sequence of stages depicted in Figure 6:
Fig. 6. The alternative ERP life cycle based on enterprise metamodeling
• Enterprise reference metamodel. The starting point of the alternative cycle is the implementation in a database of the proposed metamodel of enterprise entities and relationships. This approach implies higher generality as well as a flexible capability to adapt to specific requirements.
• Industry-specific dictionary. Starting a company definition from scratch would be very effort-consuming (even though feasible). However, there is a basic hierarchy of entities and relationships essentially common to all enterprises in any given sector. Thus, the role played in ERPs by the module-based and industry-specific solutions is performed, in the proposed approach, by a set of database preloads containing a pre-definition of entities, features and relationships.
• Custom implementation. A remarkable trait of the alternative approach is that the result of adapting the model to the company is already customized (in the figure, it is depicted as identical to the customized model of the traditional approach). Therefore, the ad-hoc customization costs are avoided. Additionally, the user company might find it easier to identify itself with the customized entity hierarchy, which is closer to its reality, thus facilitating its involvement and consequently improving requirement identification.
• Post-implementation. Finally, the alternative approach provides special flexibility in the post-implementation stage, facilitating adaptation to changing requirements, since there is no pre-coded enterprise model, only a pre-coded enterprise metamodel.
5 Experimentation

The practical application of the proposed alternative development cycle has established its potential to achieve flexibility in packaged enterprise software for small and medium enterprises (SMEs). A first version of an enterprise management software package developed following this approach is currently being used by several SMEs; it has also been recommended by the Spanish reprographic association to its member companies, basically due to its flexibility in implementing ad-hoc requirements. The current version of the software package is oriented to small companies looking for low-cost software. It is database-oriented software with a client/server architecture, with most of the code residing in the database (SQL programming). The client applications are Windows-based, essentially GUIs (Graphical User Interfaces) developed in Delphi, which access the server database through the TCP/IP protocol. It runs over either Linux or MS Windows, uses the open source database FireBird 1.5.2, and has low hardware requirements (server: Intel Pentium/AMD 3 GHz and 1 GB RAM; client: Intel Pentium/AMD 1 GHz and 256 MB RAM). It encompasses two main modules:

• ArmorArqt. This is the module used to model the enterprise, i.e., to create and characterize the entities and to create and define the participations of the relationships. ArmorArqt is used to populate the database that implements the metamodel. An inherent characteristic of a metamodel is the intrinsic definition of a language (a Domain-Specific Language, DSL) [13], [16].
• ArmorPrise. This is the enterprise management software. It follows a generic logic that starts by identifying the main relationships (those that are the first-level components of the master relationship, which is the enterprise itself) and grouping them in a menu option. The software then looks for the component relationships of each one. The user selects the relationship to be executed; the corresponding function is then created, and the software asks for the required objects as defined in the participations of the relationship. The methods associated with the relationship can be called to execute the defined operations. A sketch of this generic interpretation loop is given after this list.
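The following Python sketch illustrates the kind of generic, metamodel-driven execution logic the ArmorPrise description suggests. It is not the product's actual code (which resides in SQL): the dict encoding of relationships and the choose, provide_object and execute callables standing in for the GUI and the database-resident logic are our own assumptions.

def run_enterprise_menu(master, choose, provide_object, execute):
    # Generic interpreter over a pre-loaded enterprise model. Each
    # relationship is a plain dict {"name", "components", "participants"}.
    while True:
        selected = choose(master["components"])    # top-level menu
        if selected is None:
            return                                 # user quits
        while selected["components"]:              # drill down the hierarchy
            selected = choose(selected["components"])
        # Ask for one object per participation, as the metamodel defines.
        objects = [provide_object(p) for p in selected["participants"]]
        execute(selected, objects)                 # creates a Function instance

# A minimal pre-load: "Sell" decomposes into a single "Ship" step here.
ship = {"name": "Ship", "components": [], "participants": ["Product", "Customer"]}
sell = {"name": "Sell", "components": [ship], "participants": []}
enterprise = {"name": "Enterprise", "components": [sell], "participants": []}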
6 Conclusions

This paper presents an enterprise reference metamodel whose implementation leads to a viable alternative ERP life cycle. The proposed approach offers greater flexibility to accommodate company requirements than the current ERP development approach based on reference models. There is a noticeable improvement in the implementation stage, since the user of the package can be more involved in the modeling of its company, which is carried out in a much more intuitive fashion, thus improving the identification of requirements. The improvement is also substantial in the post-implementation stage, given the possibilities the proposed flexible approach provides to modify the initial company model and to accommodate the series of requirement changes that normally arise after the go-live date. Furthermore, this flexibility encourages the company to execute more frequent low-cost adaptations. To conclude, the experimentation with a real implementation of a software package developed according to the approach proposed in this paper suggests that the customized result of the implementation and the subsequent flexibility are particularly advantageous, since ad-hoc development costs are drastically reduced compared with the traditional approach.
Acknowledgements This paper has been prepared in the context of ARMORsystem, a collaborative research project between the ADALID Co. and the Carlos III University of Madrid, with financial support from FEDER, CDTI and the Instituto de Fomento de Murcia.
References

1. Davenport, T.H.: Putting the Enterprise into the Enterprise System. Harvard Business Review 76 (4) (1998) 121-131
2. Rosemann, M., van der Aalst, W.M.P.: A Configurable Reference Modeling Language. Information Systems, available online, to appear (2006)
3. Scheer, A.-W.: Business Process Engineering: Reference Models for Industrial Enterprises. 2nd edn. Springer-Verlag, Berlin (1994)
4. Brehm, L., Heinzl, A., Markus, M.L.: Tailoring ERP Systems: A Spectrum of Choices and their Implications. In: Proceedings of the 34th Hawaii International Conference on System Sciences, Vol. 8. IEEE (2001)
5. Luo, W., Strong, D.M.: A Framework for Evaluating ERP Implementation Choices. IEEE Transactions on Engineering Management 51 (3) (2004) 322-333
6. Soffer, P., Golany, B., Dori, D.: Aligning an ERP System with Enterprise Requirements: An Object-Process Based Approach. Computers in Industry 56 (6) (2005) 639-662
7. Scheer, A.-W., Habermann, F.: Making ERP a Success. Communications of the ACM 43 (4) (2000) 57-61
8. OMG: Meta Object Facility (MOF) Specification, Version 1.4 (2002) 2-1 to 2-5. Downloadable at http://www.omg.org
9. SAP: Solution Composer: What is it? SAP AG (2005). http://www.sap.com/solutions/businessmaps/composer/index.epx (accessed Oct. 2005)
10. SAP: Designed for Your Industry, Scaled to Your Business, Ready for Your Future. SAP AG (2004). http://www.sap.com/industries/index.epx (accessed Dec. 2005)
11. Oracle: Oracle Industries Solutions. http://www.oracle.com/industries (accessed Dec. 2005)
12. Soffer, P., Golany, B., Dori, D.: ERP Modeling: A Comprehensive Approach. Information Systems 28 (6) (2003) 673-690
13. Petit, M. (ed.), Domeingts, G. (approval): Report on the State of the Art in Enterprise Modelling. UEML Deliverable D1.1 (2002). http://www.ueml.org
14. Tannenbaum, A.: Metadata Solutions: Using Metamodels, Repositories, XML and Enterprise Portals to Generate Information on Demand. Addison Wesley, Boston (2002)
15. OMG: Unified Modeling Language (UML) Specification: Infrastructure, Version 2.0, ptc/04-10-14 (2004). Downloadable at http://www.omg.org/
16. Kovse, J., Weber, C., Härder, T.: Metaprogramming for Relational Databases. In: Atzeni, P., Chu, W.W., Lu, H., Zhou, S., Ling, T.W. (eds.): Conceptual Modeling - ER 2004. Lecture Notes in Computer Science, Vol. 3288. Springer-Verlag, Berlin Heidelberg New York (2004) 654-667
Static Analysis Based Software Architecture Recovery

Jiang Guo, Yuehong Liao, and Raj Pamula

Department of Computer Science, California State University Los Angeles, Los Angeles, California, USA
{jguo, yliao2, rpamula}@calstatela.edu
Abstract. Recovering software architectures is a key step in reengineering legacy (procedural) programs to an object-oriented platform. Identifying, extracting and reengineering software architectures that implement abstractions within existing systems is a promising cost-effective way to create reusable assets and reengineer legacy systems. We introduce a new approach to recovering software architectures in legacy systems. The approach described in this paper concentrates especially on how to find software architectures and on how to establish the relationships among the identified software components. This paper summarizes our experiences with using computer-supported methods to facilitate the reuse of the software architectures of legacy systems by recovering the behavior of the systems with systematic methods, and illustrates their use in the context of the Janus System.
1 Introduction
The software industry has many legacy programs in need of reengineering and modernization. One modernization approach is to convert old (procedural) programs to an object-oriented platform to make them easier to understand, maintain and reuse. One important purpose of software reengineering is to develop large software systems built on several legacy systems, making use of the partial or full functionality of these legacy systems [1]. A critical issue in reengineering legacy systems is the identification, extraction, modeling and analysis of software architectures. Software architecture is the nature of the interactions among software components. It concerns the design of the gross structure of a software system, including its overall behavior and its decomposition into simpler computational elements. An architectural description singles out the components from which a system is built, describing their functional behavior and providing a complete description of their interactions. The use of explicit descriptions of the architecture of software systems enhances system comprehension and promotes software reuse. Besides, software architecture helps verify the structural properties of the system to be developed. Modernizing the architecture of old software helps to gain control over maintenance cost and to improve system performance, while supporting the move to a distributed or more efficient environment [2]. Studies by Gall [3] also found that a
re-architecturing of old procedural software into an object-oriented architecture results in object-oriented software that helps reduce future maintenance cost, since modern maintenance technology can then be applied. Thus, when converting procedural programs into object-oriented ones, we must identify the software architecture and objects that are reusable from the procedural code. The software architecture identification phase requires reengineering methods. The methods described in this paper concentrate especially on how to find the software architecture and objects (classes) and on how to establish the relationships among the identified classes. The essence of software reengineering is to improve or transform existing software so that it can be understood, controlled, and used anew. The need for software reengineering has increased greatly, as heritage software systems have become obsolescent in terms of the platforms on which they run and their ability to evolve to support changing needs [4]. Software reengineering is important for recovering and reusing existing software assets, putting high software maintenance costs under control, and establishing a base for future software evolution [5]. Identifying, extracting and reengineering software architecture and components that implement abstractions within existing systems is a promising cost-effective way to create reusable assets and reengineer legacy systems [6]. This research was motivated by the need for better techniques for extracting and utilizing the desirable functionality of an existing system for reengineering, reuse, and maintenance. We present a new program slicing process for identifying and extracting software architectures implementing functional abstractions. Once the slicing criterion has been identified, the slice is isolated using algorithms based on dependence graphs. Both symbolic execution and program slicing are performed by exploiting the Data Flow Graph (DFG) and Control Flow Graph (CFG), a fine-grained dependence-based program representation that can be used for software architecture recovery tasks. The work described in this paper aims to explore reverse engineering and reengineering techniques for reusing software architecture from existing systems.
2 Related Work
Developing complex software systems requires a description of the structure or structures of the system, comprising the software components, the externally visible properties of those components, and the relationships among them [7]. Such a description, called a software architecture, is also the basis for further engineering activities concerning the reuse, maintenance, and evolution of existing software components and systems. Architecture recovery refers to all techniques and processes used to abstract a higher-level representation (i.e., a software architecture) from available information such as existing artifacts (e.g., source code, profiling information, design documentation) and expert knowledge (e.g., software architects, maintainers) [8]. Basically this means the extraction of the building blocks that constitute architectural properties and, finally, the software architecture. From this point of view, architectural styles and patterns, which are inherent in almost any design, are primary objectives for architecture recovery [9].
Architecture recovery has received considerable attention recently, and various frameworks, techniques and tools have been developed [10]. Existing knowledge, obtained from experts and design documents, and various tools are mandatory to solve the problem. Hence, a common idea is to integrate several tools in architecture workbenches such as Dali [11], in which a variety of lexical-based, parser-based and profiling-based tools are used to examine a system and extract static and dynamic views to be stored in a repository. Analyses of these views are supported by visualization and specific analysis tools, enabling interaction with experts to control the recovery process until the software architecture is reconstructed. Concerning architecture reconstruction, much work has been done on techniques that combine bottom-up and top-down approaches. Fiutem et al. describe such an approach [12]. They use reverse engineering tools to extract source models (e.g., Abstract Syntax Trees) and, top-down, they apply queries to extract expected patterns. They use a hierarchical architectural model that drives the application of a set of recognizers. Each recognizer works on the Abstract Syntax Tree (AST) and is related to a specific level of the architectural model. The recognizers produce different abstract views of the source code that describe architectural aspects of the system and are represented by hierarchical architectural graphs. Harris et al. outline a framework that integrates reverse engineering technology and architectural style representations [13]. In bottom-up recovery, the bird's-eye view is used to display the file structure and file components of the system, and to reorganize information into more meaningful clusters. Top-down style definitions place an expectation on what will be found in the software system. These expectations are specified by recognition queries that are then applied to an extracted AST. Each recognized style provides a view of the system, and the collection of these views partially recovers the overall design of the software system. Guo et al. outline an iterative and semi-automatic architecture recovery method called ARM [14]. ARM supports patterns at various abstraction levels and uses lower-level patterns to build higher-level and composite patterns. In this way the approach aims particularly at systems developed using design patterns whose implementations have not eroded over time. Another approach that uses source models and queries as basic inputs for architecture recovery is Alborz, introduced by Sartipi et al. [15]. There the problem is viewed as an approximate graph matching problem, in which the extracted source models and defined queries are represented as attributed relational graphs.
3 Slicing-Based Static Analysis
We start the reengineering process by extracting the requirements and detailed design information from the source code and existing documents. This is the process of analyzing a subject system to identify the system's components and their interrelationships, and to create software architecture representations of the system in another form or at a higher level of abstraction, expressed using data flow and control flow diagrams.
The goal of source information extraction is to enable the recovery of the software architecture. We use both static and dynamic extraction. A significant quantity of information may be extracted from the static artifacts of software systems, such as source code, makefiles, and design models, using techniques that include parsing and lexical analysis. We initiated the software architecture recovery process with a preprocessing step that restructures code. We built on the theory that unstructured code can be rewritten using only D-structures [16] and relied on existing algorithms for that purpose [17]. Our research within this phase involves the use of program slicing techniques for isolating code fragments implementing functional abstractions. Program slicing is a static analysis technique that extracts all statements relevant to the computation of a given variable. This is accomplished by using data-flow analysis on the program source code, without the need to actually execute the program. Program slicing can thus transform a large program into a smaller one containing only those statements relevant to the computation of a given variable. For example, suppose someone is interested in how the value of the out parameter r2 is computed in the following program:

procedure myproc(a, b, c : integer; r1, r2 : out integer) is
  v, w : integer := 0;
begin
  for i in 1..a loop
    w := w + b;
  end loop;
  for i in 1..c loop
    v := i*b + c + v;
  end loop;
  r1 := w;
  r2 := v*2;
end myproc;

All we need to know is the following program slice, which is obviously easier to understand:

procedure myproc( ... b, c : integer; ... r2 : out integer) is
  v, ... : integer := 0;
begin
  ...
  for i in 1..c loop
    v := i*b + c + v;
  end loop;
  ...
  r2 := v*2;
end myproc;
A program slice is defined as follows. Given a syntactically correct source program P, in some programming language, and a slicing criterion C = <L, V>, where L is a location in the program and V is a variable in the program, S is a slice of program P for criterion C if (1) S is derived from P by deleting statements from P, (2) S is syntactically correct, and (3) in any given execution of P and of S with the same inputs, the value of V in the execution of slice S just before control reaches location L is the same as the value of V in program P just before control reaches location L. The function of the slicing criterion is to specify the program variable of interest along with a location in the program where the value of the variable is desired. Our program slicing tool constructs program slices from the control structure of the program and the pattern of assignments and references to variables, by backward chaining from the slicing criterion to the beginning of the program. The following definitions are helpful in understanding how program slices are constructed:

Defs(n): the set of variables defined (assigned to) at statement n.
Refs(n): the set of variables referenced at statement n.
Reqs(n): a set of statements that is included in a slice along with statement n. The set is used to specify control statements (e.g., if or while) enclosing statement n, or other characters that are syntactically part of statement n but are not contiguous with the main group of characters comprising the statement.

An algorithm for constructing program slices must locate all statements relevant to a given slicing criterion. The essence of a slicing algorithm is the following: starting with the statement specified in the slicing criterion, include each predecessor statement that assigns a value to any variable in the slicing criterion, generate a new slicing criterion for the predecessor by deleting the assigned variables from the original criterion, and add any variables referenced by the predecessor. For an expression statement n that is a predecessor of statement m, the set Defs(n) and the slicing criterion determine whether the statement is included in the slice; when n is included, the slice is built as

S⟨m,v⟩ = {n} ∪ ⋃_{x ∈ Refs(n)} S⟨n,x⟩    (1)

S⟨m,v⟩ = {n} ∪ ( ⋃_{x ∈ Refs(n)} S⟨n,x⟩ ) ∪ ( ⋃_{y ∈ Refs(k), k ∈ Reqs(n)} S⟨k,y⟩ )    (2)
The result of the revised rule (2) is to include the set of required statements for statement n, Reqs(n), whenever statement n is included in a slice. Here x and y are variables referenced at statements n and k, respectively, and k is a statement included in the slice along with statement n. From unions and intersections of slices, a slice-based model of program structure can be built that has applications to program-understanding tasks. Program slicing has been used both as a structural and as a specification-driven method. As a structural method, program slicing has been used to identify external user functionalities in large programs. The isolation of an internal, domain-dependent function can be driven by its formal specification, which can be used together with symbolic execution techniques to identify a suitable slicing criterion. Code segmentation is needed in order to reduce the granularity, and thus the complexity, of the remaining processes. We have defined a segmentation scheme that separates the code into modular units while also removing syntactic-sugar features of the code. We have also defined heuristics to attach in-code documentation to the appropriate segment. For a program P the result is a set of segments SG = {sg1, sg2, ..., sgn} with Pf = ⋃ sgi, 1 ≤ i ≤ n, where Pf represents code that is identical in functionality to P and sgi is a segment. Following the segmentation, we defined dependency algorithms that analyze each sgi. Specific slicing algorithms, modified forms of the slicing algorithms in [18], are employed at the statement, construct, and block levels. These algorithms provide information on all variables, which is the starting point of the software architecture recovery.
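To make the backward-chaining rule concrete, here is a small Python sketch of slice construction over a straight-line program, under simplifying assumptions: there is no control flow, so Reqs(n) is empty and rule (1) suffices; statement order stands in for the predecessor relation, and the statement representation is our own.

def backward_slice(stmts, criterion_line, criterion_vars):
    # stmts: list of (defs, refs) pairs, one per line, in program order.
    # Returns the set of line indices in the slice for the criterion
    # <criterion_line, criterion_vars>, per rule (1): walk backwards,
    # include statements defining a live variable, delete the defined
    # variables from the criterion and add the referenced ones.
    live = set(criterion_vars)
    slice_lines = set()
    for n in range(criterion_line - 1, -1, -1):
        defs, refs = stmts[n]
        if live & set(defs):                  # n assigns a variable of interest
            slice_lines.add(n)
            live = (live - set(defs)) | set(refs)
    return slice_lines

# The myproc body, with the loops flattened to single statements:
# 0: w := w + b   1: v := i*b + c + v   2: r1 := w   3: r2 := v*2
stmts = [({"w"}, {"w", "b"}),
         ({"v"}, {"i", "b", "c", "v"}),
         ({"r1"}, {"w"}),
         ({"r2"}, {"v"})]
print(backward_slice(stmts, 4, {"r2"}))       # -> {1, 3}: the r2 slice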
4 Software Architecture Recovery
Most current software architecture recovery and reengineering approaches have in common that they take patterns into account to reconstruct the architecture of a software system. We regard software components (functions, modules, etc.) and their relations as the key elements of software systems, residing at all levels of abstraction. We therefore start software element and relation recognition at the lowest level (i.e., the source level) and use hot-spots to stepwise abstract higher-level patterns. Hot-spots indicate patterns and are represented by meaningful source code structures (e.g., variables, functions, data structures, program structures). To detect such hot-spots in source code we apply extended string pattern matching, which facilitates fast and effective queries. The important step in the transformation of the code and software architecture is to reconstruct the existing software architecture. This is done by extracting architecturally important features from the code and employing architectural reconstruction rules to aggregate the extracted (low-level) information into an
architectural representation [19]. To extract the system's architecture, we performed a static extraction of the source code to identify function-call and built-from relations. The former identifies function calls within the system; the latter identifies how the executables in the system are built from the source files. In addition, we performed dynamic extraction to determine the run-time connectivity of the system as a whole. Based on the static analysis of the program slicing and the results collected from running the instrumented system, we can reconstruct the run-time connectivity of the architecture. We can construct a control-dependence graph (CDG). The CDG, in its most basic form, is a Directed Acyclic Graph (DAG) that has program predicates as its root and internal nodes, and non-predicates as its leaves. Nodes in the CDG may be basic blocks or statements. A leaf is executed during the execution of the program if the predicates on the path leading to it in the control-dependence graph are satisfied. More specifically, let G = <N, E> be a flow graph for a procedure. A node m postdominates node n, written m pdom n, if and only if every path from n to Exit passes through m. Then node n is control-dependent on node m if and only if (1) there exists a control-flow path from m to n such that every node in the path other than m is postdominated by n, and (2) n does not postdominate m. The results of the restructuring, segmentation, and dependency steps are segment design representations and a global design representation. These representations include traditional forms, such as call graphs, structure charts, and hierarchical diagrams, as well as less conventional representations such as variable usage and state change descriptions (e.g., statecharts). These representations serve as input for object identification and for creating formal specifications of object behavior. Results of our work include methods that recover the design information at varying levels of granularity, expressible in numerous forms from both data and functional viewpoints. The data and control dependency representations are the basis for our software architecture recovery research. Once the architecture has been extracted, we can view the components and connectors of the system as possessing architecturally significant properties. We listed the possible features of architectural elements (both components and connectors), divided into information that can be derived from a temporal perspective and information that can be derived from a static perspective of an architectural element. With the extracted information, we can reconstruct the software architecture with relation partition algebra. A set is a collection of objects, called elements or members. Given any object x and set S, if x is an element of S we write x ∈ S. The notions of set and the is-element-of relation are the primitive concepts of set theory. A finite set can be specified explicitly by enumerating its elements. We use sets to represent the software architecture components of a system. For example:

Subsystems = {OS, Drivers, DB, Application}
Functions = {main, a, b, c, d}
InitFunctions = {f | f ∈ Functions ∧ f is called at initialization time}
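The control-dependence definition above can be computed directly from a CFG and its postdominator sets. The following Python sketch does this naively (a quadratic fixed-point postdominator computation, and the standard successor-based test for clause (1)); the graph encoding and all names are ours, and the CFG is assumed to have Exit as its only sink.

def postdominators(succ, exit_node):
    # Naive fixed-point computation: pdom[n] is the set of nodes m with
    # m pdom n, i.e. every path n -> Exit passes through m.
    nodes = set(succ) | {exit_node}
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n], changed = new, True
    return pdom

def control_dependent(n, m, succ, pdom):
    # n is control-dependent on m iff n postdominates some successor of m
    # (the usual check for clause (1)) but does not postdominate m (clause (2)).
    if n in pdom[m]:
        return False
    return any(n in pdom[s] for s in succ[m])

# if (p) then s1 else s2; exit: s1 and s2 are control-dependent on p.
succ = {"p": ["s1", "s2"], "s1": ["exit"], "s2": ["exit"], "exit": []}
pdom = postdominators(succ, "exit")
print(control_dependent("s1", "p", succ, pdom))    # True
print(control_dependent("exit", "p", succ, pdom))  # False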
and we also have InitFunctions ⊆ Functions. Relationships between software components play an important role in software architecture and design. Binary relations can express such relationships. For example, function calls within a system can be seen as the binary relation named calls, where calls(main, a) is an abstraction of "the main program main calls a function a". A software architecture can then be represented as a directed graph, which consists of a set of elements, called vertices, and a set of ordered pairs of these elements, called arcs.
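A small Python sketch of this set-and-relation representation: the calls relation is a set of ordered pairs (the arcs of the directed graph), and a part-of mapping lifts it to the subsystem level in the spirit of relation partition algebra. The lifting operator shown is our illustrative choice, not the paper's formal definition.

# Sets of components and a binary calls relation, as in the text.
functions = {"main", "a", "b", "c", "d"}
calls = {("main", "a"), ("a", "b"), ("c", "d")}   # directed graph arcs

# part_of assigns each function to a subsystem (a partition of Functions).
part_of = {"main": "Application", "a": "Application",
           "b": "DB", "c": "Drivers", "d": "Drivers"}

def lift(relation, part_of):
    # Lift a component-level relation to subsystem level:
    # S1 uses S2 iff some f in S1 calls some g in S2 (S1 != S2).
    return {(part_of[f], part_of[g])
            for f, g in relation if part_of[f] != part_of[g]}

print(lift(calls, part_of))   # {('Application', 'DB')}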
5 Case Study and Verification
We have explored software architecture recovery in the context of a case study that addresses the reengineering of the Janus System. Janus is a software-based war game that simulates ground battles between up to six adversaries. Janus is interactive in that command and control functions are entered by military analysts who decide what to do in crucial situations during simulated combat. The current version of Janus operates on a Hewlett Packard workstation and consists of a large number of FORTRAN modules, organized as a flat structure and interconnected with one another via FORTRAN COMMON blocks. This software structure makes modification of Janus very costly and error-prone. There is a need to modernize the Janus software into a maintainable and evolvable system and to take advantage of modern personal computers to make Janus more accessible to the Army. The first step in our process, system and requirements understanding, took the form of a series of brief meetings with the client, which also included a short demonstration of the current software system. Our goal was to gather as much information as we could about the currently existing system to aid in gaining a clearer understanding of its present functionality. Next, we proceeded to develop object models of the Janus System using the aforementioned materials and products, to create the modules and associations amongst them. This was probably the most difficult and most important step. It required a great deal of analysis and focus to transform the currently scattered sets of data and functions into small, coherent and realizable objects, each with its own attributes and operations. We used our approach to reuse the information extracted from the old system. The most important type of reuse was reuse of implicit domain models. We reused the domain analysis and knowledge since the domain was stable across the reengineering transformations. This greatly reduced the time and effort that needed to be spent on domain-related work, such as the analysis of the domain-dependent functions. Second was reuse of software architecture. This kind of reuse included the user functionalities, functional abstraction, task flow, and user interface specifications. Third was the reuse of data models. The reuse of
data models was very helpful to reorganize the data, although we needed to transform the old data structures into new data structures. Fourth was the reuse of algorithms. The code could not be reused directly because it had to be transformed into another language (Ada). However, the main algorithms were the same: we did not need to redesign the algorithms, we just rewrote them in the new language. Based on the domain experts' feedback on our object models, the reengineering team revised the object models for the Janus core elements and developed a 3-tier object-oriented architecture for the Janus System. The new architecture of Janus uses an explicit priority queue of event objects to schedule the simulation events.
6 Conclusion
Discovering the objects and software architecture of legacy systems is the essence of software reengineering. The focus of the reengineering effort was to abstractly capture the system's functionality and then produce system models that would most accurately represent that functionality, while factoring out independent concerns and aspects that were likely to change. Large systems are divided into subsystems; these subsystems, also known as components and objects, and the dependencies that exist among them form the different layers within a software architecture. In this paper, we propose an approach to extract the software architecture based on program slicing and parameter analysis, and to derive the dependencies between the objects based on relation partition algebra. Our future work will focus on automating the approach of extracting reusable software components from legacy systems and deriving the software architecture based on the dependencies of the components.
References
1. Favre, J., Sanlaville, R., Continuous Discovery of Software Architecture in a Large Evolving Company, Workshop on Software Architecture Reconstruction, the Working Conference on Reverse Engineering (2002)
2. Hall, P., Architecture-driven Component Reuse, Information and Software Technology, Vol. 41, Issue 14 (1999)
3. Gall, H., Klösch, R., Mittermeir, R., Application Patterns in Reengineering: Identifying and Using Reusable Concepts, Proceedings of the 5th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Granada, Spain (1996)
4. Ali, F., Du, W., Toward Reuse of Object-oriented Software Design Models, Information and Software Technology, Vol. 46, Issue 8 (2004)
5. Hakala, K., Hautamäki, J., Koskimies, K., Annotating Reusable Software Architectures with Specialization Patterns, Proceedings of the Working IEEE/IFIP Conference on Software Architecture (2001)
6. Guo, J., Towards Semi-Automatically Reusing Objects from Legacy Systems, International Journal of Computers and Their Applications, Vol. 11, No. 3 (2004)
7. Goseva-Popstojanova, K., Trivedi, K., Architecture-Based Approaches to Software Reliability Prediction, Computers and Mathematics with Applications, Vol. 46 (2003)
8. Svetinovic, D., Godfrey, M., A Lightweight Architecture Recovery Process, Software Architecture Recovery and Modelling, Stuttgart, Germany (2001)
9. Ali-Babar, M., Zhu, L., Jeffery, R., A Framework for Classifying and Comparing Software Architecture Evaluation Methods, Australian Software Engineering Conference, Melbourne (2004)
10. Pinzger, M., Gall, H., Pattern-Supported Architecture Recovery, Proceedings of the 10th International Workshop on Program Comprehension, IEEE Computer Society Press, Paris, France (2002)
11. Kazman, R., Carriere, S., View Extraction and View Fusion in Architectural Understanding, Proceedings of the 5th International Conference on Software Reuse, Victoria, BC, Canada (1998)
12. Fiutem, R., Tonella, A., Antoniol, G., Merlo, E., A Cliché-based Environment to Support Architectural Reverse Engineering, Proceedings of the International Conference on Software Maintenance, Monterey, California (1996)
13. Harris, D., Reubenstein, H., Yeh, A., Reverse Engineering to the Architectural Level, Proceedings of the 17th International Conference on Software Engineering, Seattle, Washington (1995)
14. Guo, G., Atlee, J., Kazman, R., A Software Architecture Reconstruction Method, Proceedings of the 1st Working IFIP Conference on Software Architecture, San Antonio, Texas (1999)
15. Sartipi, K., Kontogiannis, K., Mavaddat, F., A Pattern Matching Framework for Software Architecture Recovery and Restructuring, Proceedings of the 8th International Workshop on Program Comprehension, Limerick, Ireland (2000)
16. Dijkstra, E., A Discipline of Programming, Prentice Hall (1976)
17. Boehm, C., Jacopini, G., Flow Diagrams, Turing Machines, and Languages with only Two Formation Rules, Communications of the ACM, Vol. 9, No. 5 (1966)
18. Atkinson, D., Griswold, W., Implementation Techniques for Efficient Data-flow Analysis of Large Programs, Proceedings of the International Conference on Software Maintenance (2001)
19. Carrière, S., Woods, S., Kazman, R., Software Architectural Transformation, Proceedings of the 6th Working Conference on Reverse Engineering, Atlanta, Georgia (1999)
A First Approach to a Data Quality Model for Web Portals
Angelica Caro 1, Coral Calero 2, Ismael Caballero 2, and Mario Piattini 2
1 Universidad del Bio Bio, Departamento de Auditoria e Informática, La Castilla s/n, Chillán, Chile. [email protected]
2 ALARCOS Research Group, Information Systems and Technologies Department, UCLM-Soluziona Research and Development Institute, University of Castilla-La Mancha, Paseo de la Universidad, 4 - 13071 Ciudad Real, Spain. {Coral.Calero, Ismael.Caballero, Mario.Piattini}@uclm.es
Abstract. Technological advances and the use of the Internet have favoured the appearance of a great diversity of web applications, among them web portals. Through them, organizations develop their businesses in a highly competitive environment. A decisive factor for this competitiveness is the assurance of data quality. In recent years, several research works on web data quality have been developed. However, there is a lack of specific proposals for web portal data quality. Our aim is to develop a data quality model for web portals focused on three aspects: the data quality expectations of data consumers, the software functionality of web portals, and the web data quality attributes compiled from a literature review. In this paper, we present the first version of our model.
1 Introduction
In recent years, a growing interest in the subject of Data Quality (DQ) or Information Quality (IQ) has been generated because of the increased interconnectivity of data producers and data consumers, mainly due to the development of the Internet and web technologies. DQ/IQ is often defined as "fitness for use", i.e., the ability of a data collection to meet user requirements [1, 2]. Data quality is a multi-dimensional concept [2], and in the DQ/IQ literature several frameworks providing categories and dimensions as a way of facing DQ/IQ problems can be found. Research on DQ/IQ started in the context of information systems [1, 3] and has been extended to contexts such as cooperative systems [4-6], data warehouses [7, 8] and electronic commerce [9, 10], among others. Due to the characteristics of web applications and their differences from traditional information systems, the research community has recently started to deal with the subject of DQ/IQ on the web [11]. However, there are no works on DQ/IQ specifically developed for web portals. As the literature shows that DQ/IQ is very dependent on the context, we have centred our work on the definition of a data quality model for web portals. To do so, we have used some works developed for
different contexts on the web but which can be partially applied or adapted to our particular context. For example, we have used the work of Yang et al. (2004), where a quality framework for web portals that includes data quality as a part of it is proposed. As the concept of "fitness for use" is widely adopted in the literature (emphasizing the importance of taking the consumer viewpoint of quality into consideration), we have also considered the data consumer viewpoint for the definition of our model. First, we combined the data quality expectations of data consumers with the software functionality of web portals. From the resultant matrix (data consumer expectations x functionalities), we determined which web data quality attributes, compiled in a literature review, can be applied. The structure of this paper is as follows. In Section 2, the components of our model are presented. In Section 3, we describe the first version of our DQ/IQ web portal model in detail. Section 4 presents our plans for validating the model and, finally, in Section 5 we conclude with our general remarks and future work.
2 Model Components
Web portals are emerging Internet-based applications that enable access to different sources (providers) through a single interface [12]. The primary objective of a portal software solution is to create a working environment where users can easily navigate in order to find the information they specifically need to quickly perform their operational or strategic functions and to make decisions [13]; it is the responsibility of web portal owners to achieve and maintain a high state of information quality [14]. In this section, we present the three basic aspects considered to define our DQ/IQ model for web portals: the DQ/IQ attributes defined in the web context, the data consumer expectations about data quality, and web portal functionalities.
2.1 Data Consumer Expectations
When data management is conceptualized as a production process [1], we can identify three important roles in this process: (1) data producers (who generate data), (2) data custodians (who provide and manage resources for processing and storing data), and (3) data consumers (who access and use data for their tasks). As in the context of web-based information systems roles (1) and (2) can be played by the same entity [11], for the web portal context we identify two roles in the data management process: (1) data producers-custodians and (2) data consumers. So far, except for a few works in the DQ/IQ area, like [1, 2, 15, 16], most of the works on the subject have looked at quality from the data producer-custodian perspective. The consumer perspective differs from it in two important ways [15]:
• Data consumers have no control over the quality of the available data.
• The aim of consumers is to find data that match their personal needs, rather than to provide data that meet the needs of others.
Our proposal of a DQ/IQ model for web portals considers the data quality expectations of data consumers because, in the end, it is the consumer who will judge whether data are fit for use or not [16].
We will use the quality expectations of the data consumer on the Internet proposed in [17]. These expectations are organized into six categories: Privacy, Content, Quality of values, Presentation, Improvement, and Commitment.
2.2 Web Portal Functionalities
A web portal is a data manufacturing system in which we can distinguish the two roles established in the previous subsection. Web portals present basic software functionalities to data consumers performing their tasks and, under our perspective, the consumer judges data by using the application functionalities. Therefore, we used the web portal software functions that Collins proposes in [13], considering them as basic in our model. These functions are as follows: Data Points and Integration, Taxonomy, Search Capabilities, Help Features, Content Management, Process and Action, Collaboration and Communication, Personalization, Presentation, Administration, and Security. Behind these functions, the web portal encapsulates the producer-custodian role.
2.3 Web Data Quality Review
By using a DQ/IQ framework, organizations are able to define a model for data, to identify relevant quality attributes, to analyze attributes within both current and future contexts, to provide a guide for improving DQ/IQ and to solve data quality problems [18]. In the literature, we have found some proposals oriented to DQ/IQ on the web; among them, we can highlight those shown in Table 1. Regarding such proposals, we can conclude that there is no agreement concerning either the set of attributes or, in several cases, their meaning. This situation is probably a consequence of the different domains and focus of the studied works. However, from this review we captured several data quality attributes. The most considered are (we present in brackets different terms used for the same concept): Accuracy (Accurate), in 60% of the works; Completeness, in 50% of the works; Timeliness (Timely), in 40% of the works; and Concise (Concise representation), Consistent (Consistent representation), Currency (Current), Interpretability, Relevance, and Secure (Security), in 30% of the studies. Accessibility (Accessible), Amount of data (Appropriate amount of information), Availability, Credibility, Objectivity, Reputation, Source Reliability, Traceability (Traceable), and Value added are stated in 20% of the works. Finally, Applicable, Clear, Comprehensive, Confidentiality, Content, Convenient, Correct, Customer Support, Degree of Duplicates, Degree of Granularity, Documentation, Understandability (Ease of understanding), Expiration, Flexibility, Freshness, Importance, Information value, Maintainable, Novelty, Ontology, Pre-decision availability, Price, Reliability, Response time, Layout and design, Uniqueness, Validity, and Verifiability are studied in only 10% of the works. Summarizing the above-mentioned attributes by means of similarity in their names and definitions, we have obtained a set of 28 attributes. Based on these DQ/IQ attributes, we will try to identify which ones are applicable to the web portal context by classifying them into the matrix constructed from the previous aspects (data consumer expectations x functionalities).
Table 1. Summary of web DQ/IQ frameworks in the literature

Author  Domain                                     Framework structure
[19]    Personal web sites                         4 categories and 7 constructors
[20]    Data integration                           3 classes and 22 quality criteria
[10]    e-commerce                                 7 stages to modelling DQ problems
[9]     e-commerce                                 4 categories associated with 3 categories of data user requirements
[21]    Web information systems (data evolution)   4 categories, 7 activities of DQ design and an architecture for DQ management
[6]     e-service cooperative                      8 dimensions
[22]    Decision making                            8 dimensions and 12 aspects related to (providers/consumers)
[23]    Web sites                                  4 dimensions and 16 attributes
[11]    DQ on the web                              5 dimensions
[24]    Web sites                                  5 categories and 10 sub-categories
[25]    Organizational networks                    6 stages to DQ analysis with several dimensions associated with each one
[26]    Data integration                           2 factors and 4 metrics
[27]    Web information portals                    2 dimensions
3 Relationships Between the Components of the Model
Based on the previous background, we determine the relationships between the web portal functionalities and the quality expectations of data consumers. Below, we present the definition of each function according to [13] and we show their relationships (see Figure 2).
• Data Points and Integration. They provide the ability to access information from a wide range of internal and external information sources and display the resulting information at the single point-of-access desktop. The expectations applied to this functionality are: Content (consumers need a description of the portal areas covered, the use of published data, etc.), Presentation (formats, language, and others are very important for easy interpretation) and Improvement (users want to participate with their opinions in the portal improvements, knowing the result of applying them).
• Taxonomy. It provides information context (including the organization-specific categories that reflect and support the organization's business). We consider that the expectations of data consumers are: Content (consumers need a description of which data are published and how they should be used, easy-to-understand definitions of every important term, etc.), Presentation (formats and language in the taxonomy are very important for easy interpretation; users should expect to find instructions when reading the data), and Improvement (the user should expect to convey his/her comments on data in the taxonomy and know the result of improvements).
• Search Capabilities. It provides several services for web portal users and needs searches across the enterprise, the World Wide Web, and search engine catalogs and
indexes. The expectations applied to this functionality are: Quality of values (data consumers should expect that the results of searches are correct, current and complete), Presentation (formats and language are important for consumers, both for the search and for easy interpretation of results) and Improvement (the consumer should expect to convey his/her comments on search data and know the result of improvements).
• Help Features. They provide help when using the web portal. The expectations applied to this functionality are: Presentation (formats, language, and others are very important for easy interpretation of help texts) and Commitment (the consumer should be easily able to ask, and obtain an answer to, any question regarding the proper use or meaning of data, update schedules, etc.).
• Content Management. This function supports content creation, authorization, and inclusion in (or exclusion from) web portal collections. The expectations applied to this functionality are: Privacy (there should be a privacy policy for all consumers covering the management of, and access to, sources and guaranteeing web portal data), Content (consumers need a description of data collections, assurance that all data needed for an intended use are provided, etc.), Quality of values (the consumer should expect that all data values are correct, current and complete, unless otherwise stated), Presentation (formats and language should be appropriate for easy interpretation), Improvement (the consumer should expect to convey his/her comments on contents and their management and know the result of the improvements) and Commitment (the consumer should be easily able to ask, and have answered, any question regarding the proper use or meaning of data, update schedules, etc.).
• Process and Action. This function enables the web portal user to initiate and participate in a business process of the portal owner. The expectations applied to this functionality are: Privacy (data consumers should expect that there is a privacy policy to manage the data about the business on the portal), Content (consumers should expect to find descriptions of the data published for the processes and actions, appropriate and inappropriate uses, assurance that all data needed for the processes and actions are provided, etc.), Quality of values (all data associated with this function should be correct, current and complete, unless otherwise stated), Presentation (formats, language, and others are very important for properly interpreting data), Improvement (the consumer should expect to convey his/her comments on contents and their management and know the result of improvements) and Commitment (the consumer should be easily able to ask, and obtain an answer to, any question regarding the proper use or meaning of data in a process or action, etc.).
• Collaboration and Communication. This function facilitates discussion, locating innovative ideas, and recognizing resourceful solutions. The expectations applied to this functionality are: Privacy (the consumer should expect a privacy policy for all consumers that participate in activities of this function) and Commitment (the consumer should be easily able to ask, and have answered, any question regarding the proper use or meaning of data for the collaboration and/or communication, etc.).
• Personalization. This is a critical component to create a working environment that is organized and configured specifically for each user.
The expectations applied to this functionality are: Privacy (the consumer should expect privacy and security for his/her personalization data, profile, etc.) and Quality of values (data about the user profile should be correct and current).
• Presentation. It provides both the knowledge desktop and the visual experience to the web portal user that encapsulates all of the portal's functionality. The expectations
applied to this functionality are: Content (the presentation of a web portal should include data about covered areas, appropriate and inappropriate uses, definitions, information about the sources, etc.), Quality of values (the data of this function should be correct, current and complete), Presentation (formats, language, and others are very important for easy interpretation and appropriate use of the portal's data) and Improvement (the consumer should expect to convey his/her comments on contents and their management and know the result of the improvements).
• Administration. This function provides services for deploying maintenance activities or tasks associated with the web portal system. The expectations applied to this functionality are: Privacy (data consumers need security for data about the portal administration) and Quality of values (data about administration tasks or activities should be correct and complete).
• Security. It provides a description of the levels of access that each user or group of users is allowed for each portal application and software function included in the web portal. The expectations applied to this functionality are: Privacy (consumers need a privacy policy covering the data on the access levels of data consumers), Quality of values (data about the levels of access should be correct and current) and Presentation (data about security should be in a format and language that allow easy interpretation).
Fig. 2. Matrix stating the relationships between data consumer expectations and web portal functionalities
Concerning the relationships established in the matrix of Figure 2, we can remark that Presentation is the category of data consumer expectations with the most relationships. This fits perfectly with the main goal of any web application, which is to be useful and user-friendly for any kind of user. The next step is to fill in each cell of the matrix with the web DQ/IQ attributes obtained from the study presented in Section 2.3. As a result, we obtain a subset of DQ/IQ attributes that can be used in a web portal to evaluate data quality. In Table 2, we show the most relevant attributes for each category of data consumer expectations. To validate and complete this assignment we plan to work with portal data consumers through surveys and questionnaires. Once the validation is finished, we will reorganize the attributes, obtaining the final version of the DQ/IQ web portal model.
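One plausible way to hold such a matrix while it is being filled is a mapping from (expectation, functionality) cells to attribute sets. The sketch below is our illustration only; the cells and attribute sets are abbreviated examples, not the full assignment of Table 2.

```python
# Illustrative data model for the expectations x functionalities matrix.
matrix = {
    ("Privacy", "Security"): {"Security"},
    ("Content", "Taxonomy"): {"Understandability", "Interpretability"},
    ("Quality of values", "Search Capabilities"): {"Accuracy", "Currency", "Completeness"},
    ("Presentation", "Help Features"): {"Concise Representation", "Easy to use"},
}

def attributes_for(expectation):
    """Union of the DQ/IQ attributes over all cells of one expectation row."""
    cells = [attrs for (e, _), attrs in matrix.items() if e == expectation]
    return set().union(*cells) if cells else set()

def functionalities_for(expectation):
    """Functionalities related to one data-consumer expectation category."""
    return {f for (e, f) in matrix if e == expectation}

print(attributes_for("Content"), functionalities_for("Content"))
```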
Table 2. Web Data Quality attributes applied to web portal functionalities in each category
Category of data consumer expectations / Web portal functionalities related to the category / Web DQ/IQ attributes applying to at least one functionality in the category:

Privacy
  Functionalities: Content Management, Process and Action, Collaboration and Communication, Personalization, Administration, Security
  Attributes: Security

Content
  Functionalities: Data Points and Integration, Taxonomy, Content Management, Process and Action, Presentation
  Attributes: Accessibility, Currency, Amount of data, Understandability, Relevance, Concise Representation, Validity, Traceability, Completeness, Reliability, Credibility, Timeliness, Availability, Documentation, Specialization, Interpretability, Easy to use

Quality of values
  Functionalities: Data Points and Integration, Search Capabilities, Content Management, Process and Action, Personalization, Presentation, Security
  Attributes: Accessibility, Currency, Amount of data, Credibility, Understandability, Accuracy, Expiration, Novelty, Relevance, Validity, Concise Representation, Completeness, Reliability, Availability, Documentation, Duplicity, Specialization, Interpretability, Objectivity, Reputation, Traceability, Utility, Value-added, Easy to use

Presentation
  Functionalities: Data Points and Integration, Taxonomy, Search Capabilities, Help Features, Content Management, Process and Action, Collaboration and Communication, Presentation, Administration, Security
  Attributes: Amount of data, Completeness, Understandability, Easy to use, Concise Representation, Consistent Representation, Validity, Relevance, Interpretability, User support, Availability, Specialization, Flexibility

Improvement
  Functionalities: Data Points and Integration, Taxonomy, Search Capabilities, Content Management, Process and Action, Presentation
  Attributes: Accessibility, Reliability, Credibility, Understandability, User support, Traceability

Commitment
  Functionalities: Help Features, Content Management, Process and Action
  Attributes: Accessibility, Reliability, User support
4 Validation of the Model
In order to validate our model, we plan to elaborate a survey to check the DQ/IQ attributes identified as relevant to web portals. We will use the principles of survey research proposed in [28], where it is said that a survey is part of a larger process and not just the instrument for gathering information. In this work, the authors identify ten activities in the survey process. At this moment we are carrying out the first activities of our survey process; in particular, setting specific and measurable objectives (in our case, this phase consists in
checking the DQ/IQ attributes identified as relevant to web portals and in obtaining others that were not considered), planning and scheduling the survey, ensuring that appropriate resources are available, and designing the data collection instrument. As the survey design we have selected the descriptive design, because we try to describe a phenomenon of interest [29] (in our case, the DQ/IQ attributes most relevant for web portal data consumers). We plan to make a questionnaire for each one of the web portal functionalities presented previously. As it is quite impossible to survey the entire population [29], we are developing a web application to be linked from a web portal (www.castillalamancha.es). In that way, the users connected to this portal will be invited to answer some questions (selected randomly from the eleven questionnaires). Each questionnaire will thus be constructed for each subject of the survey, with the aim of having a correct distribution in the number of answers given to each question. The application will have three modules: an administrator module (through which the researcher can generate the questionnaires, deciding the number of questions, the type of answer, etc.), an analyzer module (that shows the results: statistics, graphics, rankings of responses, etc.), and a gather module (that presents the questions to the users). We will ask each subject some general demographic questions (such as expertise in the use of portals, expertise in technologies, age range, sex, etc.) together with thirty questions selected randomly from all the questions in the eleven questionnaires. When we have enough responses for each question in our questionnaires, we will analyze the responses to obtain a minimal and necessary set of DQ/IQ attributes for each aspect of our model. This set of attributes will be used to elaborate a complete framework for evaluating the DQ/IQ of a web portal. For example, we plan to give the minimum value necessary for each attribute to assure DQ/IQ; if this value is not achieved for some of the attributes, the framework will suggest corrective actions applicable in order to reach the correct level of quality.
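One way the gather module's random-but-balanced selection could work is sketched below; this is an assumption of ours, since the web application is not described at this level of detail. The idea is to prefer the least-answered questions so that every question eventually receives a comparable number of responses.

```python
import random
from collections import defaultdict

answer_count = defaultdict(int)                # question id -> responses so far

def pick_questions(questionnaires, n=30):
    """questionnaires: {functionality: [question ids]}; returns n ids."""
    pool = [q for qs in questionnaires.values() for q in qs]
    random.shuffle(pool)                       # random tie-breaking ...
    pool.sort(key=lambda q: answer_count[q])   # ... among equally-answered items
    chosen = pool[:n]
    for q in chosen:
        answer_count[q] += 1                   # assume the subject answers them all
    return chosen

# Eleven hypothetical questionnaires with ten questions each.
qs = {f"func{i}": [f"q{i}.{j}" for j in range(10)] for i in range(11)}
print(pick_questions(qs)[:5])
```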
5 Conclusions
The great majority of works found in the literature show that data quality or information quality is very dependent on the context. The increased interest in the development of web applications has implied either the appearance of new proposals of frameworks, methodologies and evaluation methods for DQ/IQ, or the adaptation of already-existing ones from other contexts. However, in the web portal context data quality frameworks do not exist. In this paper, we have presented a proposal that combines three aspects: (1) a set of web data quality attributes, resulting from a data quality literature survey, that can be applicable and useful for a web portal; (2) the data quality expectations of data consumers on the Internet; and (3) the basic functionalities of a web portal. These aspects have been related, obtaining a first set of data quality attributes for the different data consumer expectations x functionalities combinations. Our future work, now in progress, consists of validating and refining this model. First of all, it is necessary to check these DQ/IQ attributes with data consumers in a web portal; for this we plan to conduct a survey, as presented in the previous section.
Then, once we have validated the model, we will define a framework including the necessary elements to evaluate DQ/IQ in a web portal. Our aim is to obtain a flexible framework where data consumers can select the attributes used to evaluate the quality of data in a web portal, depending on the existing functionalities and their personal data quality expectations.
Acknowledgements This research is part of the following projects: CALIPO (TIC2003-07804-C05-03) supported by the Dirección General de Investigación of the Ministerio de Ciencia y Tecnología (Spain) and DIMENSIONS (PBC-05-012-1) supported by FEDER and by the “Consejería de Educación y Ciencia, Junta de Comunidades de Castilla-La Mancha” (Spain) and CALIPSO (TIN2005-24055-E).
References
1. Strong, D., Y. Lee, and R. Wang, Data Quality in Context. Communications of the ACM, 1997. Vol. 40, No. 5: p. 103-110.
2. Cappiello, C., C. Francalanci, and B. Pernici. Data quality assessment from the user's perspective. in International Workshop on Information Quality in Information Systems (IQIS2004). 2004. Paris, France: ACM.
3. Lee, Y., AIMQ: a methodology for information quality assessment. Information and Management. Elsevier Science, 2002: p. 133-146.
4. Winkler, W., Methods for evaluating and creating data quality. Information Systems, 2004. No. 29: p. 531-550.
5. Marchetti, C., et al. Enabling Data Quality Notification in Cooperative Information Systems through a Web-service based Architecture. in Proceeding of the Fourth International Conference on Web Information Systems Engineering. 2003.
6. Fugini, M., et al., Data Quality in Cooperative Web Information Systems. 2002.
7. Zhu, Y. and A. Buchmann. Evaluating and Selecting Web Sources as External Information Resources of a Data Warehouse. in Proceeding of the 3rd International Conference on Web Information Systems Engineering. 2002.
8. Bouzeghoub, M. and Z. Kedad, Quality in Data Warehousing, in Information and Database Quality, M. Piattini, C. Calero, and M. Genero, Editors. 2001, Kluwer Academic Publishers.
9. Katerattanakul, P. and K. Siau, Information quality in internet commerce design, in Information and Database Quality, M. Piattini, C. Calero, and M. Genero, Editors. 2001, Kluwer Academic Publishers.
10. Aboelmeged, M. A Soft System Perspective on Information Quality in Electronic Commerce. in Proceeding of the Fifth Conference on Information Quality. 2000.
11. Gertz, M., et al., Report on the Dagstuhl Seminar "Data Quality on the Web". SIGMOD Record, 2004. Vol. 33, No. 1: p. 127-132.
12. Mahdavi, M., J. Shepherd, and B. Benatallah. A Collaborative Approach for Caching Dynamic Data in Portal Applications. in Proceedings of the Fifteenth Conference on Australian Database. 2004.
13. Collins, H., Corporate Portal Definition and Features. 2001: AMACOM.
14. Kopcso, D., L. Pipino, and W. Rybolt. The Assessment of Web Site Quality. in Proceeding of the Fifth International Conference on Information Quality. 2000.
15. Burgess, M., N. Fiddian, and W. Gray. Quality Measures and The Information Consumer. in IQ2004. 2004.
16. Wang, R. and D. Strong, Beyond Accuracy: What Data Quality Means to Data Consumers. Journal of Management Information Systems, 1996. 12(4): p. 5-33.
17. Redman, T., Data Quality: The Field Guide. 2001: Digital Press.
18. Kerr, K. and T. Norris. The Development of a Healthcare Data Quality Framework and Strategy. in IQ2004. 2004.
19. Katerattanakul, P. and K. Siau. Measuring Information Quality of Web Sites: Development of an Instrument. in Proceeding of the 20th International Conference on Information Systems. 1999.
20. Naumann, F. and C. Rolker. Assessment Methods for Information Quality Criteria. in Proceeding of the Fifth International Conference on Information Quality. 2000.
21. Pernici, B. and M. Scannapieco. Data Quality in Web Information Systems. in Proceeding of the 21st International Conference on Conceptual Modeling. 2002.
22. Graefe, G. Incredible Information on the Internet: Biased Information Provision and a Lack of Credibility as a Cause of Insufficient Information Quality. in Proceeding of the Eighth International Conference on Information Quality. 2003.
23. Eppler, M., R. Algesheimer, and M. Dimpfel. Quality Criteria of Content-Driven Websites and Their Influence on Customer Satisfaction and Loyalty: An Empirical Test of an Information Quality Framework. in Proceeding of the Eighth International Conference on Information Quality. 2003.
24. Moustakis, V., et al. Website Quality Assessment Criteria. in Proceeding of the Ninth International Conference on Information Quality. 2004.
25. Melkas, H. Analyzing Information Quality in Virtual Service Networks with Qualitative Interview Data. in Proceeding of the Ninth International Conference on Information Quality. 2004.
26. Bouzeghoub, M. and V. Peralta. A Framework for Analysis of Data Freshness. in IQIS2004. 2004. Paris, France: ACM.
27. Yang, Z., et al., Development and validation of an instrument to measure user perceived service quality of information presenting Web portals. Information and Management. Elsevier Science, 2004. 42: p. 575-589.
28. Pfleeger, S. and B. Kitchenham, Principles of Survey Research Part 1: Turning Lemons into Lemonade. Software Engineering Notes, 2001. 26(6): p. 16-18.
29. Pfleeger, S. and B. Kitchenham, Principles of Survey Research Part 2: Designing a Survey. Software Engineering Notes, 2002. 27(1): p. 18-20.
Design for Environment-Friendly Product
Hak-Soo Mok 1, Jong-Rae Cho 2, and Kwang-Sup Moon 1
1 Department of Industrial Engineering, Pusan National University, Busan 609-735, Korea. [email protected], [email protected]
2 R&D Division, Hyundai Motor Company & Kia Motors Corporation, Gyunggi-Do 445-706, Korea. [email protected]
Abstract. This paper presents an approach for systematically integrating design for x (DFX) into environmentally conscious product design. Evaluation methods and algorithms for design for assembly, design for disassembly, and design for recycling are proposed to estimate the assembly times and the disassembly times. From these results, the assemblability, disassemblability, and recyclability of parts can be evaluated. Finally, a new integrated DFX system is implemented to help product designers analyze environment-friendly products during the design process.
1 Introduction
In recent years, environmental issues have become increasingly important to product designers and manufacturers (Holloway, 1998). Thus, environmentally conscious design (Eco-design) methodologies should be developed and integrated over a product's entire life cycle. Generally, Eco-design includes a life cycle assessment (LCA) and design for x (DFX), comprising design for assembly (DFA), design for disassembly (DFD), and design for recycling (DFR). Figure 1 outlines the research and development methodologies of Eco-design. In the DFA and DFD modules, new methodologies are presented using process data and geometrical data. To evaluate the recyclability of parts, the outputs of DFD and LCA and a cost analysis are used in the DFR module.
Fig. 1. Scope of Research
2 Design for Assembly and Disassembly
Assembly and disassembly are necessary and critical processes for the end-of-life (EOL) of a product, because they have an effect on recycling operations (Redford et al., 1994). This paper proposes a new DFA methodology that can estimate the standard assembly time and evaluate the assemblability of parts. In addition, to facilitate the disassembly of parts for recycling, this study develops design for disassembly evaluation metrics to be used when designing new products.
2.1 Procedures for DFA and DFD
This study suggests three main decision factors - handling, access and insertion - as the criteria for the analysis of assembly processes. The influencing factors associated with these three decision factors are determined in order to estimate assembly time and to evaluate assemblability. The decision factors for the disassembly process are fixing, grasping, access, disassembly, and handling. Finally, the total assembly and disassembly times, as well as the assemblability and disassemblability, are estimated as the sums of the times and difficulty scores of the decision factors. The overall procedure for DFA and DFD is shown in Figure 2.
Fig. 2. Procedures for DFA and DFD
2.2 Time and Score Tables for Decision Factors
Table 1 shows an example of the decision and influencing factors and their respective levels for assembly. To obtain tables that include the time and score of each decision factor for assembly or disassembly, the influencing factors are divided into levels. Then the weights (the numbers given in brackets with the decision and influencing factors) are calculated by the AHP (Analytic Hierarchy Process) (Saaty, 1986), and difficulty scores are given to each level. This paper subdivides the difficulty scores into 5 values: 1 (Good), 3 (Not bad), 5 (Not good), 7 (Bad), and 9 (Absolutely bad). The 1-to-9 scaling has proven to be an acceptable scale and is recommended for use in the AHP (Harker et al., 1987). The times for assembly and disassembly are estimated through a motion analysis of the assembly and disassembly operations. This paper combines MTM (Methods-Time Measurement) and WF (Work Factor) in the motion analysis. MTM has many distinct advantages but, because it is not sufficient to describe all of the assembly and disassembly processes, we revised it according to WF.
Table 1. Decision and Influencing Factors and Their Levels for Assembly
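The AHP weights mentioned above can be derived as the principal eigenvector of a reciprocal pairwise comparison matrix built on Saaty's 1-to-9 scale. The sketch below shows the standard computation; the example matrix is ours, not a set of values from the paper's tables.

```python
import numpy as np

def ahp_weights(pairwise):
    """Principal-eigenvector weights of a reciprocal comparison matrix."""
    A = np.asarray(pairwise, dtype=float)
    vals, vecs = np.linalg.eig(A)
    k = np.argmax(vals.real)             # principal eigenvalue
    w = np.abs(vecs[:, k].real)
    return w / w.sum()                   # normalize so the weights sum to 1

# Example: the three assembly decision factors compared pairwise.
A = [[1,   3, 2],
     [1/3, 1, 1/2],
     [1/2, 2, 1]]
print(ahp_weights(A))                    # roughly [0.54, 0.16, 0.30]
```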
Figure 3 shows the process for obtaining the assembly or disassembly time table, including the influencing factors, through motion analysis. When the levels of factor 4 and factor 5 are changed to 4′ and 5″, the assembly or disassembly time is extended by penalty times of 14 and 30 TMU, respectively. The relationship between the penalty time and the change of factor levels can be expressed as
T_{1,2,3,4′,5″} = T_{1,2,3,4,5} + PT_{1,2,3,4→4′,5} + PT_{1,2,3,4′,5→5″}
              = T_{1,2,3,4,5} + PT_{1,2,3,4→4′,5} + PT_{1,2,3,4,5→5″} + α        (1)
where T_{1,2,3,4′,5″} is the assembly or disassembly time when influencing factors 1, 2, 3, 4 and 5 are changed to 1, 2, 3, 4′, and 5″, and PT_{1,2,3,4→4′,5} represents the penalty time when influencing factor 4 is changed to 4′. α denotes the added
Fig. 3. Calculation Process for Time and Scores in Tables
Table 2. Time and Score Tables of Tying, Painting and Snapping
penalty time due to the changing levels. In Fig. 3, α is 8 TMU. The scores in the tables can be calculated by equation (2):

S = Σ_{i=1}^{n} Σ_{j=1}^{m} w_i · w_j · s_j        (2)
where w_i and w_j represent the weights of the decision and influencing factors i and j, respectively, and s_j is the difficulty score of influencing factor j. Table 2 shows an example of the time and score tables for three assembly methods: tying, painting, and snapping. Time and score tables can be obtained for each decision factor of assembly and disassembly. However, because the influencing factors differ according to the assembly and disassembly methods, the time and score tables for the decision factors insertion and disassembly are obtained separately for each assembly and disassembly method.
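Equation (2) is a doubly weighted sum, and the following sketch evaluates it directly. The weights and 1-to-9 scores here are invented for illustration, not taken from the paper's tables.

```python
def difficulty_score(decision_weights, influencing):
    """S = sum_i sum_j w_i * w_j * s_j.
    influencing[i] holds the (w_j, s_j) pairs for decision factor i."""
    return sum(w_i * sum(w_j * s_j for (w_j, s_j) in factors)
               for w_i, factors in zip(decision_weights, influencing))

w = [0.6, 0.4]                                    # two decision factors
infl = [[(0.7, 3), (0.3, 5)],                     # their influencing factors
        [(0.5, 1), (0.5, 7)]]
print(difficulty_score(w, infl))                  # 0.6*3.6 + 0.4*4.0 = 3.76
```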
3 Design for Recycling
It is currently widely acknowledged that the most ecologically sound way to treat a worn-out product is recycling (Mildenberger and Khare, 2000). This paper proposes a systematic approach for evaluating the recyclability of a product as it is designed. All DFX areas, including DFA, DFD and LCA, are considered, and a methodology is suggested for integrating these areas with DFR.
3.1 LCA and Cost Analysis
LCA is a methodology for evaluating the environmental effects occurring throughout the entire life cycle of a product, process, or activity (Krikke et al., 1999). In this environmental support system, environmental information, such as the environmental potential of materials and environmental impacts, is handed over to the LCA stage of the assembly and disassembly processes. Through the LCA, environmental impacts such as resource depletion, global warming, ozone depletion, photochemical oxidant creation, acidification, and others are assessed. Generally, a cost analysis is achieved by a comparison of cost and profit. In this paper, the cost analysis for recycling is defined as

C_R = C_C − C_D − C_M + C_S        (3)
where C_R represents the recycling cost, and C_C, C_D, C_M, and C_S are the collection, disassembly, maintenance, and selling costs, respectively.
3.2 Evaluation of Recyclability of Parts
In this paper, the recyclability of parts is computed based on equation (4), as follows:

S_Recyclability = a · S_Disassemblability + b · S_LCA + c · S_Cost        (4)
Fig. 4. Normalization Process for Integration
S_Disassemblability represents the disassemblability score already evaluated in the DFD module, S_LCA is the score of the environmental impact potential of parts computed by the LCA, and S_Cost is the score determined by the cost analysis for recycling. The values a, b, and c are weights computed by the AHP. The recyclability can be considered an integrated evaluation of technical information such as disassemblability, environmental information such as the environmental impacts from the LCA, and economical information such as the cost data from the cost analysis. The environmental impacts and cost values of each part should be normalized to obtain the final recyclability of parts, because their scales differ from that of disassemblability. Figure 4 shows the normalizing process of environmental impacts and cost data. The criteria that are used to obtain the environmental impacts, such as resource depletion, global warming, ozone depletion, acidification, and others, are re-established and their weights are obtained. The values of these environmental impacts are divided into levels and scores are assigned to each level. Eventually, the weights and scores result in the normalized environmental impact values.
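The sketch below puts equations (3) and (4) together for a small set of parts. The min-max normalization and the weights a, b, c are our example choices; the paper normalizes via levels and AHP-derived scores instead.

```python
def minmax(values):
    """Scale a list of figures to [0, 1]; a stand-in for the paper's leveling."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def recycling_cost(c_collect, c_disassembly, c_maintain, c_sell):
    """Equation (3): C_R = C_C - C_D - C_M + C_S."""
    return c_collect - c_disassembly - c_maintain + c_sell

def recyclability(s_dis, env_impacts, costs, a=0.4, b=0.35, c=0.25):
    """Equation (4) per part, after normalizing LCA and cost figures."""
    s_lca, s_cost = minmax(env_impacts), minmax(costs)
    return [a * d + b * e + c * k for d, e, k in zip(s_dis, s_lca, s_cost)]

# Three parts: disassemblability scores, LCA impact figures, recycling costs.
print(recyclability([0.2, 0.6, 0.9], [12.0, 5.0, 8.0], [3.0, 7.0, 4.0]))
```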
4 Development of DFX System
There is a need for a DFX system that enables designers to effectively analyze the ease of assembly, disassembly, and recycling of products. This paper presents a new approach to DFA, DFD, and DFR to help designers estimate the assembly and disassembly times and evaluate the assemblability, disassemblability, and recyclability.
4.1 DFA and DFD Systems
Because the assembly time and assemblability are calculated from the three decision factors that serve as the criteria for the evaluation of assembly, the DFA system needs three main input forms: handling, access, and insertion (see Figure 5).
Fig. 5. System Concept for DFA
The DFD system is similar to the DFA system. However, five input processes - fixing, grasping, access, disassembly, and handling - are analyzed to obtain the disassembly time and disassemblability of parts. In the DFA and DFD modules, there are four output forms: assembly time, assemblability, disassembly time, and disassemblability. Figure 7 shows the output forms for the assembly time and the disassembly difficulty score of parts. Figure 6 shows the input forms for the decision factors insertion and disassembly used to determine the assembly and disassembly times and the assemblability and disassemblability. As mentioned before, there are three and five decision factors, respectively, with their influencing factors and levels. In the case of insertion, a new frame that includes the influencing factors is shown when the assembly method is chosen.
4.2 DFR System
The developed DFR system guides its users through three main input modules: LCA, DFD, and cost analysis. The results for the disassemblability of parts obtained in the DFD module, the environmental impacts from the LCA, and the cost values from the cost analysis for recycling are normalized to obtain the recyclability of parts (see Figure 8). The output form for the recyclability of parts is shown in Figure 9. The normalized values of environmental impact, cost, and disassemblability are given in a table.
Fig. 6. Input Forms of DFA and DFD
Fig. 7. Output Forms for Assembly Time and Disassembly Difficulty Score
5 Conclusions
This paper has demonstrated a new approach for DFA, DFD, and DFR, and has shown how these DFX methodologies can be integrated into environmentally conscious design. This paper has also presented a new integrated DFX system to help designers to estimate the assembly and disassembly times, and to evaluate the assemblability, disassemblability, and recyclability. Following the
Fig. 8. System Concept for DFR
Fig. 9. Output Form for Recyclability
above approach, companies will be able to design products that have low environmental impacts. Our future work will focus on generating redesign alternatives for environmentally friendly products.
Acknowledgment. This work was supported by Cleaner Production Technology Development of Ministry of Commerce, Industry and Energy and Brain Busan 21 Project in 2004.
References
1. Harker, P. et al., "The Theory of Ratio Scale Estimation: Saaty's Analytic Hierarchy Process", Management Science, Vol. 33, pp. 1383-1402, 1987.
2. Holloway, L., "Materials Selection for Optimal Environmental Impact in Mechanical Design", Materials and Design, Vol. 19, pp. 133-143, 1998.
3. Krikke, H. et al., "Business Case Roteb: Recovery Strategies for Monitors", Computers & Industrial Engineering, Vol. 36, pp. 739-757, 1999.
4. Mildenberger, U., and Khare, A., "Planning for an Environment-Friendly Car", Technovation, Vol. 20, pp. 201-214, 2000.
5. Redford, A. et al., Design for Assembly, McGraw-Hill, Inc., 1994.
6. Saaty, T., "Axiomatic Foundation of the Analytic Hierarchy Process", Management Science, Vol. 32, pp. 841-855, 1986.
Performance of HECC Coprocessors Using Inversion-Free Formulae
Thomas Wollinger 1, Guido Bertoni 2, Luca Breveglieri 3, and Christof Paar 4
1 Escrypt GmbH, Embedded Security, Bochum, Germany. [email protected]
2 STMicroelectronics, Advanced System Technology, Agrate B., Milano, Italy. [email protected]
3 Politecnico di Milano, Italy. [email protected]
4 Communication Security Group (COSY), Ruhr-Universitaet Bochum, Germany. [email protected]
Abstract. The HyperElliptic Curve Cryptosystem (HECC) was studied quite extensively during recent years. In the open literature one can find results on how to improve the group operations of HECC, as well as implementations for various types of processors. There have also been some efforts to implement HECC on hardware devices, like for instance FPGAs. Only one of these works, however, deals with the inversion-free formulae to compute the group operations of HECC. We present inversion-free group operations for the HEC y^2 + xy = x^5 + f_1x + f_0 and we target characteristic-two fields. The reason is that of allowing a fair comparison with hardware architectures using the affine case presented in [BBWP04]. In the main part of the paper we use these results to investigate various hardware architectures for a HECC VLSI coprocessor. If area constraints are not considered, scalar multiplication can be performed in 19,769 clock cycles using three field multipliers (of type D = 32), one field adder and one field squarer, where D indicates the digit-size of the multiplier. However, the optimal solution in terms of latency and area uses two multipliers (of type D = 4), one addition and one squaring. The main finding of the present contribution is that coprocessors based on the inversion-free formulae should be preferred to those using group operations containing inversion. This holds despite the fact that one field inversion in the affine HECC group operation is traded for up to 24 field multiplications in the inversion-free case.
Keywords: Hyperelliptic curve cryptosystem, inversion-free formulae, hardware architecture, high performance explicit formulae, VLSI coprocessor.
1 Introduction
Koblitz and Miller proposed, independently of each other, Elliptic Curves (EC) for public-key cryptography. The generalizations of ECs are called
HyperElliptic Curves (HEC) and were first proposed for cryptographic use in [Kob88]. It is important to point out that hyperelliptic curve cryptosystems are promising because of their short operand sizes, compared to other public key schemes. We present an analysis of the hardware implementation of the inversion-free HEC group operations, including a comparison with affine hardware architectures. Previously there have been some efforts to implement a HECC coprocessor on FPGAs; however, our contribution is the first one containing a thorough analysis of the different design options and a comparison between the two possible coordinate systems (affine and projective). Our analysis is based on the derived group operations and on the underlying field F_{2^83}. We investigated various hardware architectures by parallelizing the scalar multiplication of HECC at the following three levels: the field operation level, the group operation level and the scalar multiplication level. At the field operation level we used multipliers that handle different numbers of bits in parallel. At the group operation level we used different numbers of field multipliers. Our investigation of the parallelism reachable at the scalar multiplication level was based on the idea of overlapping the computation of consecutive group operations. The scalar multiplication can be performed most efficiently in 19,769 clock cycles using a coprocessor design providing three field multipliers (of type D = 32), one field adder and one field squarer. In order to find the optimal design for the HECC coprocessor, we considered speed as well as (silicon) area, namely the area-time product. The lowest area-time product was achieved by using two multipliers (of type D = 4), one adder and one squarer. Furthermore, we compared the results of this paper with those published for the group operations using inversion [BBWP04]. Thus we found out that the most efficient way to implement a HECC VLSI coprocessor is to use inversion-free formulae. If we compare the best coprocessor configuration for affine coordinates with that using inversion-free formulae, we see that a) the latency is better by up to a factor of almost 3, and b) the area-time product figure for the inversion-free cases is always better. Hence, HECC coprocessors ought to be based on the inversion-free coordinate system. This result could be achieved even though one inversion in the affine group operation is traded for up to 24 field multiplications in the inversion-free case. The paper is organized as follows. Section 2 presents our improved inversion-free HEC group operations and Section 3 describes the architecture of the HECC coprocessor as well as the different levels of parallelization. In Section 4 we present and discuss results. The conclusions section lists final considerations.
2 Inversion-Free Formulae
In recent years a considerable effort has been focused on improving the group operations of the hyperelliptic curve cryptosystem (HECC); see [PWP04] for a summary. Analogously to EC, there also exist inversion-free group operations for HECC [MDM+02, Lan03]. Not having to use inversions might also be
profitable for HECC; however, the overhead in multiplications is very high compared to the simpler ECC case. We investigated the field operations needed by the inversion-free group operations, considering a genus-2 HEC y^2 + h(x)y = f(x), with h(x) = x and f(x) = x^5 + f_1x + f_0. In addition, the characteristic of the underlying field was fixed to 2. We do not know of any security limitations of this kind of curve. The main reason for using these parameters was to provide a fair comparison with the affine HECC coprocessor presented in [BBWP04]. Furthermore, we were able to reduce the number of finite field operations; see Table 1.

Table 1. Efficient inversion-free group operations for genus-2 HEC

coordinate system          curve properties               addition   doubling
affine      [Lan03]        h2 = 0, hi ∈ F2, f4 = 0        I+21M+3S   I+17M+5S
            [?]            h(x) = x, f4 = f3 = f2 = 0     -          I+9M+6S
projective  [MDM+02]       N.A.                           67M        42M
            [Lan03]        h2 = 0, hi ∈ F2, f4 = 0        47M+4S     40M+7S
            our work       h(x) = x, f4 = f3 = f2 = 0     45M+5S     31M+6S
3 Architecture of the HECC Coprocessor
Our coprocessor is designed according to a rather standard architecture, consisting of a 3-bus loop scheme that connects a set of functional units and a register file. A control unit drives the various function units and the register file. The register file stores temporary intermediate values and final results. The size of each register is the dimension of the field, namely 83 bits. The register file has two output ports to feed the operands to the function units and one input port to receive the result. The processor can load two field elements and store one field element at every clock cycle. This guarantees feasibility and ease of implementation. However, at any given clock cycle only one field operation can start. If the operation is unary, such as inversion, one input bus remains idle. In Table 2 we give the area and latency of each arithmetic function unit we used. The given estimates assume 2-input gates and optimal field generator polynomials of type F(x) = x^m + Σ_{i=0}^{t} f_i x^i, where m − t ≥ D. One observes that the gate-consuming function units are the multipliers and the inverter, while in comparison the area used for the adder and the squarer is negligible. In the case of the multipliers and the inverter, the numbers of XOR and AND gates are the same. These two considerations allow us to equate the costs of the XOR and AND gates, because, when comparing different system configurations, we do not want to use absolute gate sizes, but relative ones. We have assumed that the different components of the system work at the same frequency. Such an assumption might not be true. Usually, the frequency of this type of cryptosystem is dominated by the multiplier, and a smaller digit-size
Performance of HECC Coprocessors Using Inversion-Free Formulae
1007
Table 2. Components of the coprocessor: Area and time Area Add Sqr [OP00] Mul [SP97]
Gate count [m] [4(m − 1)]
[m] XOR [4(m − 1)] XOR [D · m] AND & [D · m] XOR [2Dm] Inv [BCH93] [6 · m + log2 m] AND & [6 · m + log2 m] XOR [2(6m + log2 m)]
Latency [clock cycles] 1 1 m/D 2·m
will yield to a higher clock frequency and thus will speed-up the whole system. For a correct estimation of the impact of the different clock frequencies, circuit synthesis is necessary. In our analysis we considered different levels of parallelization for the HECC coprocessor: 1. Parallelization at field operation level: we achieved it by varying the digit size of the multiplier. 2. Parallelization at group operation level: our architecture of the HEC coprocessor allows to change the number of used multipliers. 3. Parallelization at scalar multiplication level: we achieved it by overlapping two consecutive group operations. Taking into consideration all the different parallelization levels, we were able to identify the best architecture considering the execution time as well as the area requirements. We coded a software library that schedules the HECC scalar multiplication mapping it onto the proposed hardware architecture. This software tool applies the so-called Operation Scheduling [Gov03] procedure for parallelizing the sequence of group operations computing the HECC scalar multiplication, and it works according to the As Soon As Possible (ASAP) scheduling policy. We then constrain the scheduling policy by imposing limits on the available hardware resources (i.e. type and number of arithmetic function units) and realistic time delays required to execute the field operations. A more detailed description of the scheduling methodology can be found in [BBWP04].
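To make the scheduling idea concrete, the following minimal sketch, ours and not the authors' tool, performs resource-constrained ASAP scheduling with the latencies of Table 2 for m = 83. The operation names and the tiny dependency graph are illustrative assumptions. The latency table also quantifies the inversion trade mentioned in the introduction: one inversion costs 2·83 = 166 cycles, whereas 24 extra multiplications at D = 32 cost only 24·⌈83/32⌉ = 72 cycles.

from math import ceil

M = 83  # field dimension in bits

def latency(op, digit_size):
    # Clock cycles per field operation (Table 2); inversion is unused by the
    # inversion-free formulae but kept for the affine comparison.
    return {"add": 1, "sqr": 1,
            "mul": ceil(M / digit_size),
            "inv": 2 * M}[op]

def asap_schedule(ops, deps, n_mul, digit_size):
    """ops: {name: kind} listed in topological order; deps: {name: [preds]}.
    Only multipliers are treated as constrained units; one adder and one
    squarer are assumed always available, as in the paper's designs."""
    finish = {}
    mul_free = [0] * n_mul                  # next free cycle per multiplier
    for name, kind in ops.items():
        ready = max((finish[p] for p in deps.get(name, ())), default=0)
        if kind == "mul":
            i = min(range(n_mul), key=mul_free.__getitem__)
            start = max(ready, mul_free[i])
            finish[name] = start + latency("mul", digit_size)
            mul_free[i] = finish[name]
        else:
            finish[name] = ready + latency(kind, digit_size)
    return max(finish.values())

# Two independent products feeding one addition: with two multipliers the
# products run in parallel (3 + 1 = 4 cycles at D = 32); with one
# multiplier they are serialized (3 + 3 + 1 = 7 cycles).
ops = {"t0": "mul", "t1": "mul", "t2": "add"}
deps = {"t2": ["t0", "t1"]}
print(asap_schedule(ops, deps, n_mul=2, digit_size=32))  # -> 4
print(asap_schedule(ops, deps, n_mul=1, digit_size=32))  # -> 7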
4 Analysis and Results
The main contribution of this paper is to identify the optimal architecture for the HECC coprocessor. In order to do so, we first investigate the different architectures for the inversion-free group operations. Afterwards, our results are compared to those for affine coordinates as presented in [BBWP04].
We target genus-2 HECs defined over the finite field F_2^83 and we use the derived group operations represented by means of projective coordinates. For the evaluation of the latency of the scalar multiplication operation (the so-called kD operation), we examined average cases: the 160-bit integer k contains the same number of 0s and 1s. All our results are presented for different digit-sizes and different numbers of multiplier units, i.e. for the parallelism at the bit level and at the field operation level, respectively. The parallelism at the scalar multiplication level, namely the overlapping of two consecutive group operations, applies to the figures given for the scalar multiplication and the area-time product. We chose not to bound the number of registers: all the considered system configurations require 21 registers for storing temporary values, where each register stores a field element of 83 bits. One could reduce the number of registers, at the cost of some additional latency, for example by avoiding the overlapping of two consecutive group operations.

4.1 Latency
In this subsection we present our results for the latency of the complete scalar multiplication, see Table 3.

Table 3. Latency of the scalar multiplication (in clock cycles, group order ≈ 2^160)

                       Number of multipliers
Digit-size        1          2          3          4
 2            356,537    193,176    130,652     98,888
 4            181,670     97,355     69,730     55,646
 8             98,400     53,243     38,935     31,953
16             56,525     32,835     24,747     22,974
32             31,702     21,209     19,769     20,490
The conclusion one can draw from the table is that latency drops with increasing hardware resources (more multiplier units and a higher digit size); thus we can compute the scalar multiplication of HECC in a shorter time. The scalar multiplication can be performed most efficiently in 19,769 clock cycles by providing three field multipliers (of type D = 32), one field adder and one field squarer (Table 3, bottom row). Our scheduler is based on the method known as Operation Scheduling (and works according to the ASAP scheduling policy), which does not guarantee an optimal solution [Gov03]; this is evident in the case of 4 multipliers of type D = 32 (Table 3, bottom row). Careful inspection of the results shows that adding more hardware resources may be unnecessary.
4.2 Area-Time Product
An optimal implementation should achieve maximum throughput while consuming minimum area (contrary to some traditional cryptographic implementations, where only the best time performance was taken into account). Hence we consider both the hardware requirements and the time constraints of the cryptographic application.

Table 4. Normalized area-time product for the scalar multiplication

                      Number of multipliers
Digit-size        1         2         3         4
 2             1.2207    1.1024    1.0438    1.0157
 4             1.0367    1         1.0346    1.0796
 8             1.0107    1.0330    1.1109    1.2034
16             1.0967    1.2367    1.3839    1.7043
32             1.1940    1.5733    2.1885    3.0167
In Table 4 we show the area-time product for the different design options, normalized with respect to the lowest area-time product. Table 4 shows that the architecture using two multipliers (of type D = 4), one adder and one squarer achieves the best area-time product and is therefore the optimal architecture.
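The normalization of Table 4 can be reproduced with a simple relative-area model. The sketch below is our reading of the gate-count argument (each multiplier counts 2·D·m gates, the adder and squarer together a fixed 2·m gates; control logic and registers are ignored); this model is an assumption rather than the authors' stated formula, although it matches the published figures.

M = 83
LATENCY = {  # Table 3: {digit_size: {number_of_multipliers: cycles}}
    2:  {1: 356_537, 2: 193_176, 3: 130_652, 4: 98_888},
    4:  {1: 181_670, 2: 97_355,  3: 69_730,  4: 55_646},
    8:  {1: 98_400,  2: 53_243,  3: 38_935,  4: 31_953},
    16: {1: 56_525,  2: 32_835,  3: 24_747,  4: 22_974},
    32: {1: 31_702,  2: 21_209,  3: 19_769,  4: 20_490},
}

def area(n_mul, d):
    # Relative gate count: multipliers plus a fixed adder/squarer term.
    return n_mul * 2 * d * M + 2 * M

at = {(d, n): area(n, d) * LATENCY[d][n]
      for d in LATENCY for n in LATENCY[d]}
best = min(at.values())   # reached at two multipliers, D = 4
for (d, n), v in sorted(at.items()):
    print(f"D={d:2d}, {n} multiplier(s): {v / best:.4f}")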
4.3 Comparing Affine and Projective HECC
In this section we put our results into perspective with the HECC VLSI coprocessor implemented in affine coordinates presented in [BBWP04]¹. This comparison is possible because the methodology used is the same, e.g. identical curve parameters. Figure 1 compares the latencies of the affine and projective HECC coprocessors; the lower bars identify the preferable coprocessor. One can draw the following conclusions:
– In both coordinate systems, latency drops when additional hardware resources are provided (by increasing the digit-size and the number of multiplier units).
– If the context allows the use of projective coordinates, they are always preferable in terms of latency and area. However, affine coordinates might find an appropriate use in low-area applications.
– One very important result of this contribution is that high-speed implementations definitely ought to use projective coordinates: as can be seen in Figure 1, projective coordinates with large digit-sizes result in the lowest latency.
¹ Note that it is not fair to compare our results with the previous work implementing HECC on FPGAs, because of the different architecture that is used.
Fig. 1. Latency comparison between affine and projective HECC coprocessors (two panels, affine and projective coordinates; x-axis: number of multipliers 1–4; y-axis: clock cycles; one curve per digit-size D = 2, 4, 8, 16, 32)
If the comparison between affine and projective coordinates is limited to the scalar multiplication latency alone, the projective processor is better than the affine one: the design based on projective coordinates can calculate a scalar multiplication in 19,769 clock cycles, while the affine one cannot do so in fewer than 54,593 clock cycles. This means a speed-up of approximately 2.7. In Figure 2 we compare the five best Area-Time (AT) product figures for the two coordinate systems. For each system configuration the area-time product and the latency are reported; both measures are normalized with respect to those of the system configuration with the best AT product. The five left-most bars show the figures for affine coordinates and the five right-most bars show those for the projective case. The design option used for the HECC VLSI coprocessor is written at the bottom of each bar (M denotes the multipliers). It is interesting to analyze the implicit parallelism level achievable with the projective formulae: among the best five area-time products there is only one system configuration with a single multiplier, whereas for affine coordinates the two best solutions contain only one multiplier; hence, potentially, fewer field operations can be parallelized there. Another general statement one can draw from Figure 2 is that systems using projective coordinates are always better in terms of the area-time product; there is a gap of about 35% in the area-time product between the affine and projective coordinates. Note that projective coordinates are not always better in terms of latency. The reason is the area-time product metric.
Fig. 2. Comparison of the best five area-time product figures using an affine and a projective HECC coprocessor (bars show the normalized AT product and latency for the configurations 1M D=8, 1M D=4, 2M D=4, 2M D=2, 3M D=2 in the affine case and 2M D=4, 1M D=8, 4M D=2, 2M D=8, 3M D=4 in the projective case)
Hence, the projective systems reaching a lower latency and a better AT product than the affine ones have smaller area requirements. However, if we look at systems with similar area usage (e.g., comparing the projective system using 3M and D = 4 with the affine one using 2M and D = 4), the AT product, the latency and the area of the coprocessor based on projective coordinates are all better; the latency is 17% better than in the affine case. Hence, the main finding of this contribution is that the projective coprocessors are more flexible than the affine ones, and thus allow the designer to choose the best compromise between required latency and silicon area. If the application also imposes data conversion, the affine coprocessors become attractive for low-area requirements.
5 Conclusions
We presented newly derived explicit formulae for HECC using the inversion-free approach. Our formulae are up to 22.5% better than the best previous ones and have been used as the basis of an extensive study of different architecture options for a HECC VLSI coprocessor. We analyzed parallelization at the field operation level, the group operation level and the scalar multiplication level. The analysis was carried out for a HEC y^2 + h(x)y = f(x), with h(x) = x, f(x) = x^5 + f1·x + f0, and the base field F_2^83. Comparing our results based on projective coordinates with those based on affine coordinates for HECC, we can draw the following conclusions: a) considering only the latency, the coprocessors based on projective coordinates lead to higher performance (latency is better by a factor of up to almost 3), and b) considering the area-time product, the projective coprocessor is preferable as well.
Hence, group operations computed with inversion-free formulae ought to be the appropriate choice for any HECC VLSI hardware implementation.
References

[BBWP04] G. Bertoni, L. Breveglieri, T. Wollinger, and C. Paar. Finding optimum parallel coprocessor design for genus 2 hyperelliptic curve cryptosystems. In Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC 2004), volume 2, pages 538–544. IEEE Computer Society, November 2004.
[BCH93] H. Brunner, A. Curiger, and M. Hofstetter. On computing multiplicative inverses in GF(2^m). IEEE Transactions on Computers, 42:1010–1015, August 1993.
[Gov03] R. Govindarajan. Instruction scheduling. In The Compiler Design Handbook. CRC Press, 2003.
[Kob88] N. Koblitz. A family of Jacobians suitable for discrete log cryptosystems. In S. Goldwasser, editor, Advances in Cryptology, CRYPTO '88, volume 403 of Lecture Notes in Computer Science, pages 94–99. Springer-Verlag, Berlin, 1988.
[Lan03] T. Lange. Formulae for arithmetic on genus 2 hyperelliptic curves, 2003. Available at http://www.ruhr-uni-bochum.de/itsc/tanja/preprints.html.
[MDM+02] Y. Miyamoto, H. Doi, K. Matsuo, J. Chao, and S. Tsuji. A fast addition algorithm of genus two hyperelliptic curve. In The 2002 Symposium on Cryptography and Information Security, SCIS 2002, IEICE Japan, pages 497–502, 2002. In Japanese.
[OP00] G. Orlando and C. Paar. A high-performance reconfigurable elliptic curve processor for GF(2^m). In Ç. K. Koç and C. Paar, editors, Cryptographic Hardware and Embedded Systems, CHES 2000, volume 1965 of Lecture Notes in Computer Science. Springer-Verlag, 2000.
[PWP04] J. Pelzl, T. Wollinger, and C. Paar. High performance arithmetic for special hyperelliptic curve cryptosystems of genus two. In Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC 2004), volume 2, pages 513–517. IEEE Computer Society, November 2004.
[SP97] L. Song and K. K. Parhi. Low-energy digit-serial/parallel finite field multipliers. Journal of VLSI Signal Processing Systems, 2(22):1–17, 1997.
Metrics of Password Management Policy

Carlos Villarrubia, Eduardo Fernández-Medina, and Mario Piattini

Alarcos Research Group, Information Systems and Technologies Department, UCLM-Soluziona Research and Development Institute, University of Castilla-La Mancha, Paseo de la Universidad, 4 – 13071 Ciudad Real, Spain
{Carlos.Villarrubia, Eduardo.FdezMedina, Mario.Piattini}@uclm.es
Abstract. Managing the computer security of an institution requires an evaluation phase, and the most common way to carry out this evaluation is to use a set of metrics. Since every information system needs an authentication mechanism, and the most widely used mechanisms are those based on passwords, in this article we propose a set of metrics for password management policies based on the most relevant factors of this authentication mechanism. Together with the metrics, we propose a quality indicator derived from them that provides a global view of the quality of the password management policy in use, and a complete example of the calculation of the proposed metrics. Finally, we indicate the future work to be performed to check the validity and usefulness of the proposed metrics. Keywords: Security management, assurance, metrics, passwords.
1 Introduction

Information and its supporting processes, systems and networks are important resources for any organization. These resources are continuously subjected to risks and threats from a great variety of sources, including malicious code, programming errors, human errors, sabotage and fire. This concern has encouraged many organizations and researchers to propose several metrics to evaluate the security of their information systems. In general, there is consensus that the choice of these metrics depends on the concrete security needs of each organization. The majority of existing proposals put forward methods to choose these metrics [1, 4, 19, 22, 26, 27]; sometimes, the need to develop specific methodologies for each organization is also suggested [7]. In any proposal, the aim is to quantify the different security aspects in order to understand, control and improve confidence in the information system. If an organization does not use security metrics in its decision-making process, its choices will be driven by subjective aspects, external pressures and even purely commercial motivations. With the purpose of systematizing all these proposals, we have developed a classification scheme for security metrics [29] in which the metrics proposed in the existing literature have been included. In that work we conclude that the majority
of the proposed metrics are general: this class of metrics only measures generic actions related to security and, only indirectly, specific objectives such as confidentiality, integrity and availability.

1.1 Authentication Systems

The use of an authentication system requires the integration of multiple elements; depending on the techniques used, it may draw on cryptography, medicine, psychology, systems analysis and protocol design. All authentication systems are designed to assure the identity of one participant to another, and this process requires that the first participant demonstrates his identity by means of some kind of evidence (knowledge evidence, possession evidence or biological evidence). This authentication evidence can be a word or password, as used in the majority of operating systems and applications (knowledge evidence), a cryptographic card (possession evidence), or a biological characteristic of the individual to be authenticated, measured through a biometric device (biological evidence). Historically, mechanisms based on passwords have been the most widely used. The importance of this authentication mechanism has led to the elaboration of rules and recommendations at multiple levels [11, 12, 13, 14, 20, 21]. Its acceptance is due to the fact that it is very easy to use in all systems and has a low cost [18]. The deficiencies of this method have been widely studied, and measures have been proposed to limit its disadvantages [2, 9, 23]. In some designs the main disadvantages are linked to the confidence that must be placed in users when dealing with passwords, while on other occasions they stem from designs that assumed a secure environment (such as an intranet) but are used in other environments (for example, the Internet) [10]. All these problems might suggest that passwords are a mechanism to be replaced, but users' acceptance of them and their low cost, together with the complexity and cost of the alternative methods, guarantee their continuance in the short and medium term. In this paper we propose metrics and indicators related to password management policy, given the lack of specific proposals in especially relevant areas of information system security. In Section 2 we propose password management policy metrics, justifying why they are necessary and classifying the proposed set according to several criteria. In Section 3 we put forward a classification in levels of password management policies that allows organizations to know their current situation, to plan the relevant improvements, and to make comparisons between different institutions in order to identify best practices. Finally, we present some conclusions and a proposal of future work in this field.
2 Proposal of Password Management Metrics

The methodology used to derive the password management metrics consists of a study of all the factors that intervene in password management; for this purpose, these factors have been gathered from the existing literature [2, 3, 9, 13, 18, 20, 21]. These metrics do not try to cover the whole problem, but to capture the most representative issues. Under this hypothesis, the use of passwords for authentication between processes or hosts is not included; only the participation of a
person as the entity to be authenticated is studied. Multifactor authentication systems in which one of the mechanisms is a password are not included either. Each metric is defined by the following aspects:

• Name: representative name of the metric.
• Description: describes the metric, indicating the method used to calculate values.
• Life cycle phase: for better understandability and analysis of the measures, metrics are classified according to their role within the life cycle of passwords:
  • General: metrics that belong to two or more phases.
  • Assignment: metrics related to the assignment of initial identificators and passwords to users.
  • Storage: metrics concerning the storage of passwords by the system.
  • Transmission: metrics related to the authentication protocols used, or to the communication of the password to the user by the authentication system.
  • Use: metrics that measure the way the user employs the password.
  • Renewal: metrics related to password modification.
• Scale: the set of values associated with the metric.
• Multivalue: some metrics may hold several simultaneous measures; this attribute indicates whether or not the metric can have several simultaneous measures.
The names, descriptions, life cycle phases and multivalue attributes of the metrics we have considered are as follows.

Table 1. Password management metrics

• Users Training (General; multivalue): type of training received by users for dealing with and, where applicable, selecting passwords.
• Group Password (General): existence of passwords used by a group of users, or passwords needed to access resources that do not have an access control mechanism separate from the authentication mechanism.
• Action Register (General; multivalue): type of register used by the information system to monitor the actions related to password management.
• Alphabet Size (Assignment): number of characters of the alphabet used to create valid passwords in the system.
• Number of Different Classes Demanded (Assignment): number of classes into which the alphabet is divided and that are required to form a valid password.
• Minimum Length (Assignment): minimum number of characters required for a valid password.
• Source Selection (Assignment): set of agents that can choose a password.
• Selection Restriction (Assignment; multivalue): set of restrictions that prevent the selection source from choosing a password that is easy for third parties to guess.
• User Identificator Class (Assignment): type of user identificator used by the system.
• Predefined Users (Assignment; multivalue): treatment that predefined users receive from the system.
• Storage Class (Storage; multivalue): way passwords are stored in the authentication system.
• Initial Communication (Transmission; multivalue): method of communicating the initial password, or a reassigned one, to the user by the authentication system.
• Net Transmission (Transmission): transmission mechanism used by the authentication protocol for the confidentiality and integrity of the password.
• Input Visualization (Use): method used by the system to display the password when it is requested from the user.
• Maximum Number of Erroneous Attempts (Use): maximum number of failed attempts before the authentication system takes a defensive action against the risk of identity usurpation by a third party.
• Information about Use (Use): set of mechanisms used by the authentication system to inform the user about past authentications.
• Authentication Period (Use): maximum time after which the access control asks for user re-authentication.
• Block by User Cancellation (Renewal): procedures used to guarantee that users who were legitimate in the past are prevented from accessing the system.
• Minimum Life Time (Renewal): minimum life time of a valid password.
• Maximum Life Time (Renewal): maximum life time of a valid password; when this time expires, the user is forced to change the password.
• Record Length (Renewal): number of past valid passwords that the system does not allow the user to reuse.
• Password Reassigning (Renewal): procedure used to reactivate the credential of a user who does not remember his password.
3 Indicator of Level of Security in Password Management

The definition of a set of metrics is not enough for an organization to be able to use them to manage the necessary changes in the areas those metrics cover. Information is needed about how to use them and about the impact of the metric values on system management. With this objective, we propose pre-established values for each metric that facilitate its use. Except for some of them, these values are ordered hierarchically, from a minimum value to a maximum one, with intermediate values in the majority of metrics. The higher the value an organization attains in each metric, the higher the confidence in its authentication system. As a general principle of computer security, it is usually not advisable to increase the values of some metrics without a generalized increase in all of them. With this principle as an objective, we propose an indicator of the quality of the password management policy based on five levels. This proposal is based on the usefulness shown by maturity models and metrics management programmes [5, 6, 8, 26]. The levels are structured from a minimum level (level 1) to a maximum level (level 5). The values required for each metric are defined at each level; for some metrics a recommended value is also defined for each level. These
recommendations have the purpose of giving the indicator flexibility, making it possible to set the required values at the lowest possible measure in each level. When a metric has several values demanded at a level, all those values must be present for the level to be considered reached. When a metric demands the same values at several levels, the metric is considered to be at the highest of them. The recommended character of a value of a metric does not influence its level; it only has meaning for the calculation of the value of the quality indicator of password management, as described later in this section. Finally, the value '+' indicates that the value of the metric at that level is overcome, because the metric has a higher value than the one demanded or recommended for that level. Table 2 shows our analysis for each metric, considering the abovementioned levels.

Table 2. Values of each metric and associated level (Oblig.: obligatory value; Rec.: recommended value; +: overcome value at that level)

Users Training (Multivalue)
  No training: L1 Oblig.
  Information when the user registration is made: L1 Rec. · L2–L5 Oblig.
  Compulsory course: L1–L3 Rec. · L4–L5 Oblig.
  Periodic course: L1–L3 + · L4 Rec. · L5 Oblig.

Group Password
  Existence of group passwords or passwords for access to resources: L1 Oblig.
  Unique existence of a group of administrators: L1 + · L2–L3 Oblig.
  There are no group passwords: L1–L2 + · L3 Rec. · L4–L5 Oblig.

Action Register (Multivalue)
  No action register: L1 Oblig.
  Registration register: L1 + · L2–L5 Oblig.
  Renewal and cancellation register: L1 + · L2 Rec. · L3–L5 Oblig.
  Block and re-assignation register: L1 + · L2–L3 Rec. · L4–L5 Oblig.

Alphabet Size
  Ten characters or fewer: L1 Oblig.
  Between eleven and twenty-five characters: L1 + · L2 Oblig.
  Between twenty-six and fifty characters: L1 + · L2 Rec. · L3 Oblig.
  Between fifty-one and seventy-five characters: L1–L2 + · L3 Rec. · L4 Oblig.
  More than seventy-five characters: L1–L2 + · L3–L4 Rec. · L5 Oblig.

Number of Different Classes Demanded
  One: L1 Oblig.
  Two: L1 + · L2 Oblig.
  Three: L1–L2 + · L3–L4 Oblig.
  Four or more: L1–L3 + · L4 Rec. · L5 Oblig.

Minimum Length
  Four characters or fewer: L1 Oblig.
  Between five and eight characters: L1 + · L2 Oblig.
  Between nine and twelve characters: L1–L2 + · L3 Oblig.
  Between thirteen and sixteen characters: L1–L3 + · L4 Oblig.
  More than sixteen characters: L1–L4 + · L5 Oblig.
Source Selection
  User: L1–L5 Oblig.
  System: L1–L3 + · L4–L5 Rec.

Selection Restriction (Multivalue)
  No restriction: not admitted at any level
  User information: L1–L5 Oblig.
  Keys combinations: L1 + · L2–L3 Rec. · L4–L5 Oblig.
  Dictionary password: L1–L2 + · L3 Rec. · L4–L5 Oblig.
  Variations of the previous ones: L1–L3 + · L4 Rec. · L5 Oblig.

User Identificator Class
  Public identificator: L1–L3 Oblig.
  Semi-public identificator: L1–L2 + · L3 Rec. · L4 Oblig.
  Private identificator: L1–L3 + · L4 Rec. · L5 Oblig.

Predefined Users (Multivalue)
  No change: not admitted at any level
  Password change: L1–L5 Oblig.
  Identificator change: L1–L2 + · L3 Rec. · L4–L5 Oblig.

Storage Class (Multivalue)
  Clear storage: not admitted at any level
  Irreversible storage: L1–L2 Oblig. · L3–L5 Rec.
  Encrypted storage: L1–L2 + · L3–L5 Oblig.

Initial Communication (Multivalue)
  Non-secure transmission: L1 Oblig.
  Transmission with compulsory change of password: L1 + · L2–L3 Oblig. · L4–L5 Rec.
  Secure transmission: L1–L2 + · L3 Rec. · L4–L5 Oblig.

Net Transmission
  Clear transmission: L1–L2 Oblig.
  Use of a challenge-response protocol: L1–L2 + · L3 Oblig. · L4 Rec.
  Encrypted transmission: L1–L2 + · L3 Rec. · L4–L5 Oblig.

Input Visualization
  Clear visualization: not admitted at any level
  Visualization of the number of characters: L1–L4 Oblig.
  No visualization: L1–L2 + · L3–L4 Rec. · L5 Oblig.

Maximum Number of Erroneous Authentication Attempts
  No limit: L1–L2 Oblig.
  Between eleven and fifty attempts: L1–L2 Rec. · L3–L4 Oblig.
  Between four and ten attempts: L1–L3 + · L4 Rec. · L5 Oblig.
  Three attempts or fewer: L1–L4 + · L5 Rec.

Information about Use
  No information: L1–L5 Oblig.
  Information about the last use: L1–L2 + · L3–L5 Rec.

Authentication Period
  Work session: L1–L2 Oblig.
  Maximum of fifteen minutes of inactivity: L1–L2 + · L3–L4 Oblig.
  Maximum of five minutes of inactivity: L1–L3 + · L4 Rec. · L5 Oblig.

Block by User Cancellation
  Without an established method: L1 Oblig.
  Periodic elimination (maximum six-month period): L1 Rec. · L2–L5 Oblig.
  Time limit established during registration: L1–L2 + · L3–L5 Rec.

Minimum Life Time
  There is no minimum life time: L1–L3 Oblig.
  There is a minimum life time (one day or more): L1–L3 + · L4–L5 Oblig.
Maximum Life Time
  Greater than twelve months: L1 Oblig.
  Twelve months or less: L1 + · L2 Oblig.
  Six months or less: L1–L2 + · L3–L4 Oblig.
  Three months or less: L1–L3 + · L4 Rec. · L5 Oblig.

Record Length
  One: L1 Oblig.
  Three or fewer: L1 + · L2 Oblig.
  Ten or fewer: L1–L2 + · L3 Oblig.
  Twenty-five or fewer: L1–L3 + · L4 Oblig.
  More than twenty-five: L1–L4 + · L5 Oblig.

Password Reassigning
  The previous password is reassigned: L1 Oblig.
  A new password is assigned: L1 Rec. · L2–L5 Oblig.
The calculation of the value of the quality indicator of the password management policy requires, as a minimum, the metric values marked as obligatory, overcome or recommended to be present. Note that although the number of metrics is twenty-two, the number of obtained values can be greater, because several metrics can hold several values simultaneously (for example, users training). The minimum number of values needed to reach each level is shown in Table 3.

Table 3. Number of values at each level

Level            1    2    3    4    5
Minimum number  22   22   23   28   30
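A minimal sketch, written by us for illustration, of the level computation just described: a level is reached when every measured value meets that level's obligatory demand (the full indicator would also check the minimum number of values of Table 3). Only two metrics are encoded, and the requirement tables are our transcription of Table 2.

SCALES = {
    "Minimum Length":    ("<=4", "5-8", "9-12", "13-16", ">16"),
    "Maximum Life Time": (">12mo", "<=12mo", "<=6mo", "<=3mo"),
}
REQUIRED = {   # metric -> level -> index of the minimal acceptable value
    "Minimum Length":    {1: 0, 2: 1, 3: 2, 4: 3, 5: 4},
    "Maximum Life Time": {1: 0, 2: 1, 3: 2, 4: 2, 5: 3},
}

def indicator_level(measured):
    level = 0
    for lvl in (1, 2, 3, 4, 5):
        ok = all(SCALES[m].index(v) >= REQUIRED[m][lvl]
                 for m, v in measured.items())
        if not ok:
            break
        level = lvl
    return level

# The worked example below has an 8-character minimum and a maximum life
# time of twelve months, which stops at level 2 for these two metrics.
print(indicator_level({"Minimum Length": "5-8",
                       "Maximum Life Time": "<=12mo"}))   # -> 2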
3.1 Application of the Metrics

In this section a concrete case of application of the metrics is detailed. The information system used has the following characteristics: the new user is informed about the password management policy and, within a maximum term of one month, receives a training session covering aspects of computer security. The password is chosen by the user under the following restrictions: a minimum of 8 characters from an alphabet that distinguishes uppercase from lowercase letters and mixes in digits. The communication of the initial password to the user obliges him to change it, and passwords are stored encrypted, using a dispersion (hash) function so that storage is irreversible. These characteristics, together with the others that can be deduced from Table 4, allow us to obtain the following values for the metrics.
Table 4. Values of each metric in the example

Metric: measured value                                            L1      L2      L3      L4      L5
Users Training: information at user registration                  Rec.    Oblig.  Oblig.  Oblig.  Oblig.
Users Training: compulsory course                                 Rec.    Rec.    Rec.    Oblig.  Oblig.
Group Password: unique existence of a group of administrators     +       Oblig.  Oblig.  –       –
Action Register: registration register                            +       Oblig.  Oblig.  Oblig.  Oblig.
Alphabet Size: more than seventy-five characters                  +       +       Rec.    Rec.    Oblig.
Number of Different Classes Demanded: two                         +       Oblig.  –       –       –
Minimum Length: between five and eight characters                 +       Oblig.  –       –       –
Source Selection: user                                            Oblig.  Oblig.  Oblig.  Oblig.  Oblig.
Selection Restriction: user information                           Oblig.  Oblig.  Oblig.  Oblig.  Oblig.
User Identificator Class: public identificator                    Oblig.  Oblig.  Oblig.  –       –
Predefined Users: password change                                 Oblig.  Oblig.  Oblig.  Oblig.  Oblig.
Storage Class: irreversible storage                               Oblig.  Oblig.  Rec.    Rec.    Rec.
Storage Class: encrypted storage                                  +       +       Oblig.  Oblig.  Oblig.
Initial Communication: transmission with compulsory change        +       Oblig.  Oblig.  Rec.    Rec.
Net Transmission: encrypted transmission                          +       +       Rec.    Oblig.  Oblig.
Input Visualization: visualization of number of characters        Oblig.  Oblig.  Oblig.  Oblig.  –
Maximum Number of Erroneous Attempts: eleven to fifty             Rec.    Rec.    Oblig.  Oblig.  –
Information about Use: no information                             Oblig.  Oblig.  Oblig.  Oblig.  Oblig.
Authentication Period: work session                               Oblig.  Oblig.  –       –       –
Block by User Cancellation: periodic elimination (six months)     Rec.    Oblig.  Oblig.  Oblig.  Oblig.
Minimum Life Time: there is no minimum life time                  Oblig.  Oblig.  Oblig.  –       –
Maximum Life Time: twelve months or less                          +       Oblig.  –       –       –
Record Length: ten or fewer                                       +       +       Oblig.  –       –
Password Reassigning: a new password is assigned                  Rec.    Oblig.  Oblig.  Oblig.  Oblig.
With these measures, Table 5 is obtained, summarizing per level the number of values with obligatory, recommended or overcome character. The tables show that the password management policy in use reaches levels 1 and 2, because it has all the required values. However, to obtain level 3 it must improve in four metrics: number of different classes demanded, minimum length,
Table 5. Values per level in the example

                     Level 1   Level 2   Level 3   Level 4   Level 5
Obligatory value         9        18        16        13        12
Recommended value        4         2         4         3         2
Overcome value          11         4         –         –         –
Total of values         24        24        20        16        13
authentication period and maximum life time. Finally, to reach level 4 the policy needs to improve in eight metrics, and for level 5 in ten metrics.
4 Conclusions and Future Work

In this work we have proposed a set of metrics and an indicator of the level of security of a password management policy, which together fulfil the objective of evaluating the authentication process based on passwords. We have proposed twenty-two metrics grouped into six areas covering the whole cycle of password management. Owing to the diversity of these metrics, where some have a potentially infinite value range (for instance, password length) and others a very limited one (for instance, password reassigning), the definition of the metrics includes a limited set of values that simplifies the process of obtaining measures and the use of the metrics for decision making. As a method of global valuation of the password management policy, a quality indicator is proposed whose range of values consists of five levels. This indicator makes it possible to inform all actors involved in the organization's security, in a single and comprehensible way, about the level of quality reached in an information system. An application example is included in which the level of each metric is obtained together with the indicator of the level of security of the whole group of metrics; this example shows how simply the indicator orients the manager in directing future actions. This proposal is made within the framework of a wider project of metrics definition that studies all general security areas. Nevertheless, in the area of identification and authentication, it is necessary to extend these metrics to the exploitation of the information system to complete the password management system. Furthermore, the majority of organizations have a diversity of information systems with different requirements as well as different authentication mechanisms. To obtain an overall vision through a set of metrics, all this information must be combined in a way that is coherent and useful for the organization's board of directors and technical staff; in this respect, the proposed metrics must be complemented with others that take these circumstances into account. Finally, as future work we intend to carry out a study of the password management policies of a selected group of organizations in order to check the usefulness of the proposed metrics, to validate the proposed set, and to serve as a reference of best practices in this environment.
Acknowledgements This research is part of the DIMENSIONS projects, partially financed by the FEDER and the Consejería de Educación y Ciencia de la Junta de Comunidades de Castilla-La Mancha (PBC-05-012-1), CALIPO (TIC2003-07804-C05-03) and RETISTIC (TIC2002-12487-E) granted by the “Dirección General de Investigación del Ministerio de Ciencia y Tecnología” (Spain).
References

1. ACSA, editor. Proceedings of the Workshop on Information Security System Scoring and Ranking, Williamsburg, Virginia, May 2001.
2. A. Adams, M. A. Sasse, and P. Lunt. Making passwords secure and usable. In Proceedings of Human Computer Interaction, Bristol, England, August 1997.
3. M. Bishop. Comparing authentication techniques. In Proceedings of the Third Workshop on Computer Incident Handling, pp. 1–10, August 1991.
4. P. Bouvier and R. Longeon. Le tableau de bord de la sécurité du système d'information. Sécurité Informatique, June 2003.
5. Carnegie Mellon University, Pittsburgh, Pennsylvania. SSE-CMM Model Description Document, 3.0 edition, June 2003.
6. D. A. Chapin and S. Akridge. How can security be measured? Information Systems Control Journal, 2:43–47, 2005.
7. C. Colado and A. Franco. Métricas de seguridad: una visión actualizada. SIC. Seguridad en Informática y Comunicaciones, 57:64–66, November 2003.
8. Department of the Air Force. AFI33-205. Information Protection Metrics and Measurements Program, August 1997.
9. A. Halderman, B. Waters, and E. W. Felten. A convenient method for securely managing passwords. In Proceedings of the 14th International World Wide Web Conference, pp. 471–479, Chiba, Japan, May 2005.
10. ISO. ISO 7498-2. Open Systems Interconnection – Basic Reference Model – Part 2: Security Architecture, 1989.
11. ISO/IEC. ISO/IEC TR 13335-1. Guidelines for the Management of IT Security. Part I: Concepts and Models of IT Security, 1996.
12. ISO/IEC. ISO/IEC 15408. Evaluation Criteria for IT Security, December 1999.
13. ISO/IEC. ISO/IEC 17799. Code of Practice for Information Security Management, 2000.
14. G. King. Best security practices: An overview. In Proceedings of the 23rd National Information Systems Security Conference, Baltimore, Maryland, October 2000. NIST.
15. J. M. Marcelo. Seguridad de las Tecnologías de la Información, chapter Identificación y Evaluación de Entidades en un Método AGR, pp. 69–103. AENOR, 2003.
16. W. L. McKnight. What is information assurance? CrossTalk. The Journal of Defense Software Engineering, pp. 4–6, July 2002.
17. R. T. Mercuri. Analyzing security costs. Communications of the ACM, 46(6):15–18, June 2003.
18. R. Morris and K. Thompson. Password security: A case history. Communications of the ACM, 22(11):594–597, 1979.
19. F. Nielsen. Approaches of security metrics. Technical report, NIST-CSSPAB, June 2000.
20. NIST. FIPS-112: Password Usage, May 1985.
21. NIST. FIPS-181: Automated Password Generator, October 1993.
22. S. C. Payne. A guide to security metrics. Technical report, SANS Institute, July 2001.
23. B. Pinkas and T. Sander. Securing passwords against dictionary attacks. In Proceedings of the ACM Computer and Security Conference (CSC '02), pp. 161–170, November 2002.
24. G. Schuedel and B. Wood. Adversary work factor as a metric for information assurance. In Proceedings of the New Security Paradigm Workshop, pp. 23–30, Ireland, September 2000.
25. M. Swanson. Security self-assessment guide for information technology systems. Technical Report NIST 800-26, National Institute of Standards and Technology, November 2001.
26. M. Swanson, N. Bartol, J. Sabato, J. Hash, and L. Graffo. Security metrics guide for information technology systems. Technical Report NIST 800-55, National Institute of Standards and Technology, July 2003.
27. R. B. Vaughn, Jr., R. Henning, and A. Siraj. Information assurance measures and metrics – state of practice and proposed taxonomy. In Proceedings of the 36th Hawaii International Conference on System Sciences, 2003.
28. R. B. Vaughn, Jr., A. Siraj, and D. A. Dampier. Information security system rating and ranking. CrossTalk. The Journal of Defense Software Engineering, pp. 30–32, May 2002.
29. C. Villarrubia, E. Fernández-Medina, and M. Piattini. Towards a classification of security metrics. In Proceedings of the 2nd International Workshop on Security in Information Systems (WOSIS 2004), pp. 342–350, April 2004.
Using UML Packages for Designing Secure Data Warehouses

Rodolfo Villarroel¹, Emilio Soler², Eduardo Fernández-Medina³, Juan Trujillo⁴, and Mario Piattini³

¹ Departamento de Computación e Informática, Catholic University of Maule, Avenida San Miguel 3605, Talca, Chile. [email protected]
² Departamento de Informática, University of Matanzas, Autopista de Varadero Km. 3, Matanzas, Cuba. [email protected]
³ Alarcos Research Group, Information Systems and Technologies Department, UCLM-Soluziona Research and Development Institute, University of Castilla-La Mancha, Paseo de la Universidad, 4 – 13071 Ciudad Real, Spain. {Eduardo.FdezMedina, Mario.Piattini}@uclm.es
⁴ Departamento de Lenguajes y Sistemas Informáticos, University of Alicante, C/San Vicente S/N, 03690 Alicante, Spain. [email protected]
Abstract. Due to the sensitive data contained in Data Warehouses (DWs), it is essential to specify security measures from the early stages of DW design and to enforce them. In this paper we present a UML profile to represent multidimensional and security aspects in our conceptual modeling. Our approach proposes the use of UML packages to group classes into higher-level units, creating different levels of abstraction and thereby simplifying the final model. Furthermore, we present an extension of the relational model to consider the security and audit measures represented in the conceptual modeling. To accomplish this, we build on the Relational Package of the Common Warehouse Metamodel (CWM) and extend it to properly represent all security and audit rules defined in the conceptual modeling of DWs. Finally, we show an example to illustrate the applicability of our proposal.
1 Introduction

Organizations depend increasingly on information systems, which rely upon databases and data warehouses (DWs), which in turn need ever more quality and security. Indeed, the very survival of organizations depends on the correct management, security and confidentiality of information [2]. As some authors have remarked [1, 4], information security is a serious requirement which must be considered carefully, not as an isolated aspect, but as an element present in all stages of the development lifecycle, from requirement analysis to implementation and maintenance. As other authors point out [6, 9], even though most
DWs are implemented on relational DBMSs, the security measures and access control models specified for transactional (relational) databases are not appropriate for DWs. The main reason is that security measures for DWs must be defined on a multidimensional basis, since DW users query the DW in terms of facts, dimensions, classification hierarchy levels and so on. In MD modeling, information is structured into facts and dimensions. A fact represents interesting measures of a business process (sales, deliveries, etc.), whereas a dimension provides the context for analyzing a fact (product, customer, time, etc.). A high number of dimensions with their corresponding hierarchies, and a considerable number of facts sharing dimensions and classification hierarchies, lead to a very complex design, increasing the difficulty of reading the modeled system. Therefore, a secure MD conceptual model should also provide techniques to avoid flat diagrams and so simplify the final model. In this paper, we present a UML profile to represent MD and security aspects in our conceptual modeling. We propose the use of UML packages to group classes into higher-level units, creating different levels of abstraction. Furthermore, we present an extension of the relational model, aligned with the OMG, to consider the security and audit measures represented in the conceptual modeling. The remainder of this paper is structured as follows. In Section 2 we summarize the main related work. In Section 3 we present the Common Warehouse Metamodel (CWM) and the four-layer architecture of the OMG. In Section 4 we present the UML 2.0 profile for secure multidimensional modeling. In Section 5 we present an extension of the relational metamodel of the CWM. In Section 6 we give an example to support the conceptual design of secure data warehouses using packages. Finally, in Section 7, we draw some conclusions and sketch our immediate future work.
2 Related Work

In the past few years, several approaches have been proposed for representing the main multidimensional (MD) properties at the conceptual level [5, 11-13]. Nevertheless, none of these approaches to MD modeling considers security an important issue in its conceptual model, so they do not solve the problems arising from this question in these kinds of systems. It is true that, in the relevant literature, we can find several initiatives for the inclusion of security in data warehouses [6, 9, 10]. However, none of them considers security aspects covering all stages of the system development cycle, nor the introduction of security into MD design. The previous work presented in [8] introduced a Model Driven Architecture (MDA) oriented framework for DW development, choosing ROLAP (Relational On-Line Analytical Processing) as the target DBMS; there, the Platform Specific Model (PSM) is modeled using the relational metamodel from the CWM. However, no security or audit measures can be modeled in that metamodel. To the best of our knowledge, only our previous works [3, 14] set the basis for providing a conceptual model for the design of secure DWs. In this paper, our previous works are refined, adapting our UML profile for secure multidimensional modeling to the proposal for MD modeling with UML package diagrams [7].
3 CWM and the Four-Layer Architecture of the OMG

The OMG (Object Management Group) standard promotes the theory and practice of object-oriented technology in software development, based on the four-layer metamodel architecture. A model at one layer is used to specify models in the layer below. The four-layer architecture is shown in Table 1.

Table 1. The four-layer architecture of the OMG

Meta-level   MOF Terms              Examples
M3           Meta-metamodel         The MOF model
M2           Metamodel, metadata    UML metamodel, CWM metamodel
M1           Model, metadata        UML models, CWM metadata
M0           Object, data           Modeled systems, data warehouse
The main purpose of the CWM is to enable easy interchange of warehouse and business intelligence metadata between warehouse tools, warehouse platforms and warehouse metadata repositories in distributed heterogeneous environments. The CWM is organized in 21 separate packages, which are grouped into five stackable layers according to similar roles (see Fig. 1).
Fig. 1. CWM metamodel layering and its packages
Within the organization represented in Fig. 1, we mainly focus our work (for a secure relational modeling of DWs) on the Resource layer and, more precisely, on the Relational package, the relational metamodel that represents the metadata of relational data resources.
4 A UML Profile for Secure Multidimensional Modeling

The goal of this UML profile is to enable the design of an MD conceptual model while classifying information at the same time, in order to define which properties a user must have to be entitled to access the information. For each element of the model (fact class, dimension class, fact attribute, etc.), we can define its security information, specifying a sequence of security levels, a set of user compartments and a set of user roles. We can also specify security constraints over these security attributes. Our profile is called SECDW (Secure Data Warehouses) and is represented as a UML package. This profile will not only inherit all properties from
the UML metamodel but it will also incorporate new data types, stereotypes, tagged values and constraints. In Fig. 2, a high-level view of our SECDW profile is provided.
Fig. 2. High-level view of our SECDW profile (the «profile» SECDW package imports the Types SECDW package and the Types OCL package with SetType, SequenceType and OclType)
We have defined a package that includes all the stereotypes that will be necessary in our profile (see Fig. 3).

Fig. 3. New stereotypes of the SECDW profile: SecureDW and UserProfile (from Model); SecureClass, SFact, SDimension and SBase (from Class); SecureInstance (from Instance); SecureProperty, SFA, SDA, SD, SOID and SDD (from Property); SecurePackage, SStarPackage, SDimensionPackage and SFactPackage (from Package); and SecurityRule, AuthorizationRule and AuditRule (from Constraint), carrying tagged values such as securityLevels, securityRoles, securityCompartments, derivationRule, logType, exceptSign, exceptPrivilege and involvedClasses
This profile contains four types of stereotypes:
• Secure class, secure package and secure data warehouse stereotypes (and the stereotypes inheriting from them), which carry tagged values for attributes (model or class attributes), security levels, user roles and organizational compartments.
• Attribute stereotypes (and the stereotypes inheriting from attributes) and instances, which carry tagged values for security levels, user roles and organizational compartments.
• Stereotypes that allow us to represent security constraints, authorization rules and audit rules.
• The UserProfile stereotype, which is necessary to specify constraints that depend on particular information about a user or group of users.

4.1 Using UML Packages for Secure Multidimensional Modeling

In our approach, the main structural properties of MD models are specified by means of a UML class diagram in which the information is clearly separated into facts and dimensions. Our approach proposes the use of UML packages to group classes into higher-level units, creating different levels of abstraction and thereby simplifying the final model. The different levels show how one package can be further exploded by defining its corresponding elements at the next level, as follows (see Fig. 4):
• Level 1: Model definition. A package represents a star schema of a conceptual MD model.
• Level 2: Star schema definition. A package represents a fact or a dimension of a star schema.
• Level 3: Dimension/fact definition. A package is exploded into a set of classes that represent the hierarchy levels defined in a dimension package, or the whole star schema in the case of the fact package.
Fig. 4. Levels of a MD model explosion using packages
Next, we will present the metamodel of our OO conceptual MD approach using a UML class diagram. In order to simplify this diagram, we have divided it into three levels. In Fig. 5, the content of metamodel level1 package is shown. This
package specifies the modeling elements that can be applied at the metamodel level 1 of our approach. At this level, only the StarPackage model element is allowed.
Fig. 5. Metamodel: level 1
In Fig. 6 we show the content of the metamodel level 2 package.

Fig. 6. Metamodel: level 2 (SFactPackage and SDimensionPackage elements with their dependsOn, contains and imports relationships to SFact, SDimension, SBase and DegenerateFact from level 3 and to UserProfile, together with the attached SecurityRule (involvedClasses), AuthorizationRule (exceptSign, exceptPrivilege, involvedClasses) and AuditRule (logType, logInfo) constraints)
The stereotypes SecurityRule, AuthorizationRule and AuditRule can carry the following information:
• SecurityRule: the sensitivity information associated.
• AuthorizationRule: information to permit or deny access.
• AuditRule: information to analyze user behaviour when using the system.
Finally, in Fig. 7 we show the content of the metamodel level 3 package. This diagram represents the main MD properties of our modeling approach.
Fig. 7. Metamodel: level 3
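As an illustration of the semantics the security tagged values are meant to carry, the following sketch, our own and not part of the profile, shows a plausible read-access decision combining security levels, user roles and compartments; the exact rule (role intersection, compartment inclusion) and the level ordering are assumptions.

LEVELS = ["C", "S", "TS"]   # Confidential < Secret < TopSecret (assumed order)

def dominates(user_level, element_level):
    return LEVELS.index(user_level) >= LEVELS.index(element_level)

def can_read(user, element):
    # Level must dominate, at least one role must match, and the user must
    # hold every compartment attached to the element.
    return (dominates(user["level"], element["level"])
            and bool(set(user["roles"]) & set(element["roles"]))
            and set(element["compartments"]) <= set(user["compartments"]))

admission = {"level": "S", "roles": {"Health", "Admin"}, "compartments": set()}
physician = {"level": "TS", "roles": {"Health"}, "compartments": set()}
print(can_read(physician, admission))   # -> True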
5 Secure Multidimensional Modeling at the Logical Level

In this section we outline the relational metamodel of the CWM. We use only the part of the relational CWM metamodel needed for our purposes, which allows us to represent tables, columns, primary keys and foreign keys. However, to represent security and audit measures in the metamodel, we need to add some metaclasses. In Fig. 8 we show part of the extended relational CWM metamodel. The Schema metaclass provides security at the model level. The SecurityProperty metaclass inherits from the Constraint metaclass and is specialized into the SecurityLevels, SecurityCompartments and SecurityRoles metaclasses. Furthermore, to represent security constraints, authorization rules and audit rules in the metamodel, we add the AUDconstraint, ARconstraint and AURconstraint classes, which inherit from SecurityConstraint. To specify constraints that depend on particular information about a user or group of users, we introduce the UserProfile metaclass. Finally, we add associations of the Table and Column metaclasses with the introduced metaclasses in order to establish security on attributes and tables. To express the constraints (AuditRule, AuthorizationRule and SecurityRule) modeled with notes in the SECDW metamodel, we add a new attribute, OCLConstraint, to the SecurityConstraint metaclass.
Fig. 8. Secure Relational Modeling for Data Warehouses
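The following condensed sketch, ours and not the CWM's API, renders the extension of Fig. 8 as plain classes, to show where the security properties and the added OCLConstraint attribute would live; the names mirror the metaclasses described above, but the structure is illustrative only.

from dataclasses import dataclass, field

@dataclass
class SecurityProperty:            # specialized as levels/compartments/roles
    kind: str                      # "level" | "compartment" | "role"
    values: tuple

@dataclass
class SecurityConstraint:          # AUDconstraint / ARconstraint / AURconstraint
    kind: str                      # "audit" | "authorization" | "security"
    ocl_constraint: str            # the added OCLConstraint attribute

@dataclass
class Column:
    name: str
    properties: list = field(default_factory=list)

@dataclass
class Table:
    name: str
    columns: list = field(default_factory=list)
    constraints: list = field(default_factory=list)

# A table whose "cost" column is restricted to the Admin role, with the
# rule also kept as constraint text.
cost = Column("cost", [SecurityProperty("role", ("Admin",))])
admission = Table(
    "Admission", [cost],
    [SecurityConstraint("authorization",
                        "cost is accessible only to users with role Admin")])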
6 An Example of Secure Multidimensional Modeling Using Packages

In this section, we apply our UML 2.0 extension to the conceptual design of a secure MD model in the context of a typical health system, building on the new stereotypes and tagged values stated in Fig. 3. The example refers to a data warehouse in the area of health, from which we have selected three data marts: Hospital, Admission and Services. They are separated because they are used by different end users. In Fig. 9 (a) we can see the three StarPackages representing the different data marts that form the DW.
[Figure content omitted. Fig. 9 (a) shows the three StarPackages — Star Scheme Hospital, Star Scheme Admission {SL=S; SR=Health, Admin}, and Star Scheme Services {SL=C; SR=Manten}. Fig. 9 (b) shows the Admission SFact package {SL=S..TS; SR=Health, Admin} at the center, surrounded by the Diagnosis, Patient, Ward, and Time packages, each tagged with its security level and roles, e.g. {SL=S; SR=Health}.]

Fig. 9 (a). Level 1: Model Definition
Fig. 9 (b). Level 2: Star schema Admission Definition
We have selected the Admission Star package to study its content in detail. The purpose of this package is to contain each of the admissions of patients in one or more hospitals. For instance, Fig. 9 (b) presents, at the center, the package representing the Admission SFact class and, at the borders, the packages
corresponding to each of the secure packages that will later be represented, at the following level, through dimensions and hierarchy levels. In Fig. 10, we present a detailed view of the SFactPackage Admission. In this case, as it corresponds to a secure fact package, the complete star schema is shown. If we had chosen a dimension, only the modeling (including security aspects) of that dimension with its hierarchy levels would be detailed. In this figure, we can see the stereotypes for the Admission fact class, the Diagnosis and Patient dimensions, and the classification hierarchies (Base classes) corresponding to each dimension. The tagged values are represented as static security constraints at the class or attribute level. For example, we can see that the Admission SFact class has security levels from Secret to TopSecret and the user roles Health and Admin. At the attribute level, there is a static security rule stating that the cost attribute can only be accessed by users holding the Admin role. In addition, a series of UML notes can be seen, in which dynamic security constraints (that depend on a condition), authorization rules for exceptions, and audit rules are represented.
Fig. 10. Level 3: Content of SFactPackage Admission
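As a rough illustration of how the static constraints in Fig. 10 behave, the sketch below evaluates a {SL; SR} tagged value against a user's clearance and roles. The three-level ordering and the helper names are our assumptions for the example, not part of the profile.

```python
LEVELS = ["C", "S", "TS"]  # assumed ordering: Confidential < Secret < TopSecret

def can_access(user_level, user_roles, sl_range, sr_roles):
    """Static security rule: the user's clearance must lie inside the
    element's security-level interval (e.g. S..TS) and the user must hold
    at least one of its authorized roles (e.g. {Health, Admin})."""
    lo, hi = LEVELS.index(sl_range[0]), LEVELS.index(sl_range[1])
    return lo <= LEVELS.index(user_level) <= hi and bool(user_roles & sr_roles)

# The Admission SFact class of the example: SL = S..TS, SR = {Health, Admin}
print(can_access("S", {"Health"}, ("S", "TS"), {"Health", "Admin"}))   # True
print(can_access("C", {"Health"}, ("S", "TS"), {"Health", "Admin"}))   # False
```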
7 Conclusions and Future Work

In this paper, we have presented a UML 2.0 profile for secure multidimensional modeling that extends previous work. To do so, we have used UML packages to represent our stereotypes, tagged values, and OCL constraints in our modeling.
Furthermore, we have presented an extension of the relational metamodel of the CWM in order to represent security and audit measures in the logical modeling of data warehouses. Our immediate future work consists of the formal specification of all the required transformations between the conceptual and the logical models by using Query-View-Transformation (QVT), thereby aligning our approach with the Model Driven Architecture. In this way, we will be able to specify all transformations in a formal language, thereby avoiding an arbitrary definition of these rules.
Acknowledgements

This research is part of the RETISTIC (TIC2002-12487-E) and METASIGN (TIN2004-00799) projects from the Spanish Ministry of Education and Science, the MESSSENGER (PCC-03-003-1) and DIMENSIONS (PBC-05-012-2) projects from the Regional Science and Technology Ministry of Castilla-La Mancha, the DADASMECA project (GV05/220) from the Regional Government of Valencia, and the COMPETISOFT project (506PI0297) financed by CYTED.
References
1. Devanbu, P. and Stubblebine, S.: Software engineering for security: a roadmap. In: Proceedings of the Conference on The Future of Software Engineering. Ireland (2000)
2. Dhillon, G. and Backhouse, J.: Information system security management in the new millennium. Communications of the ACM (2000) 43(7):125-128
3. Fernández-Medina, E., Trujillo, J., Villarroel, R., and Piattini, M.: Extending the UML for Designing Secure Data Warehouses. In: Int. Conference on Conceptual Modeling (ER 2004). Shanghai, China: Springer-Verlag. LNCS 3288 (2004)
4. Ferrari, E. and Thuraisingham, B.: Secure Database Systems. In: Advanced Databases: Technology Design, Piattini, M. and Díaz, O., Editors, Artech House: London (2000)
5. Golfarelli, M., Maio, D., and Rizzi, S.: The Dimensional Fact Model: A Conceptual Model for Data Warehouses. Int. Journal of Cooperative Information Systems (IJCIS) (1998) 7(2-3): 215-247
6. Katic, N., Quirchmayr, G., Schiefer, J., Stolba, M., and Min Tjoa, A.: A Prototype Model for Data Warehouse Security Based on Metadata. In: 9th Int. Workshop on Database and Expert Systems Applications (DEXA'98). Vienna. IEEE Computer Society (1998)
7. Luján-Mora, S., Trujillo, J., and Song, I.Y.: Multidimensional Modeling with UML Package Diagrams. In: Int. Conference on Conceptual Modeling - ER 2002. Tampere, Finland: Springer. LNCS 2503 (2002)
8. Mazón, J., Trujillo, J., Serrano, M., and Piattini, M.: Applying MDA to the development of data warehouses. In: 8th ACM Int. Workshop on Data Warehousing and OLAP (DOLAP'05). Bremen, Germany (2005) 57-66
9. Priebe, T. and Pernul, G.: Towards OLAP Security Design - Survey and Research Issues. In: 3rd ACM Int. Workshop on Data Warehousing and OLAP (DOLAP'00). USA (2000)
10. Rosenthal, A. and Sciore, E.: View Security as the Basis for Data Warehouse Security. In: 2nd Int. Workshop on Design and Management of Data Warehouses (DMDW'00). Sweden (2000)
11. Sapia, C., Blaschka, M., Höfling, G., and Dinter, B.: Extending the E/R Model for the Multidimensional Paradigm. In: 1st Int. Workshop on Data Warehouse and Data Mining (DWDM'98). Singapore: Springer-Verlag. LNCS 1552 (1998)
12. Trujillo, J., Palomar, M., Gómez, J., and Song, I.Y.: Designing Data Warehouses with OO Conceptual Models. IEEE Computer, special issue on Data Warehouses (2001) 34: 66-75
13. Tryfona, N., Busborg, F., and Christiansen, J.: starER: A Conceptual Model for Data Warehouse Design. In: ACM 2nd Int. Workshop on Data Warehousing and OLAP (DOLAP'99). Missouri, USA: ACM (1999)
14. Villarroel, R., Fernández-Medina, E., Trujillo, J., and Piattini, M.: Towards a UML 2.0/OCL extension for Designing Secure Data Warehouses. In: 3rd Int. Workshop on Security in Information Systems (WOSIS 2005). Miami, USA: INSTICC Press (2005)
Practical Attack on the Shrinking Generator

Pino Caballero-Gil¹ and Amparo Fúster-Sabater²

¹ D.E.I.O.C. University of La Laguna. 38271 La Laguna, Tenerife, Spain. [email protected]
² Institute of Applied Physics. C.S.I.C. Serrano 144, 28006 Madrid, Spain. [email protected]
Abstract. This work proposes an efficient attack on the Shrinking Generator based on its characterization by means of Linear Hybrid Cellular Automata. The algorithm uses the computation of the characteristic polynomials of specific sub-automata and the generation of the Galois field associated to one of the Linear Feedback Shift Register components of the generator. Both theoretical and empirical evidence for the effectiveness of the attack is given. The two main advantages of the described cryptanalysis are the determinism of bit prediction and the possible application of the obtained results to different generators.
1
Introduction
A binary additive stream cipher is a synchronous cipher in which the binary output of a keystream generator is added bitwise to the binary plaintext sequence, producing the binary ciphertext. The main goal in stream cipher design is to produce random-looking sequences that are unpredictable, in an efficient way. From a cryptanalysis point of view, a good stream cipher should be resistant against known-plaintext attacks. Most known keystream generators are based on Linear Feedback Shift Registers (LFSRs). In particular, the Shrinking Generator (SG) is a nonlinear combinator based on two LFSRs, such that the bits of one output are used to determine whether the corresponding bits of the second output are part of the overall keystream. Although there have been several approaches to attacking the SG, it produces pseudorandom sequences with good security properties. A basic divide-and-conquer attack requiring an exhaustive search through all the possible initial states and feedback polynomials of the selector LFSR was proposed in [14]. The authors of [7] described a correlation attack targeting the second LFSR. Another correlation attack, based on searching for specific subsequences of the output sequence, was introduced in [8]. More recently, a distinguishing attack applicable when the second LFSR has a low-weight feedback polynomial was investigated in [5].
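The bitwise construction just described is a one-liner; the following sketch (ours, for illustration) shows that encryption and decryption are the same XOR operation once a keystream is available.

```python
def additive_stream_cipher(bits, keystream):
    """Binary additive stream cipher: output = input XOR keystream, bit by bit.
    Applying it twice with the same keystream recovers the plaintext."""
    return [b ^ k for b, k in zip(bits, keystream)]

plain = [1, 0, 1, 1, 0]
key = [0, 1, 1, 0, 1]
cipher = additive_stream_cipher(plain, key)
assert additive_stream_cipher(cipher, key) == plain
```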
Research supported by the Spanish Ministry of Education and Science and the European FEDER Fund under Projects SEG2004-04352-C04-03 and SEG2004-02418.
On the other hand, Cellular Automata (CA) are discrete mathematical models in which a lattice of finite state machines, called cells, updates itself synchronously according to local rules [16]. CA have been proposed both for secret and public key cryptography [16], [12], [13]. Also, cryptanalyses of certain CA-based keystream generators have been published in [10] and [11]. Finally, in [6] a Cellular Automata-based model for the Shrinking Generator was proposed. That work may be considered the starting point of this research. This work is laid out as follows. The next section gives relevant background about the basic structures we are dealing with: Linear Hybrid Cellular Automata and Shrinking Generators. The CA-based model for the SG that is used in this work is described next. Section 4 gives the theoretical basis of the proposed CA-based cryptanalysis of the SG. Sections 5 and 6 introduce the full description of the algorithm and its analysis, respectively. Finally, in Section 7 several conclusions and open questions are drawn.
2
Preliminaries
Cellular automata are finite state machines that consist of arrays of n cells [16]. The simplest nontrivial CA are binary and one-dimensional, where a cell's neighbours are the cells on either side of it. The name of a CA is a decimal number which, in binary, gives the rule table. According to rule 90, the value of a particular cell i is the sum modulo 2 of the values of its two neighbour cells on the previous time step t. Rule 150 also includes the value of cell i at time step t. These two rules may be defined as $x_i^{t+1} = x_{i-1}^t + x_{i+1}^t$ and $x_i^{t+1} = x_{i-1}^t + x_i^t + x_{i+1}^t$, respectively, where $x_i^t$ represents the state value of cell i at time t. Null CA are those where cells with permanent null content are supposed adjacent to the extreme cells of the CA. Binary CA where the neighbourhood dependence involves just XOR operations are called linear CA. If different rules are applied over different cells of a CA, then it is called a hybrid CA. Linear Hybrid Cellular Automata are usually denoted by the acronym LHCA. In this research only one-dimensional 90/150 null LHCA are considered. Binary strings $R_1 R_2 \ldots R_n$ are here used to represent n-cell LHCA, where $R_i$ is either 0, if cell i uses rule 90, or 1, if cell i uses rule 150, for $1 \le i \le n$. Given an irreducible polynomial, several algorithms have been developed to find its corresponding LHCA. The most recent one [1] applies the Euclidean algorithm to compute the LHCA in polynomial running time, so it is sufficiently fast to generate LHCA for polynomials of very large degree. On the other hand, in [3] a synthesis algorithm, also based on the Euclidean algorithm, which allows the characteristic polynomial of any given LHCA to be calculated in linear time, was introduced. In this work such an algorithm will be called the Polynomial-Synthesis Algorithm. The SG is a well-known keystream generator introduced by Coppersmith, Krawczyk and Mansour in 1993 [4]. It is composed of two LFSRs: a selector register that produces a sequence used to decimate the sequence generated by the other register. The selector register is here denoted by S, its length is $L_S$,
its characteristic polynomial is $P_S(x)$, and the sequence it produces is $\{s_i\}$. The decimated sequence is denoted $\{a_i\}$; the second register, which produces it, is A, its length is $L_A$, and its characteristic polynomial is $P_A(x)$. The shrunken sequence is $\{z_j\}$. The period of the shrunken sequence is $T = (2^{L_A} - 1)\,2^{L_S-1}$ and its linear complexity L satisfies $L_A 2^{L_S-2} < L \le L_A 2^{L_S-1}$. Its characteristic polynomial is of the form $P(x)^N$, where $P(x)$ is an $L_A$-degree primitive polynomial and N is an integer such that $2^{L_S-2} < N \le 2^{L_S-1}$. Despite its simplicity, the SG has remained resistant against efficient cryptanalysis.
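The two structures introduced in this section are easy to simulate. First, one synchronous update of a null 90/150 LHCA, following the rule definitions above (a sketch; the 0-for-90, 1-for-150 encoding matches the binary-string convention just given):

```python
def lhca_step(state, rules):
    """One update of a null-boundary 90/150 LHCA.

    state: list of 0/1 cell values; rules[i] == 0 means cell i uses rule 90,
    rules[i] == 1 means rule 150. Cells beyond the ends are fixed at 0."""
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        # rule 90: left + right (mod 2); rule 150 additionally adds the cell itself
        nxt.append((left + right + rules[i] * state[i]) % 2)
    return nxt
```

Second, the SG itself, built from two simple Fibonacci LFSRs; the tap positions below are illustrative placeholders, not the registers of any particular generator:

```python
def lfsr_stream(init, taps, length):
    """Fibonacci LFSR over GF(2): output state[0], feed back the XOR of taps."""
    state, out = list(init), []
    for _ in range(length):
        out.append(state[0])
        fb = 0
        for t in taps:
            fb ^= state[t]
        state = state[1:] + [fb]
    return out

def shrinking_generator(a_init, a_taps, s_init, s_taps, n):
    """Keep the bit of A exactly when the corresponding selector bit of S is 1."""
    a = lfsr_stream(a_init, a_taps, n)
    s = lfsr_stream(s_init, s_taps, n)
    return [ai for ai, si in zip(a, s) if si == 1]
```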
3
The Cellular Automata-Based Model for the Shrinking Generator
In this work we consider the linear model of the SG described in [6] in terms of LHCA. The equivalent LHCA obtained for any SG through the algorithm proposed in that work, here denoted the CA-Synthesis Algorithm, are formed by concatenations of basic primitive LHCA and their mirror images, with one or two modifications of rules in each LHCA component [15]. In particular, we have found that the number of modifications in the described model is two in all but two concatenated LHCA, and only one in the two extreme LHCA. The output of the CA-Synthesis Algorithm is formed by two equivalent LHCA for any SG with selector LFSR of length $L_S$ and decimated LFSR sequence produced by A. The characteristic polynomial of the equivalent LHCA is the same as that of the original SG, that is to say, $P(x)^N$. Since the number of concatenations is between $2^{L_S-2}$ and $2^{L_S-1}$, and the length of the basic primitive LHCA is $L_A$, the length of the equivalent LHCA is given by an integer L such that $L_A 2^{L_S-2} < L \le L_A 2^{L_S-1}$. Consequently, in order to generate the whole shrunken sequence in one of the extreme cells of the equivalent LHCA, it would be necessary to determine uniquely the initial state of the equivalent LHCA able to produce it, and to get this it would be necessary to intercept L shrunken bits. Consequently, although we have a linear model of the SG, in order to break the SG with it we would need as many intercepted bits as the linear complexity of the SG. This work provides an efficient way to use the CA model of the SG in order to guess unseen bits of the shrunken sequence from the interception of a number of bits lower than the linear complexity of the SG.
4
Theoretical Basis
Let $Z = Z^0 = z_0, z_1, z_2, \ldots$ be the output sequence of the SG, whose characteristic polynomial $P(x)^N \in GF(2)[x]$ has degree L. Let $Z^t = z_t, z_{t+1}, z_{t+2}, \ldots$ denote the t-th phase shift of Z. Let $\alpha \in GF(2^{L_A})$ be a root of $P(x)$. Since the equivalent LHCA may generate the shrunken sequence in any of its cells, given a shrunken sequence $z_0, z_1, z_2, \ldots, z_r$, it is always possible to
assume, without loss of generality, that its generation is at the left extreme cell, so that $x_1^0 = z_0, x_1^1 = z_1, x_1^2 = z_2, \ldots, x_1^r = z_r$. According to this, assuming the knowledge of r bits of the shrunken sequence, we may reconstruct r sub-sequences $x_i^t$ of length $r-i+1$ corresponding to the rules $R_i$ with $1 < i \le r$, so that $x_i^t = R_{i-1}(x_{i-1}^t, x_{i-1}^{t+1}, x_{i-2}^t)$. Since rules 90 and 150 are additive and the equivalent LHCA is null boundary, for any rule $R_i$ the previous expression corresponds to a sum of some elements of the shrunken sequence: $x_i^t = z_{t+k_1} + z_{t+k_2} + \cdots + z_{t+k_{r_i}}$, whose sub-indexes correspond to the exponents of the unknown in the characteristic polynomial of the LHCA $R_1 R_2 \ldots R_{i-1}$. Consequently, if this sub-sequence $\{x_i^t\}$ of length $r-i+1$ is used recursively as left extreme sequence of the equivalent LHCA in order to reconstruct, in the same way as before, $r-i+1$ sub-sequences of length $r-2i+2$ corresponding to the rules $R_i$ with $1 < i \le r-i+1$, we obtain in the same cell i: $z_{t+2k_1} + z_{t+2k_2} + \cdots + z_{t+2k_{r_i}}$. In this way, if the hypothesis $z_{t+d} = z_{t+2k_1} + z_{t+2k_2} + \cdots + z_{t+2k_{r_i}}$, or equivalently

$Z^d = Z^{2k_1} + Z^{2k_2} + \cdots + Z^{2k_{r_i}}$    (1)

is fulfilled, then a d-th phase shift of the shrunken sequence reappears at cell i of the equivalent LHCA in each second chained sub-triangle generated as explained in the previous paragraph. Furthermore, it is easy to see that if hypothesis (1) is satisfied, then each 2j-th chained triangle provides $r-2ji+2j$ bits of a jd-th phase shift of the shrunken sequence. Note that hypothesis (1) may be easily generalized to the hypothesis

$Z^d = Z^{2^l k_1} + Z^{2^l k_2} + \cdots + Z^{2^l k_{r_i}}$    (2)

so that if hypothesis (2) is satisfied, the reappearance of a d-th phase shift of the shrunken sequence is guaranteed at cell i in each $2^l$-th chained sub-triangle. On the other hand, the exponents of the unknown in the characteristic polynomial of the LHCA $R_1 R_2 \ldots R_i$ with $i \le L_A$ are of the form $(2k_1, 2k_2, \ldots, 2k_{r_i})$ if and only if the LHCA is the concatenation of a basic automaton with its mirror image, $R_1 R_2 \ldots R_{i/2} R_{i/2} \ldots R_2 R_1$. Also, the exponents of the unknown in the characteristic polynomial of the LHCA $R_1 R_2 \ldots R_i$ with $i \le L_A$ are of the form $(2k_1+1, 2k_2+1, \ldots, 2k_{r_i}+1)$ if and only if the LHCA is the concatenation of a basic automaton with the rule 90 and its mirror image, $R_1 R_2 \ldots R_{(i-1)/2}\,0\,R_{(i-1)/2} \ldots R_2 R_1$. Consequently, hypothesis (2) is always fulfilled for rule $R_i$ when the sub-automaton $R_1 R_2 \ldots R_i$ corresponds to one of the two previous descriptions, and in such a case a d-th phase shift of the shrunken sequence at cell i is obtained in at most the $2^{L_S-2}$-th chained sub-triangle. It is well known that if $\{s_n\}$ is a sequence produced by an LFSR whose characteristic polynomial is irreducible, and $\alpha$ is a root of such a polynomial, then each element $s_n$ of the sequence may be written as the trace of the n-th power of $\alpha$ [9]. Since $P(x)$ is an $L_A$-degree primitive polynomial, the successive powers $\alpha^i$, $0 \le i < 2^{L_A}-1$, generate the finite field $GF(2^{L_A})$, and their respective traces equal the corresponding elements $s_i$ of the PN-sequence associated to the polynomial $P(x)$. On the other hand, since the trace function is linear and all the
powers of $\alpha$ may be expressed in terms of the first $L_A - 1$ powers, the association between powers of $\alpha$ and elements of the PN-sequence may be transferred to linear relations between different phase shifts of the PN-sequence and the first $L_A - 1$ phase shifts. From [6] we know that the shrunken sequence is composed of interpolations of different phase shifts of the PN-sequence associated to the polynomial $P(x)$, so that the element $s_i$ of the basic PN-sequence corresponds to the shrunken bit $z_{iN}$. Consequently, any linear relation between different phase shifts of the PN-sequence deduced as explained in the previous paragraph corresponds to a linear relation between different phase shifts of the shrunken sequence, which are the same phase shifts obtained for the PN-sequence, but multiplied by N.
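The chained sub-triangle construction at the heart of this section can be sketched directly from the rule definitions: since rules 90 and 150 can be solved for their right neighbour, a known left-extreme column determines everything to its right. The following is our illustrative rendering of that reconstruction (0-indexed rule vector, null left boundary), not the authors' implementation:

```python
def sub_triangle(z, rules):
    """Columns x_i^t reconstructed to the right of a left-extreme column z.

    cols[i] holds the sub-sequence of cell i+1 and is one bit shorter than
    cols[i-1]. Solving the rules for the right neighbour gives
        rule 90 : x_i^t = x_{i-1}^{t+1} ^ x_{i-2}^t
        rule 150: x_i^t = x_{i-1}^{t+1} ^ x_{i-1}^t ^ x_{i-2}^t
    with x_0^t = 0 (null boundary)."""
    cols = [list(z)]
    for i in range(1, len(z)):
        prev = cols[i - 1]                                   # cell i
        prev2 = cols[i - 2] if i >= 2 else [0] * len(prev)   # cell i-1 or boundary
        cols.append([prev[t + 1] ^ prev2[t] ^ (rules[i - 1] & prev[t])
                     for t in range(len(prev) - 1)])
    return cols
```

Chaining amounts to feeding a recovered column cols[i] back in as the left-extreme input of a fresh triangle, which is what produces the $2^l$-th sub-triangles of hypothesis (2).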
5
Cryptanalysis of the Shrinking Generator
Starting from the theoretical basis of the previous section, in the following we describe an efficient algorithm based on the generation of the finite field $GF(2^{L_A})$ and the test of hypothesis (2). The proposed cryptanalysis algorithm has two phases. The first, off-line phase is the same for many different generators that have part of their structure in common. The second, on-line phase requires as input r intercepted shrunken bits, and provides as output a number of unseen shrunken bits directly proportional to the number of intercepted shrunken bits.

Algorithm

Off-line Phase:
Input: The lengths $L_S$ and $L_A$, and the characteristic polynomial $P_A(x)$ of A, corresponding to the LFSR components S and A of the SG.
Step 1: Using the CA-Synthesis Algorithm described in [6], compute the two equivalent LHCA that are valid for any SG with selector LFSR of length $L_S$ and decimated LFSR sequence produced by A.
Step 2: Using the primitive $L_A$-degree polynomial $P(x)$ associated to the basic LHCA, generate the finite field $GF(2^{L_A})$ formed by the exponentiation of one root $\alpha$ of such a polynomial, and express each element $\alpha^e$ of $GF(2^{L_A})$ as an $L_A$-length array $E = [e_0, e_1, \ldots, e_{L_A-2}, e_{L_A-1}]$ where $e_i = i$ iff $\alpha^i$ is present in the expression of $\alpha^e$.
Step 3: For both LHCA obtained in Step 1, search for sub-automata of the form $R_1 R_2 \ldots R_{i/2} R_{i/2} \ldots R_2 R_1$ or $R_1 R_2 \ldots R_{(i-1)/2}\,0\,R_{(i-1)/2} \ldots R_2 R_1$. For each of them, use the Polynomial-Synthesis Algorithm described in [3] to calculate the $2r$ characteristic polynomials and express them as $2r$ different $(L+1)$-length arrays $D = [d_0, d_1, \ldots, d_{L-1}, d_L]$ where $d_i = i$ iff i is an exponent of the unknown in the corresponding polynomial. Using the finite field generated in the previous step, decompose each array D as an equivalent linear expression $a \cdot B + C$, with a being a power of 2, and B and C two arrays such that $B = [0, b_1, \ldots, b_{L_A-2}, b_{L_A-1}]$ and $C = [c, c, \ldots, c]$. Consider B the output of this step.
Step 4: For each array $B = [0, b_1, \ldots, b_{L_A-2}, b_{L_A-1}]$ obtained in Step 3, search for it among all the $L_A$-length arrays obtained in Step 2, and if found, associate
to the corresponding rule the $(N/a)(a\,e + c)$-th phase shift for the $N/a$-th sub-triangles.
Output: For each equivalent LHCA, the favourable rules and corresponding phase shifts and sub-triangles.

On-line Phase:
Input: Output of the off-line phase and r intercepted shrunken bits.
Step 5: Once r shrunken bits have been intercepted, proceed with them by generating the chained sub-triangles indicated in Step 4, to obtain, for all successful rules u, $ar^2/2NR_u$ bits of the $(N/a)(a\,e + c)$-th phase shifts of the shrunken sequence.
Output: A variable number of bits of the shrunken sequence.

Note that all the computations made for any LHCA are useful for any other LHCA with the same basic CA and more concatenations; that is to say, the outputs of Steps 2, 3 and 4 obtained for an equivalent LHCA with characteristic polynomial $P(x)^{N_1}$ remain correct for any other equivalent LHCA with characteristic polynomial $P(x)^{N_2}$, with $N_1 | N_2$. Consequently, the cryptanalysis of a SG with LFSRs $S_1$ and $A_1$ is useful for the cryptanalysis of any other SG with LFSRs $S_2$ and $A_2$ such that its corresponding characteristic polynomial is $P_A(x)^{N_2}$ with $N_1 | N_2$.

Example: Consider any SG with $L_S = 3$, $L_A = 5$, $P_A(x) = 1 + x + x^2 + x^3 + x^5$.
Step 1: $P(x) = 1 + x + x^2 + x^4 + x^5$. The two LHCA computed using the CA-Synthesis Algorithm are LHCA1: 10001100000000110001 and LHCA2: 00000000011000000000.
Step 2: The finite field $GF(2^5)$ associated to $P(x)$ is computed. Here only the powers that have the term 1 in their expressions are shown:
$E^{10}$: [01234], $E^{11}$: [03], $E^{13}$: [014], $E^{5}$: [0124], $E^{6}$: [034], $E^{7}$: [02], $E^{14}$: [04], $E^{15}$: [024], $E^{16}$: [0234], $E^{17}$: [023], $E^{19}$: [01], $E^{23}$: [012], $E^{26}$: [0123], $E^{28}$: [013], $E^{30}$: [0134], $E^{31}$: [0]
Steps 3 and 4: For LHCA1 the only useful sub-automaton corresponds to rule 5. For LHCA2 we find useful sub-automata from rule 2 to rule 9. The characteristic polynomials for those rules are computed with the Polynomial-Synthesis Algorithm and then decomposed in terms of 5-length arrays. Here only the positive terms of the arrays are shown and the arrays C are written as c. After the arrows, the corresponding phase shifts and sub-triangles obtained from the comparison with the arrays computed in Step 2 are indicated.
For LHCA1:
$D_5^1$: [1 3 5]: 2*[0 1 2]+1 → 2(2*23+1) = 94-th shift, 2-nd triangles
For LHCA2:
$D_2^2$: [0 2]: 2*[0 1] → 4*19 = 76-th shift, 2-nd triangles
$D_3^2$: [3]: [0]+3 → 3-rd shift, 1-st triangle
$D_4^2$: [0 2 4]: 2*[0 1 2] → 4*23 = 92-th shift, 2-nd triangles
$D_5^2$: [1 5]: 4*[0 1]+1 → 4*19+1 = 77-th shift, 1-st triangle
$D_6^2$: [0 4 6]: 2*[0 2 3] → 4*17 = 68-th shift, 2-nd triangles
$D_7^2$: [7]: [0]+7 → 7-th shift, 1-st triangle
$D_8^2$: [0 4 6 8]: 2*[0 2 3 4] → 4*16 = 64-th shift, 2-nd triangles
$D_9^2$: [1 5 9]: 4*[0 1 2]+1 → 4*23+1 = 93-th shift, 1-st triangle
From the off-line phase we know that we may deduce a number of unseen shrunken bits from positions 94, 76, 92, 77, 68, 64 and 93 (mod 124), which depends on the number of intercepted bits in the on-line phase.
Step 5: After the interception of r = 10 shrunken bits, 0011101011, we proceed with them by generating the four chained sub-triangles corresponding to LHCA2 for rules $R_1$, $R_2$ and $R_3$ (all rule 90).
[Worked triangle table omitted: the extracted columns of the four chained 90/90/90 sub-triangles could not be reliably reconstructed.]
So, we get 6 bits of the 76-th phase shift, $z_{76}=1$, $z_{77}=0$, $z_{78}=0$, $z_{79}=1$, $z_{80}=0$, $z_{81}=1$, and 2 bits of the 28-th phase shift, $z_{28}=1$, $z_{29}=1$. We may also use $R_2$ of LHCA1 to generate, with the 4-th triangle, 2 new bits: $z_{92}=0$, $z_{93}=1$. In summary, from the knowledge of 10 shrunken bits we have discovered another 10 unseen shrunken bits.
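Step 2 of the off-line phase is the only part that needs finite-field machinery, and it reduces to repeated multiplication by $\alpha$ with polynomial reduction. The sketch below (our illustration) reproduces the table of the example; each element is returned as a bitmask whose set bits are exactly the exponents listed between brackets above.

```python
def gf2_powers(low_exponents, degree):
    """Successive powers of a root α of a degree-`degree` primitive polynomial
    over GF(2), each expressed in the basis 1, α, ..., α^(degree-1).

    low_exponents: exponents of the polynomial other than `degree`, e.g.
    P(x) = 1 + x + x^2 + x^4 + x^5 -> {0, 1, 2, 4}. Returns elems where
    elems[e] is a bitmask: bit i is set iff α^i appears in α^e."""
    reduction = sum(1 << t for t in low_exponents)  # α^degree equals this sum
    elems, x = [], 1                                # x encodes α^0 = 1
    for _ in range((1 << degree) - 1):
        elems.append(x)
        x <<= 1                                     # multiply by α
        if x >> degree & 1:
            x = (x ^ (1 << degree)) ^ reduction     # reduce modulo P(α) = 0
    return elems

elems = gf2_powers({0, 1, 2, 4}, 5)                 # the P(x) of the example
print([i for i in range(5) if elems[6] >> i & 1])   # [0, 3, 4] = E^6: [034]
```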
6
Analysis of the Algorithm
In this section we present some simulation results and a comparison with previous attacks. In the following we show a summary of the outputs of the off-line phase of the attack, using the first LHCA from the tables in [2]. The next table contains the rules that allow new bits to be obtained from a triangle earlier than the $2^{L_S-1}$-th one.

L_A: 6  7  8  10   12  13  15  16  17  20  21  23  25  26  27  29    30
R_i: 2  3  6  2;4  4   10  8   2   13  4   6   8   3   6   2   6;13  3
L_A: 31   32     33  34  35    36  37  38   39  40   42   43  45  46  47  48  49
R_i: 2;4  2;3;4  2   5   2;13  6   5   2;4  5   2;6  2;7  2   2   6   4   5   6
As the lengths of the LFSRs increase, the proposed attack naturally needs a larger intercepted sequence. Concretely, the number r of intercepted bits necessary to guess n new bits is given by $R_i \cdot 2^{L_S-2} + n$. Note that the
automata with more favourable rules associated give more information for each intercepted bit. The most interesting conclusion of this empirical analysis is that most basic automata have some sub-automata of the form $R_1 R_2 \ldots R_{i/2} R_{i/2} \ldots R_2 R_1$ or $R_1 R_2 \ldots R_{(i-1)/2}\,0\,R_{(i-1)/2} \ldots R_2 R_1$, which allow them to be used to obtain new unseen shrunken bits starting from a number of intercepted shrunken bits lower than the linear complexity of the generator. In order to compare the proposed attack with known related results, two important aspects of our algorithm should be highlighted. First, the off-line phase is executed before intercepting the sequence, and consequently its computational complexity should not be considered in the same way as on-line computations (after interception). The second point is the determinism of the proposed attack, because the obtained bits are known with absolute certainty. The computational complexity of the proposed attack is $O(2^{L_S-2})$. If we compare it with that of known attacks on the SG, we find the following. The complexity of the divide-and-conquer attack proposed in [14] is exponential in $L_S$. The probabilistic correlation attack described in [7] has a computational complexity of $2^{L_A} \cdot L_A^2$. Also the probabilistic correlation attack introduced in [8] is exponential in $L_A$. Finally, the distinguishing attack investigated in [5] needs approximately $2^{32}$ bits to distinguish the SG with a weight-4 polynomial of degree 10000. However, note that the distinguishing attack does not try to recover the sequence; instead, the aim is to distinguish the keystream from a purely random sequence.
7
Conclusions and Open Problems
The main purpose of this paper has been to introduce a practical known-plaintext attack on the shrinking generator, which requires a number of intercepted bits lower than the linear complexity in order to predict, with absolute certainty, a number of unseen shrunken bits that depends on the number of intercepted bits. Any shrinking generator whose characteristics lead to a successful off-line phase of the algorithm, implying the deduction of too many unseen shrunken bits, should be rejected for cryptographic use. Therefore, the proposed algorithm is useful both for cryptanalysts and for cryptographers who use the shrinking generator. The two main advantages of the described cryptanalysis are the determinism of bit prediction and the application of the obtained results to different shrinking generators. One of the subjects of work in progress is the modelling of other keystream generators through concatenations of maximum-length CA.
References
1. K. Cattell and J.C. Muzio. Synthesis of one-dimensional linear hybrid cellular automata. IEEE Transactions on Computer-Aided Design, 1996.
2. K. Cattell and J.C. Muzio. Tables of linear cellular automata for minimal weight primitive polynomials of degree up to 300. Technical report, University of Victoria, Department of Computer Science, 1993.
3. K. Cattell, S. Zhang, X. Sun, M. Serra, J.C. Muzio, and D.M. Miller. One-Dimensional Linear Hybrid Cellular Automata: Their Synthesis, Properties, and Applications in VLSI Testing. Tutorial. www.cs.uvic.ca/~mserra/CA.html
4. D. Coppersmith, H. Krawczyk, Y. Mansour. The Shrinking Generator. Proc. Crypto'93. LNCS 773, Springer Verlag (1994) 22-39.
5. P. Ekdahl, W. Meier, T. Johansson. Predicting the Shrinking Generator with Fixed Connections. Proc. Eurocrypt'03. LNCS 2656, Springer Verlag (2004) 345-359.
6. A. Fúster-Sabater, D. de la Guía. Cellular Automata Application to the Linearization of Stream Cipher Generators. Proc. ACRI 2004. LNCS 3305, Springer Verlag (2004) 612-622.
7. J.D. Golic, L. O'Connor. A Cryptanalysis of Clock-Controlled Shift Registers with Multiple Steps. Cryptography: Policy and Algorithms (1995) 174-185.
8. T. Johansson. Reduced Complexity Correlation Attacks on Two Clock-Controlled Generators. Proc. Asiacrypt'98. LNCS 1514, Springer Verlag (1998) 342-356.
9. E.L. Key. An Analysis of the Structure and Complexity of Nonlinear Binary Sequence Generators. IEEE Transactions on Information Theory 22 (1976) 732-736.
10. W. Meier, O. Staffelbach. Analysis of Pseudo Random Sequences Generated by Cellular Automata. Proc. Eurocrypt'91. LNCS 547, Springer Verlag (1992) 186-199.
11. M. Mihaljevic. Security Examination of a Cellular Automata Based Pseudorandom Bit Generator Using an Algebraic Replica Approach. Proc. AAECC-12. LNCS 1255, Springer Verlag (1997) 250-262.
12. S. Nandi, B.K. Kar, P.P. Chaudhuri. Theory and Applications of Cellular Automata in Cryptography. IEEE Transactions on Computers 43(12) (1994) 1346-1357.
13. F. Seredynski, P. Bouvry, A.Y. Zomaya. Cellular Automata Computations and Secret Key Cryptography. Parallel Computing 30(5-6) (2004).
14. L. Simpson, J.D. Golic, E. Dawson. A Probabilistic Correlation Attack on the Shrinking Generator. Proc. ACISP'98. LNCS 1438, Springer Verlag (1998) 147-158.
15. X. Sun, E. Kontopidi, M. Serra, and J.C. Muzio. The concatenation and partitioning of linear finite state machines. International Journal of Electronics 78(5) (1995) 809-839.
16. S. Wolfram. Cryptography with Cellular Automata. Proc. Crypto'85. LNCS 218, Springer Verlag (1986) 429-432.
A Comparative Study of Proposals for Establishing Security Requirements for the Development of Secure Information Systems

Daniel Mellado¹, Eduardo Fernández-Medina², and Mario Piattini²

¹ Ministry of Labour and Social Affairs, Management Organism of Information Technologies of the Social Security, Quality, Auditing and Security Institute, Madrid, Spain. [email protected]
² Alarcos Research Group, Information Systems and Technologies Department, UCLM-Soluziona Research and Development Institute, University of Castilla-La Mancha, Paseo de la Universidad 4, 13071 Ciudad Real, Spain. (Eduardo.FdezMedina, Mario.Piattini)@uclm.es
Abstract. Nowadays, security solutions focus mainly on providing security defences, instead of addressing one of the main causes of security problems: inappropriate Information Systems (IS) design. In this paper a comparative analysis of eight relevant technical proposals, all of which place great importance on the establishment of security requirements in the development of IS, is carried out. These proposals make significant contributions in security-related aspects, which can serve as a basis for new methodologies or as extensions to existing ones. Nevertheless, they only partly satisfy the criteria necessary for establishing security requirements with guarantees and integrating them into IS development. We therefore conclude that they are not specific enough to deal with security requirements in the first stages of software development in a systematic and intuitive way, though parts of the proposals, taken as complementary measures, can be used in that manner.
1 Introduction

Present-day information systems are vulnerable to a host of threats. What is more, with the increasing complexity of applications and services, there is a correspondingly greater chance of suffering security breaches [25]. In our contemporary Information Society, which depends on a huge number of software systems playing a critical role, it is absolutely vital that IS are ensured as safe right from the very beginning [2, 18]. That this is so is obvious from the potential losses faced by organizations that put their trust in all these IS. As we know, the principle that building security into the early stages of the development process is cost-effective and also brings about more robust designs is widely accepted [15]. The biggest problem, however, is that in the majority of software projects security is dealt with once the system has already been designed and put into operation. On many occasions this is due to an inappropriate management of the specification of the security requirements of the new system, since the stage known as the requirements specification phase is often carried out with the
aid of just a few descriptions, or a specification of objectives that is put down on a few sheets of paper. Added to this, the actual security requirements themselves are often not well understood. This being so, even when there is an attempt to define security requirements, many developers tend to describe design solutions in terms of protection mechanisms, instead of making declarative propositions regarding the level of protection required [8]. A very important part of achieving secure software systems in the software development process is what is known as Security Requirements Engineering. This provides techniques, methods and norms for tackling this task in the IS development cycle. It should involve the use of repeatable and systematic procedures in an effort to ensure that the set of requirements obtained is complete, consistent, easy to understand and analyzable by the different actors involved in the development of the system [16]. A good requirements specification document should include both functional requirements (related to the services which the software or system should provide) and non-functional requirements (related to features of quality: performance, portability, security, etc.). As far as security is concerned, it should be a consideration throughout the whole development process, and it ought to be defined in conjunction with the requirements specification [19]. In this paper eight relevant technical proposals are studied, all of which place importance on eliciting security requirements in the development of IS. These proposals will be presented briefly and then compared. This should serve as an introduction to the current state of the art of security requirements in the development of IS. The remainder of the paper is set out as follows: in Section 2 we describe each of the technical proposals; in Section 3 we present the comparative study performed on these proposals; lastly, our conclusions are set out in Section 4.
2 Technical Proposals Which Support Security Requirements

The proposals analyzed in this comparative study are as follows:
• Breu, et al. 2004 & Breu & Innerhofer-Oberperfler, 2005: "Towards a systematic development of secure systems" [5] and "Model based business driven IT security analysis" [6].
• Firesmith, 2003 & 2004: "Security Use Cases" [9] and "Security Requirements in Open Process Framework" [10].
• Jennex 2005: "Modeling security requirements for information system development" [13].
• Myagmar, Lee, & Yurcik, 2005: "Threat modeling as a basis for security requirements" [20].
• Toval et al. 2001: "Security requirements in SIREN" [24] and Gutiérrez, et al. 2005: "Security Requirements for Web Services based on SIREN" [11].
• Peeters 2005: "Agile security requirements engineering" [21].
• Popp et al. 2003: "Security-Critical system development with extended use cases" [22].
• Yu 1997: "Security requirements based on the i* framework" [26].
We have chosen these proposals because the majority of them try to solve the problem of security in the different phases of IS development. They also place an
emphasis on security requirements in the development of secure Information Systems. We give a brief outline of each of these approaches below.

2.1 Towards a Systematic Development of Secure Systems, and Model Based Business Driven IT Security Analysis (Proposed by Breu et al. 2004 [5] and Breu & Innerhofer-Oberperfler, 2005 [6])

The authors propose a new process model for security engineering, which extends object-oriented, use-case-driven software development with the systematic treatment of security-related issues. They also introduce the notion of security aspects, describing the most relevant security requirements and the countermeasures at a certain level of abstraction. Starting from the concept of iterative software construction, they present a micro-process for the security analysis, made up of five steps, which are performed repeatedly at each level of abstraction throughout the incremental development: elicitation of security requirements, threats and risks analysis, taking measures, and the correctness check relating measures and requirements. Finally, the authors conclude that security of information is a business issue, and for this reason its management should be business driven.

2.2 Security Use Cases, and Security Requirements in Open Process Framework (Proposed by Firesmith, 2003 [9] & 2004 [10])

Firesmith in [10] offers some steps which allow security requirements to be defined from reusable templates. His analysis of security requirements is founded on two basic principles obtained from OCTAVE (Operationally Critical Threat, Asset, and Vulnerability Evaluation) [1]: it is resource-based and risk-driven. The steps in his process for the identification and analysis of security requirements are: identification of assets; identification of the most likely attacker types; identification of the possible threats to these assets; determining the negative impacts for each vulnerable resource; estimating and prioritizing security risks with respect to vulnerable resources and according to the most relevant threats and their potentially negative impact; choosing security subfactors to limit the risk to an acceptable level; choosing the relevant templates for each subfactor and security risk; identifying the relevant functional requirements; determining the security criteria; defining the security metric, along with the minimum acceptable level; and specifying the requirement. Moreover, the author proposes security use cases as a technique for specifying the requirements with which the application shall successfully protect itself from relevant security threats [9]. The final suggestion from this author is that, given that systems usually have similar security requirements, templates should be employed to specify security requirements in such a way that they can easily be re-used from one system to another.

2.3 Modeling Security Requirements for Information System Development (Proposed by Jennex, 2005) [13]

Jennex puts forward the idea of using barrier analysis and the concept of defence in depth to modify Siponen and Baskerville's integrated design paradigm [23] into a more graphical and easier to understand and use methodology.
The methodology suggested by the author proposes using barrier analysis diagrams as a graphical method of identifying and documenting security requirements. Furthermore, this approach uses meta-notation to add security details to existing system development diagrams. The process follows the approach of integrating security design into the software development life-cycle. The objective of using barrier diagrams in the requirements phase, therefore, is that the security requirements should be appropriately identified.

2.4 Threat Modeling as a Basis for Security Requirements (Proposed by Myagmar, Lee, & Yurcik, 2005) [20]

The authors take as the starting point for their proposal the following question, one that is important to ask of every IS: "Are the system's security features really necessary, and do they really meet the system's security needs?" The writers offer a viewpoint on the process of requirements engineering in which, with an appropriate identification of threats and a proper choice of countermeasures, the ability of attackers to misuse or abuse the system is lessened. The threat-modelling process set out by these authors is made up of three high-level steps: characterizing the system; identifying assets and access points; and identifying threats. As far as the specification of security requirements is concerned, the greater part of the information needed for the elicitation of requirements and for composing an initial set of security requirements is provided by means of threat modelling. This is done by changing a declaration of threat into a requirement by including "must not" in the declaration. Lastly, with the final goal of achieving 100% risk acceptance, the risk management the writers propose consists of: risk assessment (for which risks should be prioritized according to the damage they might cause and the likelihood of their occurring), risk reduction, and risk acceptance.

2.5 Security Requirements in SIREN and Security Requirements for Web Services Based on SIREN (Proposed by Toval, et al. 2001 [24] and Gutiérrez, et al. 2005 [11])

In [24] Toval et al. define a Requirements Engineering process, based on the re-use of security requirements, which is also compatible with MAGERIT (the Spanish public administration risk analysis and management method), which conforms to the CCF (Common Criteria Framework) defined by ISO 15408 (ISO/IEC, 1999). The re-use of security requirements is carried out at different specification levels: at the documentation level, through the definition of a hierarchical structure of security requirement specifications, and at the level of individual security requirements, by means of their storage in the repository of re-usable requirements. SIREN (SImple REuse of software requiremeNts) describes a process model, some basic guidelines, techniques and tools. The guidelines consist of a hierarchy of requirement specification documents, together with the template for each document. It is a spiral model process, and includes the phases of requirements elicitation, requirements analysis and negotiation, requirements specification, and validation. A repository of requirements classified by domains and profiles is also defined.
Moreover, in [11] Gutiérrez et al. present a catalogue of security requirement templates for Web Services (WS) based on the SIREN method of requirements engineering. They focus their efforts on the security requirement templates for the following subfactors: authentication, authorization, confidentiality, integrity and privacy.

2.6 Agile Security Requirements Engineering (Proposed by Peeters, 2005) [21]

Peeters proposes to extend agile practices to deal with security in an informal, communicative and assurance-driven spirit. To increase the agility of requirements engineering, Peeters puts forward the idea of using "abuser stories". These identify how attackers may abuse the system and jeopardize stakeholders' assets. Thus, through the abuser stories, the establishment of security requirements is made easier. As with "user stories", "abuser stories" are short and informal, and they are scored and ranked according to the perceived threat they pose to customers' assets. Correct planning will consequently mean considering the "user stories" and the "abuser stories" together. This will ensure an explicit, rational trade-off between functionality and security.

2.7 Security-Critical System Development with Extended Use Cases (Proposed by Popp, et al. 2003) [22]

These authors provide an extension to the conventional use-case-oriented development process [7, 12]. As far as requirements engineering is concerned, this process normally consists of three activities:
1. Dealing with the static concepts of the domain of an application in a class model known as the Application Core. At this point they extend the domain by modelling access policies and security properties based on UMLSec [14].
2. Identification of the use cases and their manifestation in a Use Case Model. These are completed by textual descriptions covering characteristics which measure the threats and vulnerability of input and output. They also outline the security policies which respond to the previous threats. The outcome is a model known as the Model of Security Use Cases.
3. Integration of the two previous viewpoints in a single object-oriented model, mainly through the description of use cases in terms of message flows between objects. The extension consists of the integration of the Security Use Case Model, and the Application Core refines the security policy in terms of the message flows between objects.

2.8 Security Requirements Based on the i* Framework (Proposed by Yu, 1997) [26]

The i* framework allows the easy integration of different techniques and concepts for dealing with systems security. The structural representation defined in i* shows the dependence relationships between the actors, and is what makes security aspects appear.
In Fig. 1 (Requirement elicitation process with i*) we see the process of functional requirements elicitation and analysis defined by i*, and how it is integrated into the process of elicitation and analysis of security requirements [17].
Fig. 1. Requirement elicitation process with i*
3 Comparison

To get a general overview of the different proposals discussed above, a comparative study of them is carried out in this section. To this end, we propose an analytical framework based on the following criteria:
Degree of Agility. This refers to the degree of agility of the development methodology, as compared with traditional, plan-driven methodologies. Here we take as our basis the observations made by Boehm and Turner [3, 4]. These authors propose a risk-based method by means of which they try to keep both kinds of methodologies (agile and plan-driven, i.e. traditional) in balance, taking advantage of the positive points of the two types and taking steps to make up for their disadvantages. Each proposal will be rated on the following scale: high, medium-high, medium, medium-low, low.
Support. This refers to aspects such as tools, procedures, guides, standards and case studies which help make the proposal easier to use. Each proposal will be rated on the same scale.
Degree of integration with other software requirements. This concerns how the establishment of security requirements fits in with the establishment of other software requirements (non-functional as well as functional) in the development of an IS. It takes into account aspects such as the use of similar, already-existing techniques for determining other requirements (for instance, UML diagrams), as well as the degree of parallelization and co-ordination with the elicitation of other requirements. Each proposal will be rated on the same scale.
User friendliness. Here the reference is to the ease with which the technique can be used without any previous knowledge or special training. In this case, characteristics such as help support, the use of techniques which already exist for other requirements, and widely-used standards are taken into account. Each proposal will be rated on the same scale.
Contributions of the proposal as regards security. The new perspectives which the proposal brings to improving the establishment of security requirements.
In the following table (Table 1), the comparison between the proposals of the different authors, within the analysis framework we propose, is set out.

Table 1. Comparison of proposals
Criteria: Degree of Agility | Help Support | Degree of integration with other software requirements | User friendliness | Contributions of the proposal as regards security.
[The individual high/medium/low ratings of Table 1 were scrambled during extraction and cannot be reliably reconstructed; the contributions column reads as follows.]
Breu, et al. 2004 [5] and Breu & Innerhofer-Oberperfler, 2005 [6]: a micro-process for the security analysis.
Firesmith, 2003 and 2004 [9, 10]: security use cases; re-usable templates.
Jennex, 2005 [13]: diagrams of barriers.
Myagmar, Lee & Yurcik, 2005 [20]: threat modeling as a basis for security requirements.
Toval, et al. 2001 [24] & Gutiérrez, et al. 2005 [11]: re-use of security requirements compatible with MAGERIT; a catalogue of security requirement templates for WS.
Peeters, 2005 [21]: abuser stories.
Popp, et al. 2003 [22]: UMLSec.
Yu, 1997 [26]: integration of functional and security requirements.
As can be seen in the table, after our analysis we reach the conclusion that the proposals discussed above present some weaknesses. These include the difficulty of integrating them into software development and the lack of an overall, complete support for security modelling at the organizational, conceptual and technical levels. There is also an increasing distance between the development of the IS and the implementation of the necessary security. Moreover, these proposals are not specific enough for a systematic and intuitive treatment of IS security requirements in the first stages of software development. In short, the proposals we have analyzed only partially satisfy the criteria that are necessary for establishing security requirements with some degree of guarantee, and they do not reach the desired level of integration in the development of IS. At the same time, each of these methodologies contributes highly important security-related features that can be used as the basis for new methodologies, or as extensions of those that already exist.
4 Conclusions

In our present so-called Information Society, the increasingly crucial nature of IS, with corresponding levels of new legal and governmental requirements, is obvious, so the development of more and more sophisticated approaches to ensuring the security of information is becoming a necessity. Information security is usually tackled only from a technical viewpoint at the implementation stage, even though it is an important aspect throughout. We believe it is fundamental to deal with security at all stages of IS development, especially in the establishment of security requirements, since these form the basis for achieving a robust IS. Various interesting methodological proposals which address this issue exist; some of them have been described and compared in this paper, although they all present some weak points, as we have said above. In a similar vein, it must be said that these approaches are not specific enough for the treatment of IS security requirements in the first stages of the IS development process. Consequently, we consider that it would be interesting to obtain a systematic and intuitive way of eliciting and defining security requirements with some guarantee. Such a technique should allow the integration of security requirements into IS development as far as possible. It should also permit the re-use of requirements from one project to another. Added to these considerations is the fact that it will have to be valid for the new Internet-based IS, and especially for those based on SOA architectures supported by Web Services technology. To this end it would be good if it provided support tools. It would also be positive if it were based on normalization standards for the definition of requirements; some examples might be XML or the use of templates, and modeling standards like UML can be used to similar advantage. It would also be desirable for it to conform to security management standards such as ISO/IEC 17799 or COBIT.
Acknowledgements

This paper has been produced in the context of the DIMENSIONS (PBC-05-012-2) project of the Consejería de Ciencia y Tecnología de la Junta de Comunidades de Castilla-La Mancha along with FEDER, and the CALIPO (TIC2003-07804-CO5-03)
and RETISTIC (TIC2002-12487-E) projects of the Dirección General de Investigación del Ministerio de Ciencia y Tecnología.
References
1. Alberts, C.J., Behrens, S.G., Pethia, R.D., and Wilson, W.R.: OCTAVE Framework, Version 1.0. Networked Systems Survivability Program (1999) 84
2. Baskerville, R.: The development duality of information systems security. Journal of Management Systems (1992) 4(1): 1-12
3. Boehm, B. and Turner, R.: Observations on Balancing Discipline and Agility. Agile Development Conference (ADC '03) (2003) 32
4. Boehm, B. and Turner, R.: Balancing Agility and Discipline: Evaluating and Integrating Agile and Plan-Driven Methods. ICSE'04 (2004) 718-719
5. Breu, R., Burger, K., Hafner, M., and Popp, G.: Towards a Systematic Development of Secure Systems. WOSIS 2004 (2004)
6. Breu, R. and Innerhofer-Oberperfler, F.: Model based business driven IT security analysis. SREIS 2005 (2005)
7. D'Souza, D.F. and Wills, A.C.: Objects, Components & Frameworks with UML: The Catalysis Approach. Addison-Wesley (1998)
8. Firesmith, D.G.: Engineering Security Requirements. Journal of Object Technology (2003) 2(1): 53-68
9. Firesmith, D.G.: Security Use Cases. Journal of Object Technology (2003) 53-64
10. Firesmith, D.G.: Specifying Reusable Security Requirements. Journal of Object Technology (2004) 61-75
11. Gutiérrez, C., Moros, B., Toval, A., Fernández-Medina, E., and Piattini, M.: Security Requirements for Web Services based on SIREN. Symposium on Requirements Engineering for Information Security (SREIS-2005), held with the 13th IEEE International Requirements Engineering Conference - RE'05 (2005)
12. Jacobson, I., Booch, G., and Rumbaugh, J.: The Unified Software Development Process. Addison-Wesley Longman (1999)
13. Jennex, M.E.: Modeling security requirements for information systems development. SREIS 2005 (2005)
14. Jürjens, J.: Secure Systems Development with UML. Springer (2005) 309
15. Kim, H.-K.: Automatic Translation From Requirements Model into Use Cases Modeling on UML. Youn-Ky, C., Editor. ICCSA 2005 (2005)
16. Kotonya, G. and Sommerville, I.: Requirements Engineering: Process and Techniques (1998)
17. Liu, L., Yu, E., and Mylopoulos, J.: Security and Privacy Requirements Analysis within a Social Setting. 11th IEEE International Requirements Engineering Conference (2003)
18. McDermott, J. and Fox, C.: Using Abuse Case Models for Security Requirements Analysis. Annual Computer Security Applications Conference. Phoenix, Arizona (1999)
19. Mouratidis, H., Giorgini, P., Manson, G., and Philp, I.: A Natural Extension of Tropos Methodology for Modelling Security. Workshop on Agent-oriented methodologies, at OOPSLA 2002. Seattle, WA, USA (2003)
20. Myagmar, S., Lee, A.J., and Yurcik, W.: Threat Modeling as a Basis for Security Requirements. SREIS 2005 (2005)
21. Peeters, J.: Agile Security Requirements Engineering. SREIS 2005 (2005)
A Comparative Study of Proposals for Establishing Security Requirements
1053
22. Popp, G., Jürjens, J., Wimmel, G., and Breu, R., Security-Critical System Development with Extended Use Cases. 2003: 10th Asia-Pacific Software Engineering Conference. p. 478-487. 23. Siponen, M. and Baskerville, R., A new paradigm for adding security into IS development methods,. 2001: 8th Annual Working Conference on Information Security Management andSmall Systems Security. 24. Toval, A., Nicolás, J., Moros, B., and García, F., Requirements Reuse for Improving Information Systems Security: A Practitioner's Approach. 2001: Requirements Engineering Journal. p. 205-219. 25. Walton, J.P., Developing a Enterprise Information Security Policy. 2002, ACM Press: Proceedings of the 30th annual ACM SIGUCCS conference on User services. 26. Yu, E., Towards Modelling and Reasoning Support for Early-Phase Requirements Engineering. 1997: 3rd IEEE International Symposium on Requirements Engineering (RE'97). p. 226-235.
Stochastic Simulation Method for the Term Structure Models with Jump Kisoeb Park1 , Moonseong Kim2 , and Seki Kim1, 1
2
Department of Mathematics, Sungkyunkwan University, 440-746, Suwon, Korea Tel.: +82-31-290-7030, 7034 {kisoeb, skim}@skku.edu School of Information and Communication Engineering, Sungkyunkwan University, 440-746, Suwon, Korea Tel.: +82-31-290-7226 [email protected]
Abstract. Monte Carlo Method as a stochastic simulation method is used to evaluate many financial derivatives by financial engineers. Monte Carlo simulation is harder and more difficult to implement and analyse in many fields than other numerical methods. In this paper, we derive term structure models with jump and perform Monte Carlo simulations for them. We also make a comparison between the term structure models of interest rates with jump and HJM models based on jump. Bond pricing with Monte Carlo simulation is investigated for the term structure models with jump.
1
Introduction
Before mentioning the procedure in derivation of bond pricing models with jumps, we discuss general models of the term structure of interest rates. Approaches to modeling the term structure of interest rates in continuous time may be broadly described in terms of either the equilibrium approach or the noarbitrage approach even though some early models include concepts from both approaches. We introduce one-state variable model of Vasicek (1977)[16], Cox, Ingersoll, and Ross (CIR)[5], the extended model of the Hull and White[10], and the development of the model is the jump-diffusion model of the Ahn and Thompson[1] and the Baz and Das[2]. Conventionally, financial variables such as stock prices, foreign exchange rates, and interest rates are assumed to follow a diffusion processes with continuous paths when pricing financial assets. Also, Heath, Jarrow and Morton(HJM)[6] is widely accepted as the most general methodology for term structure of interest rate models. In pricing and hedging with financial derivatives, jump-diffusion models are particularly important, since ignoring jumps in financial prices will cause pricing and hedging rates. Term structure model solutions under jump-diffusions
Corresponding author.
M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 1054–1063, 2006. c Springer-Verlag Berlin Heidelberg 2006
Stochastic Simulation Method for the Term Structure Models with Jump
1055
are justified because movements in interest rates display both continuous and discontinuous behavior. These jumps are caused by several market phenomena money market interventions by the Fed, news surprise, and shocks in the foreign exchange markets, and so on. We study a solution of the bond pricing for the term structure models with jump. The term structure models with jump which allows the short term interest rate, the forward rate, the follow a random walk. We compare between the term structure model of interest rate with jump and the HJM model based on jump. We introduce the Monte Carlo simulation. One of the many uses of Monte Carlo simulation by financial engineers is to place a value on financial derivatives. Interest in use of Monte Carlo simulation for bond pricing is increasing because of the flexibility of the methods in handling complex financial instruments. One measure of the sharpness of the point estimate of the mean is Mean Standard Error(MSE). For the term structure models with jump, we study bond prices by the Monte Carlo simulation. Numerical methods that are known as Monte Carlo methods can be loosely described as statistical simulation methods, where statistical simulation is defined in quite general terms to be any method that utilizes sequences of random numbers to perform the simulation. The structure of the remainder of this paper is as follows. In Section 2, the basic of bond prices with jump are introduced. In Section 3, the term structure models in jump are presented. In Section 4, we calculate numerical solutions using Monte Carlo simulation for the term structure models with jump. In Section 5, we investigate bond prices given for the eight models using the Vasicek and CIR models. Conclusions are in Section 6.
2 2.1
Preliminaries for the Bond Prices Stochastic Differential Equation with Jump
We will first recall some notations. All our models will be set up in a given complete probability space (Ω, Ft , P ) and an argumented filtration (Ft )t≥0 generated by a Winear process W (t) in . We will ignore taxes and transaction costs. We denote by V (r, r, T ) the price at time t of a discount bond. It follows immediately that V (r, T, T ) = 1. Now consider a quite different type of random environment. Suppose π(t) represents the total number of extreme shocks that occur in a financial market until time t. The time dependence can arise from the cyclical nature of the economy, expectations concerning the future impact of monetary policies, and expected trends in other macroeconomic variables. In the same way that a model for the asset price is proposed as a lognormal random walk, let us suppose that the interest rate r and the forward rate is governed by a Stochastic differential equation(SDE) of the form dr = u(r, t)dt + ω(r, t)dW Q + Jdπ
(1)
df (t, T ) = μf (t, T )dt + σf (t, T )dW Q (t) + Jdπ.
(2)
and
1056
K. Park, M. Kim, and S. Kim
where ω(r, t) is the instantaneous volatility, u(r, t) is the instantaneous drift, μf (t, T ) represents drift function, σ 2 fi (t, T ) is volatility coefficients, and jump size J is normal variable with mean μ and standard deviation γ. 2.2
The Zero-Coupon Bond Pricing Equation
When interest rates follow the SDE(1), a bond has a price of the form V (r, t); the dependence on T will only be made explicit when necessary. Pricing a bond is technically harder than pricing an option, since there is no underlying assert with which be hedge. We set up a portfolio containing two bonds with different maturities T1 , T2 . The bond with maturity T1 has price V1 , and the bond with maturity T2 has price V2 . Thus, the riskless portfolio is Π = V1 − ΔV2 .
(3)
And then we applied he jump-diffusion version of Ito’s lemma. Hence we derive the the partial differential bond pricing equation. Theorem 1. If r satisfy Stochastic differential equation dr = u(r, t)dt + ω(r, t) dW Q + Jdπ then the zero-coupon bond pricing equation in jumps is ∂V 1 ∂2V ∂V + ω 2 2 + (u − λω) − rV + hE[V (r + J, t) − V (r, t)] = 0 ∂t 2 ∂r ∂r
(4)
where λ(r, t) is the market price of risk. The final condition corresponds to the payoff on maturity and so V (r, T, T ) = 1. Boundary conditions depend on the form of u(r, t) and ω(r, t).
3
Bond Pricing Models with Jump
The dependence of the yield curve on the time to maturity, T − t, is called the term structure of interest rates. It is common experience from market data that yield curve typically come in three distinct shapes, each associated with different economic conditions. A wide variety of yield curves can be predicted by the model, including Increasing, decreasing, and humped. Now we consider the term structure models with jump. 3.1
Jump-Diffusion Version of Extended Vasicek’s Model
The time dependence can arise from the cyclical nature of the economy, expectations concerning the future impact of monetary policies, and expected trends in other macroeconomic variables. In this study, we extend the jump-diffusion version of equilibrium single factor model to reflect this time dependence. We proposed the mean reverting process for interest rate r is given by dr(t) = [θ(t) − a(t)r(t)]dt + σ(t)dW Q (t) + Jdπ(t)
(5)
We will assume that the market price of interest rate diffusion risk is a function of time, λ(t). Let us assume that jump risk is diversifiable. From equation (4)
Stochastic Simulation Method for the Term Structure Models with Jump
1057
with the drift coefficient u(r, t) = θ(t) − a(t)r(t) and the volatility coefficient ω(r, t) = σ(t), we get the partial differential difference bond pricing equation: 1 2 [θ(t) − a(t)r(t) − λ(t)σ(t)]Vr + Vt + σ(t) Vrr − rV 2 1 2 + hV [−μA(t, T ) + (γ 2 + μ2 )A(t, T ) ] = 0. 2
(6)
Then the yield on zero-coupon bond price expiring T − t periods hence is given by: Y (r, t, T ) = −
ln V (r, t, T ) T −t
(7)
is defined the entries term structure of interest rates. The price of a discount bond that pays off $ 1 at time T is the solution to (6) that satisfies the boundary condition V (r, T, T ) = 1. A solution of the form: V (r, t, T ) = exp[−A(t, T )r + B(t, T )]
(8)
can be guessed. Bond price derivatives can be calculated from (7). We omit the details, but the substitution of this derivatives into (6) and equating powers of r yields the following equations for A and B. Theorem 2 −
∂A + a(t)A − 1 = 0 ∂t
(9)
and ∂B 1 1 2 − φ(t)A + σ(t) A2 + h[−μA + (γ 2 + μ2 )A2 ] = 0, ∂t 2 2
(10)
where, φ(t) = θ(t) − λ(t)σ(t) and all coefficients is constants. In order to satisfy the final data that V (r, T, T ) = 1 we must have A(T, T ) = 0 and B(T, T ) = 0. 3.2
Jump-Diffusion Version of Extended CIR Model
We proposed the mean reverting process for interest rate r is given by dr(t) = [θ(t) − a(t)r(t)]dt + σ(t) r(t)dW Q (t) + Jdπ(t)
(11)
We will assume that the market price of interest rate diffusion risk is a function of time, λ(t) r(t). Let us assume that jump risk is diversifiable. In jump-diffusion version of extended Vasicek’s model the short-term interest rate, r, to be negative. If Jump-diffusion version of extended CIR model is proposed, then rates are always non-negative. This has the same mean-reverting drift as jump-diffusion version of extended Vasicek’s model, but the standard deviation is proportional to r(t). This means that its standard deviation increases
1058
K. Park, M. Kim, and S. Kim
when the short-term interest rate increases. From equation (4) with the drift coefficient u(r, t) = θ(t) − a(t)r(t) and the volatility coefficient ω(r, t) = σ(t) r(t), we get the partial differential bond pricing equation: 1 2 [θ(t) − a(t)r(t) − λ(t)σ(t)r(t)]Vr + Vt + σ(t) r(t)Vrr − rV 2 1 2 + hV [−μA(t, T ) + (γ 2 + μ2 )A(t, T ) ] = 0. 2
(12)
Bond price partial derivatives can be calculated from (12). We omit the details, but the substitution of this derivatives into (6) and equating powers of r yields the following equations for A and B. Theorem 3 −
∂A 1 2 + ψ(t)A + σ(t) A2 − 1 = 0 ∂t 2
(13)
and ∂B 1 − (θ(t) + hμ)A + h[(γ 2 + μ2 )A2 ] = 0, ∂t 2
(14)
where, ψ(t) = a(t) + λ(t)σ(t) and all coefficients is constants. In order to satisfy the final data that V (r, T, T ) = 1 we must have A(T, T ) = 0 and B(T, T ) = 0. Proof. In equations (12) and (13), by using the solution of this Ricatti’s equation formula we have A(t, T ) =
with ω(t) =
2(eω(t)(T −t) − 1) (ω(t) + ψ(t))(eω(t)(T −t) − 1) + 2ω(t)
(15)
ψ(t)2 + 2σ(t). Similarly way, we have B(t, T ) = t
T
1 −(θ(t) + hμ)A + h(γ 2 + μ2 )A2 dt . 2
(16)
These equation yields the exact bond prices in the problem at hand. Equation (16) can be solved numerically for B. Since (15) gives the value for A, bond prices immediately follow from equation (6). 3.3
Heath-Jarrow-Merton(HJM) Model with Jump
The HJM consider forward rates rather than bond prices as their basic building blocks. Although their model is not explicitly derived in an equilibrium model, the HJM model is a model that explains the whole term structure dynamics in a no-arbitrage model in the spirit of Harrison and Kreps[?], and it is fully compatible with an equilibrium model. If there is one jump during the period [t, t + dt] then dπ(t) = 1, and dπ(t) = 0 represents no jump during that period.
Stochastic Simulation Method for the Term Structure Models with Jump
1059
We know that the zero coupon bond prices are contained in the forward rate informations, as bond prices can be written down by integrating over the forward rate between t and T in terms of the risk-neutral process T
V (t, T ) = exp −
f (t, s)ds .
(17)
t
As we mentioned already, a given model in the HJM model with jump will result in a particular behavior for the short term interest rate. We introduce relation between the short rate process and the forward rate process as follows. In this study, we jump-diffusion version of Hull and White model to reflect this restriction condition. We know the following model for the interest rate r; dr(t) = a(t)[θ(t)/a(t) − r(t)]dt + σr (t)r(t)β dW Q (t) + Jdπ(t),
(18)
where, θ(t) is a time-dependent drift; σr (t) is the volatility factor;a(t) is the reversion rate; dW (t) is standard Wiener process; dπ(t) represents the Poisson process. Theorem 4. Let be the jump-diffusion process in short rate r(t) is the equation (18). Let be the volatility form is σf (t, T ) = σr (t)( r(t))β η(t, T ) (19)
T with η(t, T ) = exp − t a(s)ds is deterministic functions. We know the jumpdiffusion process in short rate model and the ”corresponding” compatible HJM model with jump df (t, T ) = μf (t, T )dt + σf (t, T )dW Q (t) + Jdπ(t) (20)
T where μf (t, T ) = σf (t, T ) t σf (t, s)ds. Then we obtain the equivalent model is T f (0, T ) = r(0)η(0, T ) + θ(t)η(s, T )ds 0 T T 2 β 2 − σr (s)(r(s) ) η(s, T ) (η(s, u)du)ds (21) 0
s
that is, all forward rates are normally distributed. Note that we know that β = 0 case is an extension of Vasicek’s jump diffusion model; the β = 0.5 case is an extension of CIR jump diffusion model. Note that the forward rates are normally distributed, which means that the bond prices are log-normally distributed. Both the short term rate and the forward rates can become negative. As above, we obtain the bond price from the theorem 1. By the theorem 2, we drive the relation between the short rate and forward rate. Corollary 1. Let be the HJM model with jump of the term structure of interest rate is the stochastic differential equation for forward rate f (t, T ) is given by
1060
K. Park, M. Kim, and S. Kim
T
σf (t, s)dsdt + σf (t, T )dW Q (t) + Jdπ(t)
df (t, T ) = σf (t, T )
(22)
t
where, dW Q i is the Wiener process generated by an equivalent martingale measure T β Q and σf (t, T ) = σr (t)( r(t)) exp − t a(s)ds . Then the discount bond price V (t, T ) for the forward rate is given by the formula T 2 t σ (t, s)ds V (0, T ) 1 f t V (t, T ) = exp{− σf2 (s, t)ds V (0, t) 2 σf (t, T ) 0
T σf (t, s)ds − t [f (0, t) − r(t)]} σf (t, T ) with the equation (21). Note that we know that β = 0 case is an extension of Vasicek’s jump diffusion model; the β = 0.5 case is an extension of CIR jump diffusion model.
4
Monte Carlo Simulation of the Term Structure Models with Jump
By and application of Girsanov’s theorem the dependence on the market price of risk can be absorbed into an equivalent martingale measure. Let W (t), 0 ≤ t ≤ T , be a Wienear process on a probability space (Ω, F, P ). Let λ(t), 0 ≤ t ≤ T , be a process adapted to this filtration. The Wiener processes dW Q (t) under the
t equivalent martingale measure Q are given by W Q (t) = W (t)+ 0 λ(s)ds so that dW Q i (t) = dW i (t) + λi (t)ds. A risk-neutral measureQ is any probability measure, equivalent to the market measure P , which makes all discounted bond prices martingales. We now move on to discuss Monte Carlo simulation. A Monte Carlo simulation of a stochastic process is a procedure for sampling random outcomes for the process. This uses the risk-neutral valuation result. The bond price can be expressed as: ÊT
ÊT
V (rt , t, T ) = EtQ e− t rs ds |r(t) or V (ft , t, T ) = EtQ e− t f (t,s)ds (23) where EtQ is the expectations operator with respect to the equivalent risk-neutral measure. Under the equivalent risk-neutral measure, the local expectation hypothesis holds(that is, EtQ dV V ).To execute the Monte Carlo simulation, we discretize the equations (5) and (12). we divide the time interval [t, T ] into m equal time steps of length Δt each. For small time steps, we are entitled to use the discretized version of the risk-adjusted stochastic differential equations (5), (11), and (22):
Stochastic Simulation Method for the Term Structure Models with Jump
and
1061
rj = rj−1 + [(θ √ · t) − (a · t)rj−1 · t − (λ · t)(σ · t)]Δt + (σ · t)εj Δt + Jj NΔt ,
(24)
rj = rj−1 + [(θ · t) − (a · t)rj−1 − (λ · t)(σ · t) rj−1 · t]Δt √ + (σ · t) rj−1 · t εj Δt + Jj NΔt
(25)
fj = fj−1 + σf (t, T )
T
√ σf (t, s)dsdt Δt + σf (t, T )εj Δt + Jj NΔt
(26)
t
T where σf (t, T ) = σr (t)( r(t))β exp − t a(s)ds , j = 1, 2, ···, m, εj is standard normal variable with εj ∼ N (0, 1), and NΔt is a Poisson random variable with parameter hΔt. We know that β = 0 case is an extension of Vasicek’s jump diffusion model; the β = 0.5 case is an extension of CIR jump diffusion model. We can investigate the value of the bond by sampling n spot rate paths under the discrete process approximation of the risk-adjusted processes of the equations (24), (25), and (26). The bond price estimate is given by: ⎛ ⎞ ⎛ ⎞ n m−1 n m−1 1 1 V (rt , t, T ) = exp ⎝− rij Δt⎠ or V (ft , t, T ) = exp⎝− fij Δt⎠, n i=1 n j=0 i=1 j=0 where rij is the value of the short rate and fij is the value of the forward rate under the discrete risk-adjusted process within sample path i at time t + Δt. Numerical methods that are known as Monte Carlo methods can be loosely described as statistical simulation methods, where statistical simulation is defined in quite general terms to be any method that utilizes sequences of random numbers to perform the simulation. The Monte Carlo simulation is clearly less efficient computationally than the numerical method. One measure of the sharpness of the point estimate of the mean is MSE, defined as √ M SE = ν/ n (27) where, ν 2 is the estimate of the variance of bond prices as obtained from n sample paths of the short rate:
n m−1 exp − f Δt − ν ij i=1 j=0 ν2 = . (28) n−1 This reduces the MSE by increasing the value of n. However, highly precise estimates with the brute force method can take a long time to achieve. For the purpose of simulation, we conduct three runs of 1,000 trials each and divide the year into 365 time steps.
5
Experiments
In this section, we investigate the jump-diffusion version of extended Vasicek and CIR model and HJM model with jump. Experiments are consist of the
1062
K. Park, M. Kim, and S. Kim
(a) Bond prices based on extended Vasicek model
(b) Bond prices based on extended CIR model
Fig. 1. The various bond prices for the term structure models with jump
numerical method and Monte Carlo simulation. Experiment 1, 2 plot estimated term structure using the various models. In experiment 1, 2, the parameter values are assumed to be r = 0.05, a = 0.5, b = 0.05, θ = 0.025, σ = 0.08, σr = 0.08, λ = −0.5, γ = 0.01, μ = 0, h = 10, t = 0.05, and T = 20. Experiment 3, 4 examine bond prices by the Monte Carlo simulation. In experiment 3, 4, the parameter values are assumed to be r = 0.05, f [0, t] = 0.049875878, σr = 0.08, a = 0.5, b = 0.05, θ = 0.025, σ = 0.08, λ = −0.5, Δt = (T − t)/m, m = 365, n = 1000, γ = 0.01, μ = 0, h = 10, t = 0.05, and T = 20. J-Vasicek J-E Vasicek HJM-E Vasicek J-HJM-E Vasicek CFS 0.93596 0.953704 0.954902 0.954902 MCS 0.933911 0.95031 0.951451 0.951722 Diff(CFS-MCS) 0.00150833 0.0002868 5.03495E-06 0.000319 Variance 0.00122814 0.00053554 7.09574E-05 0.00178619 MSE 0.000933 0.00286 0.00205 0.003394 Experiment 3. Bond prices estimated by the Monte Carlo simulation for the jump-diffusion and HJM model with jump based on Vasicek model. J-CIR J-E CIR HJM-E CIR J-HJM-E CIR CFS 0.942005 0.953478 0.95491 0.95491 MCS 0.947482 0.951688 0.951456 0.950456 Diff(CFS-MCS) 0.0002863 0.00030634 1.2766E-06 0.0002897 Variance 0.000535 0.0005535 0.000113 0.001702 MSE -0.005478 0.00179042 0.00345374 0.00444414 Experiment 4. Bond prices estimated by the Monte Carlo simulation for the jump-diffusion and HJM model with jump based on CIR model.
Stochastic Simulation Method for the Term Structure Models with Jump
6
1063
Conclusion
Even though Monte Carlo simulation is both harder and conceptually more difficult to implement than the other numerical methods, interest in use of Monte Carlo simulation for bond pricing is getting stronger because of its flexibility in evaluating and handling complicated financial instruments. However, it takes a long time to achieve highly precise estimates with the brute force method. In this paper we investigate bond pricing models and their Monte Carlo simulations with several scenarios. The bond price is humped in the jump versions of the extended Vasicek and CIR models while the bond prices are decreasing functions of the maturity in HJM models with jump.
References 1. C. Ahn and H. Thompson, “Jump-Diffusion Processes and the Term Structure of Interest Rates,” Journal of Finance, vol. 43, pp. 155-174, 1998. 2. J. Baz and S. R.Das, “Analytical Approximations of the Term Structure for JumpDiffusion Processes : A Numerical Analysis,” Journal of Fixed Income, vol. 6(1), pp. 78-86, 1996. 3. D. Beaglehole and M. Tenney, “Corrections and Additions to ’A Nonlinear Equilibrium Model of the Term Structure of Interest Rates’,” Journal of Financial Economics, vol. 32, pp. 345-353, 1992. 4. E. Briys, “Options, Futures and Exotic Derivatives,” John Wiley, 1985. 5. J. C. Cox, J. Ingersoll, and S. Ross, “A Theory of the Term Structure of Interest Rate,” Econometrica, vol. 53, pp. 385-407, 1985. 6. D. Health., R. Jarrow, and A. Morton, “Bond Pricing and the Term Structure of Interest Rates,” Econometrica, vol. 60. no. 1, pp. 77-105, 1992. 7. T. S. Ho and S. Lee, “ Term Structure Movements and Pricing Interest Rate Contingent Claims,” Journal of Finance, vol. 41, pp. 1011-1028, 1986. 8. F. Jamshidian, “An Exact Bond Option Formula,” Journal of Finance, vol. 44, 1989. 9. J. Frank, and CFA. Fabozzi, “Bond markets Analysis and strategies,” Fourth Edition, 2000. 10. J. Hull and A. White, “Pricing Interest Rate Derivative Securities,” Review of Financial Studies, vol. 3, pp. 573-92, 1990. 11. J. Hull and A. White, “Options, Futures, and Derivatives,” Fourth Edition, 2000. 12. F. A. Longstaff and S. Schwartz, “Interest Rate Volatility and the Term structure: A Two-Factor General Equilibrium Model,” Journal of Finance, vol. 47, pp. 12591282, 1992. 13. M. J. Brennan and E. S. Schwartz, “A Continuous Time Approach to the Pricing of Bonds,” Journal of Banking and Finance, vol. 3, pp. 133-155, 1979. 14. J. Strikwerda, “Finite Difference Schmes and Partial Differential Equations,” Wadsworth and Brooks/Cole Advanced Books and Software, 1989. 15. J. W. Drosen, “Pure jump shock models in reliability,” Adv. Appl. Probal. vol. 18, pp. 423-440, 1986. 16. O. A. Vasicek, “An Equilibrium Characterization of the Term Structure,” Journal of Financial Economics, vol. 5, pp. 177-188, 1977.
The Ellipsoidal lp Norm Obnoxious Facility Location Problem Yu Xia The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569, Japan [email protected]
Abstract. We consider locating an obnoxious facility. We use the weighted ellipsoidal lp norm to accurately measure the distance. We derive the necessary and sufficient conditions for the local optimality and transform the optimality conditions into a system of nonlinear equations. We use Newton’s method with perturbed nonmonotone line search to solve the equations. Some numerical experiments are presented.
1
Introduction
We consider the sitting of an obnoxious facility. Some facilities, such as chemical plants, power plants, air ports, train stations, nuclear reactors, waste dumps, emanate chemical or nuclear pollutants, heat, noise, or magnetic waves. The residents in the region where the obnoxious facility is to be located desire that this facility to be sit as far away as possible from them. The obnoxiousness of the facility is usually an inverse factor of the distance to it. In this paper, we use the maximum weighted sum of distances as the objective. The weights carry the relative importance of existing residence sites. The dispersion of pollutants is usually affected by meteorology, ground morphology. For instance, hills, tall buildings, rivers, and greens may affect the velocities of winds in different directions in different areas. These effects direct the pollutants to travel in certain paths. As well, forests and mountains may damp the pollution effect. In a series of papers ([3, 4], etc.), it is argued with empirical study that lp distances weighted by an inflation factor tailored to given regions can better describe the irregularity in the transportation networks such as hills, bends, and is therefore superior to the weighted rectangular and Euclidean norms. This is the same case for pollution distance measure. The impact of the trees, hills, buildings between the pollution source and the residents need to be considered. The curvature of the pollution spread path caused by some geometrical structures can be better described through the rotation of the axes of the coordinates for the residence locations and a proper choice of p. The damping
Research supported by a postdoctoral fellowship for foreign researchers from JSPS (Japan Society for the Promotion of Science). I thank suggestions and comments of anoymous referees.
M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 1064–1072, 2006. c Springer-Verlag Berlin Heidelberg 2006
The Ellipsoidal lp Norm Obnoxious Facility Location Problem
1065
factors, such as trees, mountains, can be modeled by the scaling of corresponding axes of the coordinates. Denote the lp norm (Minkowski distance of order p, p > 1) of a d-dimensional Euclidean space Rd as d 1/p def lp (z) = |zi |p . i=1
In this paper, we measure the distances between sites by an ellipsoidal lp norm distance: def lpM (z) = lp (M z) , where M is a linear transformation. It is not hard to see that lpM is a norm when M is nonsingular. Note that the lpM distance is the lp norm when M is the identity. And it includes Euclidean distance (l2 -norm distance), Manhattan distance (l1 norm distance), Chebyshev distance (l∞ norm distance). It also includes the lpb norm distance (see, for instance [2]): n 1/p def p lpb (z) = bi |xi | , bi > 0 (i = 1, . . . , n) . i=1
Obviously the lpM distance measure can better describe the actual transportation networks than the weighted lp distance or lpb distance can. The p and M may be different for different existing facilities. Let vectors f1 , . . . , fn represent the n existing residence sites. Let vector x denote the site where the new facility to be located. The distances from the obnoxious facility to some essential residence sites fj (j = 1, . . . , s) such as hospitals, schools, tourism spots, should be above some thresholds rj , due to some legal or environmental considerations. Some of the essential residence sites fj (j = 1, . . . , s) may or may not be the same as the existing sites fi (i = 1, . . . , n) included in the maxisum objective. Let wi denote the weight on location fi (i = 1, . . . , n), which is associated with the number and importance of residents there. Of some type of residents, such as patients, children, the weights might be higher than those of others. The model we are considering is the following. max x
n
wi Mi (x − fi )pi
(1a)
i=1
s.t. Ax ≤ b Mn+j (x − fn+j )pn+j ≥ rj
(1b) (j = 1, . . . , s)
(1c)
The objective is convex. Therefore, (1) is not easy to solve, since it is a maximization problem. In addition, the constraints are not convex. Next, we consider its dual and optimality conditions. Our aim is to reformulate the optimality conditions into a system of equations which can then be solved by Newton’s method. The remaining of the paper is organized as follows. In §2, we derive the optimality conditions for (1) and then reformulate the optimality conditions into
1066
Y. Xia
a system of equations. In §3, we present our Newton’s method with perturbed nonmonotone line search algorithm. In §4, we give some numerical examples of our algorithm.
2
The Optimality Conditions
In this part, we consider the Lagrangian dual of (1) and derive its optimality conditions. For each pi ≥ 0, we define a scalar pi satisfying 1 1 + =1. pi qi By H¨older’s inequality, max wi Mi (x − fi )pi = x
min
(zi q ≥wi )
max zTi Mi (x − fi ) ,
i
x
with zi qi = wi , Mi (x − fi )ppii |zil |qi = wiqi |Mil (x − fi )|pi , sign(zil ) = sign [Mil (x − fi )] when Mi (x − fi ) = 0; and zi qi ≥ wi when Mi (x − fi ) = 0. Hence, we have that the dual to (1) is the following. n
min
λ≥0,η≥0
x
s
wi Mi (x − fi )pi − λT (Ax − b) +
max i=1
− rj =
min
λ≥0,η≥0
min
zi qi ≥wi (i=1,...,n) zj qj ≥ηj (j=n+1,...,n+s)
s
s
zTj+n Mj+n (x
+ j=1
− fj+n ) −
max x
ηj rj = j=1
n+s
zTi Mi i=1
j=1 n
zTi Mi (x − fi ) − λT (Ax − b) i=1
min
λ≥0,η≥0
min
zi qi ≥wi (i=1,...,n) zj qj ≥ηj (j=n+1,...,n+s)
n+s
−λ A x− T
ηj Mn+j (x − fn+j )pn+j
s
zTi Mi fi i=1
+λ b− T
max x
ηj rj . j=1
T T In the above expression, n+s i=1 zi Mi = λ A; otherwise, the inner maximization would be unbounded. Therefore, the dual to (1) is s T T minλ,η,zi − n+s i=1 zi Mi fi + λ b − j=1 ηj rj n+s T T s.t. M z − A λ = 0 i i i=1 zi qi ≥ wi (i = 1, . . . , n) (2) zj qj ≥ ηj (j = n + 1, . . . , n + s) λ≥0 η≥0 In addition, we have zi qi = wi , Mi (x − fi )ppii |zil |qi = wiqi |Mil (x − fi )|pi , sign(zil ) = sign [Mil (x − fi )] when Mi (x − fi ) = 0 for i = 1, . . . , n. And zj qj =
The Ellipsoidal lp Norm Obnoxious Facility Location Problem p
1067
q
ηj , Mj (x−fj )pjj |zjl |qj = ηj j |Mjl (x−fj )|pj , sign(zjl ) = sign [Mjl (x − fj )] when Mi(x − fj ) = 0 for j = 1, . . . , s. We also have λi (Ai x − bi ) = 0 (i = 1, . . . , m), ηj Mn+j (x − fn+j )pn+j − rj = 0, (j = 1, . . . , s). The new facility is required not to be located in existing residence sites. Then the objective is differentiable at optimum. Therefore, the reverse convex constraint qualification is satisfied. Thus, there is no duality gap between (1) and (2) (see [5]). We conclude that the necessary and sufficient local optimality conditions for (1) are: n+s
ηj−n
MiT zi − AT λ = 0 ,
i=1 pj rj−n
− Mj (x − fj )ppjj ηj ≥ 0
=0
(j = n + 1, . . . , n + s) ,
(j = 1, . . . , s) ,
Mn+j (x − fn+j )pn+j ≥ rj
(j = 1, . . . , s) ,
λi (bi − Ai x) = 0 (i = 1, . . . , m) , λi ≥ 0 (i = 1, . . . , m) , Ai x ≤ bi pi
αi |zil | = |Mil (x − fi )| p αj |zjl |qj = ηj−n |Mjl (x − fj )| j qi
(i = 1, . . . , m) , (i = 1, . . . , n; l = 1, . . . , di ) , (j = 1 + n, . . . , n + s; l = 1, . . . , dj ) ,
sign(zil ) = sign [Mil (x − fi )] (i = 1, . . . , n + s; l = 1, . . . , di ) , αi (wi − zil pi ) = 0 (i = 1, . . . , n) , zi qi ≥ wi (i = 1, . . . , n) , αn+j (ηj − zn+j pn+j ) = 0 (j = 1, . . . , s) , zn+j qn+j ≥ ηj
(j = 1, . . . , s) .
We define sign(0) = sign(a) to be true for all a ∈ R. Note that the above system is not easy to solve, because it includes some inequalities. We use some nonlinear complementarity functions to reformulate the complementarity. Specially, we use the min function, since it is the simplest. Observe that min(a, b) = 0 iff a, b ≥ 0 and at least one of a and b is 0. We assume that fi (i = 1, . . . , n + s) doesn’t satisfy Ax ≤ b. This means that the region where the obnoxious facility to be sit doesn’t include existing residence sites. In the formulation, we also distinguish between pi ≥ 2 and pi < 2 to avoid some nondifferentiable points. From p1i + q1i = 1, (i = 1, . . . , n + s), we have pi ≥ 2 ⇒ qi ≤ 2, pi ≤ 2 ⇒ qi ≥ 2 ; pi 1 qi 1 = pi − 1 = , = qi − 1 = . qi qi − 1 pi pi − 1
1068
Y. Xia
We then transform the optimality conditions into the following system of equations. n+s
MiT zi − AT λ = 0
(3a)
i=1
pj min ηj−n , Mj (x − fj )ppjj − rj−n = 0 (j = n + 1, . . . , n + s) ,
di
min (λi , bi − Ai x) = 0 (i = 1, . . . , m) ,
q1 i
|Mil (x − fi )|pi
di
p1
i
|Mil (x − fi )|pi
qi
qi
|zil | pi sign(zil ) − wipi Mil (x − fi ) = 0
(i ∈ {1, . . . , n}, pi < 2; l = 1, . . . , di ) ⎛ ⎞ q1 j dj pj pj ⎠ ⎝ |Mjl (x − fj )| zjl − ηj−n |Mjl (x − fj )| qj sign (Mjl (x − fj )) = 0 l=1
⎝
dj
(3d)
(i ∈ {1, . . . , n}, pi ≥ 2; l = 1, . . . , di )
l=1
⎛
(3c)
pi
zil − wi |Mil (x − fi )| qi sign (Mil (x − fi )) = 0
l=1
(3b)
(3e)
(3f)
(j ∈ {n + 1, . . . , n + s}, pj ≥ 2; l = 1, . . . , dj ) ⎞ p1 j
|Mjl (x − fj )|
pj ⎠
|zjl |
qj pj
qj pj
sign(zjl ) − ηj−n Mjl (x − fj ) = 0
l=1
(3g)
(j ∈ {n + 1, . . . , n + s}, pj < 2; l = 1, . . . , dj )
3
The Algorithm def
Let F represent the left-hand-side of (3). Let Ψ = F 2F . Any global optimization method that locates a global minimal solution to Ψ finds a local solution to (1). Unfortunately, Ψ is not differentiable everywhere. We then use the gradient decent method by perturbed nonmonotone line search [6] to skip the nonsmooth points to find a global minimum of Ψ .. def Denote u = (x; λ; η; z). Let Δ u represent the Newton’s direction to (3). Below is the algorithm. The Algorithm Initialization. Set constants s > 0, 0 < σ < 1, β ∈ (0, 1), γ ∈ (β, 1), nml ≥ 1. For each k ≥ 0, assume Ψ is differentiable at uk . Set k = 0. k+1 k Do while. F ∞ ≥ opt, u − u ∞ ≥ steptol, and k ≤ itlimit. 1. Find the Newton’s direction for (3): Δ uk . 2. (a) Set αk,0 = s, i = 0.
The Ellipsoidal lp Norm Obnoxious Facility Location Problem
1069
(b) Find the smallest nonnegative integer l for which Ψ (uk ) − Ψ uk + β l αk,i Δ uk ≥0≤j≤m(k) −σβ l αk,i ∇ Ψ (uj )T Δ uk . where m(0) = 0 and 0 ≤ m(k) − 1) + 1, nml]. k ≤ min[m(k l k,i k (c) If Ψ is nondifferentiable at u + β α Δ u , find t ∈ [γ, 1) so that k l k,i k Ψ is differentiable at u + tβ α Δ u , set αk,i+1 = tβ l αk,i , i + 1 → i, go to step 2b. Otherwise, set αk = β l αk,i , uk+1 = uk + αk Δ u, k + 1 → k.
4
Numerical Experiments
We adopt the suggested parameters in [1]. The machine accuracy of the computer running the code is = 2.2204e−16. Our computer program stops either F ∞ < opt = 1/3 = 6.0555e − 5, or the infinity norm of the Newton’s direction is less than steptol = 2/3 ; or the number of iterations exceeds itlimit = 100. We set s = 1, β = 12 , σ = 1.0e − 4, nml = 10. Below is an example with 10 existing facilities and 2 essential facilities from which the obnoxious facility must be away for some minimal distance. The 10 existing facilities are: f1 f3 f5 f7 f9
= (0.8351697, 0.9708701, 0.4337257), f2 = (0.5029927, 0.4754272, 0.7399495), = (0.9887191, 0.4000630, 0.6804281), f4 = (0.9093165, 0.5075840, 0.8947370), = (0.4425517, 0.3319756, 0.0674839), f6 = (0.5963061, 0.1664579, 0.1948914), = (0.7186201, 0.7451580, 0.5479852), f8 = (0.4763083, 0.9978569, 0.5943900), = (0.4766319, 0.0927731, 0.4870974), f10 = (0.1512887, 0.2611954, 0.6834821). The 2 essential facilities are
f11 = (0.5290907, 0.1980252, 0.1012210), f12 = (0.5800074, 0.8072991, 0.1645741). The weights in the objective are w = (0.1093683, 0.2339055, 0.5295364, 0.2302270, 0.4429267, 0.3831922, 0.1667756, 0.7351673, 0.1278835, 0.4373618). The p for the ellipsoidal norms are p = (2.1, 2.3, 1.5, 3.1, 1.9, 1.9, 1.9, 1.9, 1.9, 1.7, 3.3, 1.9). The linear transformation matrices for the ellipsoidal norms are: ⎡ ⎤ ⎡ ⎤ 0.3692412 0.7397360 0.3671468 0.5969909 0.1652245 0.9320327 M1 = ⎣0.3451773 0.5917062 0.7667686⎦ M2 = ⎣0.1077514 0.0023804 0.9944936⎦ ⎡0.7153005 0.3233307 0.5691886⎤ ⎡0.9073230 0.5631396 0.9684730⎤ 0.6143491 0.1460124 0.7866041 0.4259558 0.1795297 0.3595944 M3 = ⎣0.0613802 0.7311101 0.6258767⎦ M4 = ⎣0.0765066 0.1897343 0.1458442⎦ 0.5635103 0.9968714 0.8857684 0.4611495 0.7927964 0.3008546
1070
Y. Xia
⎡ ⎤ ⎡ ⎤ 0.8462288 0.3543530 0.9533479 0.6154476 0.0826227 0.3935258 M5 = ⎣0.8034236 0.5255093 0.0520969⎦ M6 = ⎣0.8348387 0.2062091 0.8198316⎦ 0.0296419 0.9122241 0.2724550 0.3425691 0.6444498 0.2284179 ⎡ ⎤ ⎡ ⎤ 0.9841331 0.6252488 0.3162361 0.5306548 0.3486466 0.8227931 M7 = ⎣0.9204588 0.0062540 0.2861219⎦ M8 = ⎣0.7473973 0.7774893 0.1455634⎦ ⎡ 0.8010166 0.2107349 0.6545077⎤ ⎡0.2803321 0.3570228 0.5587755⎤ 0.1136328 0.6820894 0.9157958 0.1660018 0.3996167 0.1370780 M9 = ⎣0.7598759 0.9467710 0.9630559⎦ M10 =⎣0.5331381 0.8280759 0.8089569⎦ ⎡ 0.9203295 0.4310774 0.8603421⎤ ⎡ 0.6531517 0.2877654 0.2869363⎤ 0.8655222 0.8695969 0.2533491 0.8813783 0.2787139 0.8239571 M11=⎣0.7964027 0.0003685 0.8457027⎦ M12=⎣0.9325132 0.1975635 0.4511900⎦. 0.1553411 0.6526919 0.8278037 0.5321329 0.7430964 0.5640829 The coefficients for the linear constraints are ⎛ ⎞ 0.0482753 0.4878658 0.0619276 A = ⎝−0.8904399 −0.8477093 −0.4979966⎠ , 0.8629468 0.5048775 0.7739134 b = (0.3484993, −1.0367463, 1.0501067)T . The minimal distances from the essential sites are r = (0.7952982, 0.1178979)T . We start from a zero solution: x = 0, z = 0, λ = 0, η = 0. The iterates are summarized in the figure “Example 1”. In the figure, x-axis represents the iteration number. The blue plot depicts log10 (Ψ ). The green plot depicts log10 (F ∞ ). Example 1
Perturbation of b
2
0 log (Ψ) 10
−2
log (||F|| ) 10
0
∞
−4 −2
−6 −8 digits
digits
−4
−6
−10 −12 −14
−8
log (Ψ)
−16
10
log (||F|| )
−10
10
−18 −12
0
1
2
3
4 iteration
5
6
7
8
−20
0
1
2
∞
3
4 iteration
5
6
7
8
Assume some data need to be modified, due to some previous measure error or the availability of more advanced measuring instruments. We then use the old solution as the starting point to solve the new instances by Newton’s method, since Newton’s method has locally Q-quadratic convergence rate. We perturb b by some random number in (−0.5, 0.5) to b = (0.4528631, −0.9323825, 1.1544705)T .
The Ellipsoidal lp Norm Obnoxious Facility Location Problem
1071
Then we use the perturbed nonmonotone Newton’s method to solve the problem with starting point being the solution to ‘Example 1’. The iterations are summarized in the figure “Perturbation of b”. We also randomly perturb each element of A in the range (−0.5, 0.5), to ⎛ ⎞ −0.0613061 0.3782844 −0.0476538 A = ⎝−1.0000212 −0.9572907 −0.6075779⎠ ; 0.7533654 0.3952961 0.6643320 perturb each weight w in the range (−0.5, 0.5) to w = (0.7894956, 0.9140328, 1.2096637, 0.9103542, 1.1230539, 1.0633194, 0.8469029, 1.4152946, 0.8080108, 1.1174891); perturb each existing sites f randomly in the range (−0.5, 0.5), to f1 = (0.7483169, 0.8840173, 0.3468729)T f2 = (0.4161400, 0.3885744, 0.6530967)T f3 = (0.9018664, 0.3132102, 0.5935753)T f4 = (0.8224637, 0.4207312, 0.8078842)T f5 =(0.3556989, 0.2451229, −0.0193689)T f6=(0.5094533, 0.0796051, 0.1080387)T f7 = (0.6317674, 0.6583052, 0.4611324)T f8 = (0.3894556, 0.9110042, 0.5075373)T f9= (0.3897792, 0.0059204, 0.4002447)T f10 = (0.0644359, 0.1743426, 0.5966293)T f11=(0.4422379, 0.1111724, 0.0143683)T f12=(0.4931546, 0.7204463, 0.0777213)T; perturb each element of the linear matrices for the ellipsoidal norm randomly by a number in (−0.5, 0.5) to ⎡ ⎤ ⎡ ⎤ 0.4750002 0.8454950 0.4729058 0.7027499 0.2709835 1.0377917 M1 = ⎣0.4509363 0.6974652 0.8725276⎦ M2 = ⎣0.2135104 0.1081394 1.1002526⎦ 0.8210595 0.4290897 0.6749476 1.0130820 0.6688986 1.0742320 ⎡ ⎤ ⎡ ⎤ 0.7201081 0.2517714 0.8923631 0.5317148 0.2852887 0.4653534 M3 = ⎣0.1671392 0.8368691 0.7316357⎦ M4 = ⎣0.1822656 0.2954933 0.2516032⎦ 0.6692693 1.1026304 0.9915274 0.5669084 0.8985554 0.4066136 ⎡ ⎤ ⎡ ⎤ 0.9519878 0.4601120 1.0591069 0.7212066 0.1883817 0.4992848 M5 = ⎣0.9091826 0.6312682 0.1578559⎦ M6 = ⎣0.9405977 0.3119681 0.9255906⎦ ⎡0.1354009 1.0179831 0.3782140⎤ ⎡0.4483281 0.7502088 0.3341769⎤ 1.0898921 0.7310078 0.4219951 0.6364138 0.4544056 0.9285521 M7 = ⎣1.0262178 0.1120130 0.3918809⎦ M8 = ⎣0.8531563 0.8832483 0.2513224⎦ ⎡0.9067756 0.3164939 0.7602667⎤ ⎡0.3860911 0.4627818 0.6645345⎤ 0.2193918 0.7878484 1.0215548 0.2717608 0.5053757 0.2428370 M9 = ⎣0.8656349 1.0525300 1.0688149⎦ M10=⎣0.6388970 0.9338349 0.9147159⎦ 1.0260885 0.5368364 0.9661011 0.7589107 0.3935244 0.3926953 ⎡ ⎤ ⎡ ⎤ 0.9712812 0.9753558 0.3591081 0.9871373 0.3844729 0.9297161 M11=⎣0.9021617 0.1061274 0.9514617⎦ M12=⎣1.0382722 0.3033225 0.5569490⎦. 0.2611001 0.7584509 0.9335627 0.6378919 0.8488554 0.6698419 These instances are then solved by the perturbed nonmonotone Newton’s method starting from the solution to ‘Example 1’. The iterations are summarized
1072
Y. Xia
in figures “Perturbation of A”, “Perturbation of w”, “Perturbation of f”, and “Perturbation of M” respectively. Perturbation of A
Perturbation of w
0
0 log (Ψ) 10
log (||F|| )
−2
10
−2
∞
−4 −4
−6
−8
digits
digits
−6
−10
−8
−12 −10 −14
log (Ψ) 10
log (||F|| )
−18
0
1
−12
∞
10
−16
2
3
4 iteration
5
6
7
−14
8
0
1
2
3
Perturbation of f
4 iteration
5
6
7
8
Perturbation of M
−1
0 log (Ψ)
log (Ψ)
10
−2
10
log (||F|| ) 10
log (||F|| )
∞
10
−2
∞
−3 −4
−4
−6 digits
digits
−5 −6
−8
−7 −8
−10
−9 −12 −10 −11
0
0.5
1
1.5
2
2.5 iteration
3
3.5
4
4.5
5
−14
0
0.5
1
1.5 iteration
2
2.5
3
The above instances show the Q-quadratic convergence rate of the Newton’s method. To find a global solution to (1), randomly restarting of Newton’s method for (1) can be used.
References 1. John E. Dennis, Jr. and Robert B. Schnabel. Numerical methods for unconstrained optimization and nonlinear equations. Prentice Hall Series in Computational Mathematics. Prentice Hall Inc., Englewood Cliffs, NJ, 1983. 2. J. Fern´ andez, P. Fern´ andez, and B. Pelegrin. Estimating actual distances by norm functions: a comparison between the lk,p,θ -norm and the lb1 ,b2 ,θ -norm and a study about the selection of the data set. Comput. Oper. Res., 29(6):609–623, 2002. 3. R.F. Love and JG Morris. Modelling inter-city road distances by mathematical functions. Operational Research Quarterly, 23:61–71, 1972. 4. R.F. Love and JG Morris. Mathematical models of road travel distances. Management Science, 25:130–139, 1979. 5. Olvi L. Mangasarian. Nonlinear programming, volume 10 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1994. 6. Yu Xia. An algorithm for perturbed second-order cone programs. Technical Report AdvOl-Report No. 2004/17, McMaster University, 2004.
On the Performance of Recovery Rate Modeling J. Samuel Baixauli1 and Susana Alvarez2 1
Department of Management and Finance, University of Murcia, Spain [email protected] 2 Department of Quantitative Methods for the Economy, University of Murcia, Spain [email protected]
Abstract. To ensure accurate predictions of loss given default it is necessary to test the goodness-of-fit of the recovery rate data to the Beta distribution, assuming that its parameters are unknown. In the presence of unknown parameters, the Cramer-von Mises test statistic is neither asymptotically distribution free nor parameter free. In this paper, we propose to compute approximated critical values with a parametric bootstrap procedure. Some simulations show that the bootstrap procedure works well in practice.
1 Introduction The probability distribution function of recovery rates is generally unknown. Hence, a probability distribution function that matches quite well the shape of the underlying distribution function should be chosen. In credit risk studies a Beta distribution is usually assumed in order to make predictions. The shape of the Beta distribution depends on the choice of its two parameters p and q. A nonparametric goodness-of-fit test based on a Cramer-von Mises type test statistic (CVM) can be carried out in order to test if the Beta distribution describes reasonably well the empirical shape of the recovery rates, assuming that p and q are unknown. However, in presence of unknown parameters the asymptotic distribution of CVM is not distribution free and it is not even asymptotically parameter free. Consequently, the standard tables used for the traditional Cramer-von Mises test statistic are not longer valid. In this paper, we implement the test of the null hypothesis “the distribution of the recovery rates is B(p,q)” versus the alternative “the distribution of the recovery rates is not of this type” using parametric bootstrap methodology and we design simulation experiments to check for the power of such test. The paper is organized as follows. In Section 1 we describe briefly the problem of financial modeling in credit risk management. In Section 2 we present the statistical problem of carrying out nonparametric goodness-of-fit tests in presence of a vector of unknown parameters. In Section 3 we describe a parametric bootstrap procedure to implement the nonparametric goodness-of-fit test introduced in Section 2. We check that the proposed bootstrap procedure works reasonably well in practice by means of some Monte Carlo experiments. In Section 4 we conclude. M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 1073 – 1080, 2006. © Springer-Verlag Berlin Heidelberg 2006
1074
J.S. Baixauli and S. Alvarez
2 Financial Modeling Problem in Credit Risk Management Recovery rate (Ri) is the amount that a creditor would receive in final satisfaction of the claims on a defaulted credit. In general, this is a percentage of the debt’s par value, which is the most common practice. Loss given default (LGD) is defined as (1recovery rate). Accurate LGD estimates are fundamental for provisioning reserves for credit losses, calculating risk capital and determining fair pricing for credit risky obligations. Estimates of LGD have usually been done by traditional methodologies of historical averages segmented by debt type (loans, bonds, stocks, etc.) and seniority (secured, senior unsecured, subordinate, etc.). To use the historical average methodology has several drawbacks, mainly because this methodology is characterized by using the same estimate of recovery irrespective of the horizon over which default might occur, what implies to ignore the credit cycle. Additionally, historical average methodology is updated infrequently; thus, new data have a small impact on longer-term averages. Recently, an alternative methodology appears with the purpose to predict LGD, the Moody’s KMV model (MKMV) [6] and [7]. MKMV model uses several explanatory factors to predict LGD and it assumes that the dependent variable of the regression model follows a Beta distribution. However, there is no theoretical reason that this is the correct shape of the dependent variable. MKMV model assumes the Beta distribution because it might be a reasonable description of the behavior of the recovery rates since it is one of the few common “named” distributions that give probability 1 to a finite interval, here taken to be (0,1), corresponding to 100% loss or zero loss. Additionally, it has great flexibility in the sense that it is not restricted to being symmetrical. The Beta distribution is indexed by two parameters and its probability density function (pdf) is given by:
f ( x) =
1 x p −1 (1 − x) q −1 , 0 ≤ x ≤ 1 , p > 0 , q > 0 , B ( p, q )
(1)
where, B(p,q) denotes the beta function, p is the shape parameter and q is the scale parameter. Application of a Beta distribution in LGD models can be found in [5] and [9], between others. As the parameters p and q vary, the Beta distribution takes on different shapes. In this paper, we fix the parameter values in order to represent two scenarios since different recovery rates across securities have been observed [2]. In the first scenario the probability density functions become more concentrated to the left. In the second scenario, the probability density functions become more concentrated to the right. These scenarios correspond to two types of securities: securities with low recovery rate and securities with high recovery rate, respectively. In order to regress the recovery rates on the explanatory variables, MKMV model converts the ‘assumed’ Beta distributed recovery values to a more normally distributed dependent variable using the normal quantile transformation, defined as follows: ~ −1 (2) Dependent variable = R R , p, q)) , i = N ( Beta ( i ˆ ˆ
where, N-1 is the inverse of the normal distribution function.
On the Performance of Recovery Rate Modeling
1075
Nevertheless, the assumption of Beta distribution is questionable since other distribution functions can be considered as descriptive models for the distribution of recovery rates. Hence, the usual assumption of Beta distribution could imply the fact of the recovery rates contained measurement error. Conceptually, in the measurement error case, the dependent variable is mismeasured. On consequence, accurate LGD modeling requires specifying the correct distributional model instead of assuming the Beta distribution systematically.
3 Goodness-of-Fit Tests in Presence of Some Unknown Parameters Let R1, R2,…, Rn be independent and identically (i.i.d.) random variables and let F(x,θ) be a continuous distribution function, known except for an s-dimensional parameter vector θ. We are interested in testing the hypotheses, H0: The distribution of Ri is F(.,θ) versus H1: The distribution of Ri is not F(.,θ). The nonparametric procedure for testing the null hypothesis “the distribution of Ri is F(.,θ)” versus the alternative “the distribution of Ri is not of this type” consists in comparing empirical distribution function with theoretical distribution function, substituting the unknown parameter vector θ by an estimate θˆ . In this context, the CVM test statistic is defined as: W =∫ 2 n
R
[
]
2
n Fn ( x) − F ( x,θˆ) dFn ( x) = ∑ {Fn ( Ri ) − F ( Ri ,θˆ)} , 2
n
(3)
i =1
where Fn(.) is the empirical distribution function based on {Ri }in=1 . The asymptotic distributions of test statistics based on the empirical distribution when parameters are estimated have widely been investigated by [3], [8], [12], [4] and [11], between others. [3] considered the CVM test statistic when the null distribution function depends upon a nuisance parameter that must be estimated. [8] analysed the case when null distribution is the normal distribution with mean and variance both unknown. [12] extended the theory of the CVM test statistic to the case when the null distribution is an arbitrary F(x,θ), being θ a k-dimensional vector of unknown parameters. A comprehensive treatment of the basic theory was given by [4]. [4] studied the weak convergence of a sample distribution function under a given sequence of alternative hypotheses when parameters are estimated from the data. His main result was that, in general, the asymptotic distribution of the test-statistic is not distribution-free on H0 since its asymptotic distribution depends on F. Moreover, it is not even asymptotically parameter-free since this distribution depends in general on the value of θ. Consequently, the standard tables used for the traditional CVM test are not longer valid if some parameters are estimated from the sample. If these critical values are used in presence of nuisance parameters, the results will be conservative in the sense that the probability of a type I error will be smaller than as given by tables of the CVM test statistic. [11] provided percentage points for the asymptotic distribution of the CVM test statistic, for the cases where the distribution tested is (i) normal, with
1076
J.S. Baixauli and S. Alvarez
mean or variance, or both, unknown; and (ii) exponential, with scale parameter unknown. However, percentage points for the asymptotic distribution of the CVM test statistic have not been obtained when the null distribution is different from the cases above. Consequently, there are two alternatives to implement the test using Wn2 . The first alternative is to derive asymptotic critical values from the limit distribution of Wn2 and examine how F(.) and θ influence the results. The second alternative is to derive bootstrap critical values, designing an appropriate procedure and establishing its consistency. The usefulness of bootstrap procedures in nonparametric distance tests was first highlighted by [10] and, since then, it has been extensively used in similar problems to ours. [1] showed that the bootstrap methodology consistently estimates the null distributions of various distance tests, included the CVM type test-statistics. They focused on parametric and nonparametric bootstrap procedures and they demonstrate that both procedures lead to correct asymptotic levels, that is, they lead to consistent estimates of the percentiles of the true limiting distribution of the test statistic when the parameters are estimated. However, in the case of nonparametric bootstrap, a correction of the bias is required.
4 Parametric Bootstrap Procedure and Monte Carlo Simulations We test the null hypothesis that Ri ≡ B ( p, q ) , assuming that p and q are unknown. For random number generation we use GAUSS’s ‘rndbeta’ command, which computes pseudo-random numbers with beta distribution. We choose the initial seed for the generator of 345567 and we introduce the GAUSS’s command ‘rndseed seed’. Otherwise, the default is that GAUSS uses the clock to generate an initial seed. The steps of the designed parametric bootstrap procedure are the following: − Step 1: Assume R1, R2,…, Rn are i.i.d. random variables from a Beta distribution B(p,q), with both p and q unknown. Under H0, estimate θ=(p,q) to obtain θˆ = ( pˆ , qˆ ) . Evaluate the statistic Wn2 using R1, R2,…, Rn and θˆ = ( pˆ , qˆ ) .
− Step 2: Generate B bootstrap samples of i.i.d. observations R * = ( R1* , R2* ,..., Rn* )' * from B( pˆ , qˆ ) . Obtain Rb* = ( R1*b , R2*b ,..., Rnb )' for b=1,…,B. For each b, calculate
new estimates θˆ * = ( pˆ * , qˆ * ) . Compute the bootstrap version of Wn2 , say Wnb2* . In this way, a sample of B independent (conditionally on the original sample) observations of Wn2* , say Wn21* ,...,WnB2* , is obtained. − Step 3: Let Wn2(*1−α ) B the (1-α)B-th order statistic of the sample Wn21* ,...,WnB2* . Reject H0 at the significance level α if Wn2 > Wn2(*1−α ) B . Additionally, the bootstrap p-value can be compute as p B = card (Wnb2* ≥ Wn2 ) / B .
On the Performance of Recovery Rate Modeling
1077
Fig. 1. This figure shows the shape of the pdf associated to a mixture of two Beta distributions, a B(2,9) and a B(9,2) with probability parameter λ. When λ=0 the resulting pdf is left-skewed shaped and, when λ=1, it is right-skewed shaped.
In order to check the accuracy of the bootstrap procedure, we perform some Monte Carlo experiments. In all cases we test H0 at the 10%, 5% and 1% with the statistic Wn2 . All experiments reported in this paper are conducted for different sample sizes n=125, 250 and 500, with B=500 bootstrap replications. For each experiment we report the proportion of the rejections of H0 based on 1000 simulation runs. To compute the test statistic Wn2 , p and q are estimated by maximum likelihood procedure and by matching moments procedure.
1078
J.S. Baixauli and S. Alvarez
In our procedure, we generate i.i.d. observations Ri ≡ (1 − λ ) B(2,9) + λB(9,2) , i=1,…,n, for various λ, λ=0, 0.2, 0.4, 0.6, 0.8 and 1. Thus, H0 is true if and only if λ=0 or 1. In Figure 1, the corresponding pdf are plotted. As Figure 1 shows, a B(2,9) and a B(9,2) can represent the behavior of recovery rates under two different scenarios. On one hand, the pdf of a B(2,9) is concentrated to the left and it could be feasible to catch the behavior of securities with mean recovery rate approximately equals to 18%, which represents an unsecured security. On the other hand, the probability density function of a B(9,2) is concentrated to the right, which means that it could be reasonably modeled a secured security, with mean recovery rate approximately equals to 81%. Additionally, the mixture of two Beta distributions can describe the shape of the empirical distribution of recovery rates corresponding to portfolios composed by secured and unsecured securities. When λ=0.2, 0.4, 0.6, 0.8 the pdf is two-peaked. These cases are considered in order to generate data from different data generated processes that can not be assumed to follow a Beta distribution. For example, when λ=0.2, most of the probability mass is accumulated in the left-hand side of the distribution but there is a small peak in the left-had side which makes this pdf slightly different from a Beta distribution. Table 1. Proportion of Rejections of H0 when the parameters p and q are estimated by maximum likelihood procedure
λ 0 0.2 0.4 0.6 0.8 1
10% 10.4 64 46.8 49.1 63.7 9
n=125 5% 4.7 52.3 35.6 37.3 50.2 4.9
Xi∼(1-λ)B(2,9)+λB(9,2), i=1,…,n n=250 n=500 1% 10% 5% 1% 10% 5% 1% 1.4 8.2 4.4 1.2 9.6 4.1 0.9 30.1 93 86.3 67.4 99.6 99.1 96.1 15.6 67.4 55.6 31.7 91.9 85.9 68.4 18.9 68.9 58.2 33.9 92.1 87.3 70.2 28.9 90.4 84.1 65.1 99.8 99.2 96.3 1.4 9.2 5.1 1 10.8 5.9 1.3
Table 2. Proportion of rejections of H0 when the parameters p and q are estimated by the matching moments procedure; Xi ∼ (1-λ)B(2,9) + λB(9,2), i = 1,…,n

        |       n=125       |       n=250       |       n=500
  λ     |  10%   5%    1%   |  10%   5%    1%   |  10%   5%    1%
  0     |  9.9   5.7   1.6  |  10.5  4.7   1.3  |  9.1   4.6   1.4
  0.2   |  62.8  51.2  28.4 |  89.9  82.7  65.2 |  99.8  99.2  95.4
  0.4   |  34.5  24.2  10.1 |  58.1  47.9  28   |  82.4  74.1  56.1
  0.6   |  36.1  26.1  12.6 |  53.8  42.7  24.8 |  81.4  74    55.5
  0.8   |  63.9  52.7  29.5 |  89.4  81.6  63.8 |  99.6  99.1  95.6
  1     |  9.7   5.3   0.9  |  9.3   5.3   1.3  |  8.9   4.2   1
In Table 1 we report the results when the maximum likelihood estimation procedure is used, while in Table 2 we show the results when the matching moments estimation procedure is chosen. In order to evaluate whether the difference between the percentage of rejections of H0, r, and the nominal significance level α is significant, we compute the Z statistic given by:

$$Z = \frac{r - \alpha}{\sqrt{\alpha(1-\alpha)/R}}, \qquad (4)$$
where R is the number of Monte Carlo replications and Z is distributed normally with mean 0 and variance 1. In particular, the bootstrap test is well specified at the 95% level if r ∈ [0.38, 1.61], r ∈ [3.65, 6.35] and r ∈ [8.14, 11.86] for R = 1000 and α = 1%, 5% and 10%, respectively. Our results show that the bootstrap test works reasonably well under both H0 and H1, whether the parameters p and q are estimated by the maximum likelihood procedure or by the matching moments procedure. The estimation method does not seem to affect the empirical size of the test. Under both estimation methods the empirical size is close to the nominal size of the test, as can be checked using the Z statistic. However, the bootstrap test under maximum likelihood estimation is slightly more powerful than when matching moments estimation is used. In the results reported in Tables 1 and 2, there is evidence of power increasing as the sample size grows. For the smallest sample size, n = 125, the bootstrap test has quite low power, especially at the 1% level. This is not the case at the larger sample sizes, suggesting that a sample size equal to 250 is enough to assure reasonably high power of the bootstrap test at the 1% level. It is also worth noting that the power of the bootstrap test decreases as λ gets nearer to 0.5. This is because the resulting mixture of beta distributions is U-shaped, which can be captured by a beta distribution.
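The well-specification intervals quoted above follow directly from inverting (4) at the 95% level; the short sketch below, with illustrative variable names, reproduces them.

```python
import math

def well_specified_interval(alpha, R=1000, z=1.96):
    """95% acceptance band (in %) for the empirical rejection rate r."""
    half_width = z * math.sqrt(alpha * (1 - alpha) / R)
    return 100 * (alpha - half_width), 100 * (alpha + half_width)

for a in (0.01, 0.05, 0.10):
    lo, hi = well_specified_interval(a)
    print(f"alpha={a:.0%}: r in [{lo:.2f}, {hi:.2f}]")
# Prints approximately [0.38, 1.62], [3.65, 6.35], [8.14, 11.86].
```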
5 Conclusions

In this paper, we have shown using computational methods how to test whether recovery rates follow a Beta distribution, using a parametric bootstrap procedure to estimate the asymptotic null distribution of the Cramer-von Mises test statistic when there are some unknown parameters that must be estimated from the sample. We have estimated the unknown parameters by the maximum likelihood procedure with general constraints on the parameters. For the maximization of the log-likelihood function we have chosen the Newton-Raphson numerical method, in which both first and second derivative information is used. The implementation of the Newton-Raphson method involves a numerical calculation of the Hessian. Our simulations reveal that the bootstrap version of the Cramer-von Mises test statistic performs reasonably well under both H0 and H1. Moreover, its performance is hardly affected by the estimation method chosen to estimate the two parameters which characterize the Beta distribution. Since it is quite often necessary in credit risk analysis to test the true underlying distribution of recovery rates in order to estimate potential credit losses, our bootstrap procedure appears to be a reliable method to implement the test. Furthermore, our bootstrap procedure is flexible enough to be used to test any other possible null distribution of interest. To sum up, computational methods should be taken into account in managing credit risk to ensure a reliable recovery rate distribution.
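As a sketch of the estimation step described above (assuming an unconstrained interior optimum; the helper below and its finite-difference Hessian are illustrative, not the authors' GAUSS implementation):

```python
import numpy as np
from scipy.stats import beta
from scipy.optimize import approx_fprime

def negloglik(theta, r):
    return -beta.logpdf(r, theta[0], theta[1]).sum()

def newton_raphson_beta_mle(r, theta0=(1.0, 1.0), tol=1e-8, it=50, eps=1e-5):
    """Newton-Raphson on the Beta log-likelihood; Hessian by finite differences."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(it):
        g = approx_fprime(theta, negloglik, eps, r)
        # Numerical Hessian: forward differences of the numerical gradient.
        H = np.array([(approx_fprime(theta + eps * np.eye(2)[i], negloglik, eps, r) - g) / eps
                      for i in range(2)])
        step = np.linalg.solve((H + H.T) / 2, g)   # symmetrized Hessian
        theta = np.maximum(theta - step, 1e-6)     # crude positivity constraint
        if np.linalg.norm(step) < tol:
            break
    return theta
```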
Using Performance Profiles to Evaluate Preconditioners for Iterative Methods

Michael Lazzareschi and Tzu-Yi Chen
Department of Computer Science, Pomona College, Claremont CA 91711, USA
{mdl12002, tzuyi}@cs.pomona.edu
Abstract. We evaluate performance profiles as a method for comparing preconditioners for iterative solvers by using them to address three questions that have previously been asked about incomplete LU preconditioners. For example, we use performance profiles to quantify the observation that if a system can be solved by a preconditioned iterative solver, then that solver is likely to use less space, and not much more time, than a direct solver. In contrast, we also observe that performance profiles are difficult to use for choosing specific parameter values. We end by discussing the role performance profiles might eventually play in helping users choose a preconditioner.

Keywords: Iterative methods, preconditioners, performance profiles.
1 Introduction
Iterative methods for solving large, sparse linear systems Ax = b are generally used when the time or memory requirements of a direct solver are unacceptable. However, getting good performance from an iterative method often requires first applying an appropriate preconditioner. Ideally, this preconditioner is inexpensive to compute and apply, and it generates a modified system A′x′ = b′ that is somehow "better" for the iterative solver being used. Unfortunately, choosing an effective preconditioner for any given system is not usually a simple task. This has led to the proposal of many different preconditioners, combined with an interest in helping users choose from among those preconditioners. For the latter, some researchers have suggested rules-of-thumb based on experimentation (e.g., [7, 9]), whereas others have tried using data-mining techniques to extract general guidelines (e.g., [24]). While the former can suggest default settings, it does not necessarily suggest when and how to adjust those defaults. While the latter could potentially provide more flexible guidelines, so far there has been limited success in extracting meaningful features. In this paper we consider using performance profiles [14], a technique introduced by the optimization community, to compare preconditioners for sparse linear systems. (For a survey of other methods used by the optimization community to compare heuristics, see [19].)
Because performance profiles are known to be potentially misleading when used to compare very different solvers [15], in this initial study we only evaluate different value-based incomplete LU (ILU) preconditioners. We find that performance profiles can still be difficult to interpret even when restricted to this limited class of preconditioners. However, they do provide a way to begin addressing different questions about ILU preconditioners, including those asked in papers such as [7, 8, 18]. We end by discussing the role performance profiles might play in evaluating preconditioners.
2 Background
Value-based ILU preconditioners are a popular class of preconditioners which are often computed by mimicking the computation of a complete LU factorization, but also "dropping" small entries by setting their values to 0. Surveys such as [1, 5, 23] speak to the variety of preconditioners in this class; researchers have suggested different ways of deciding what elements to drop, different methods of ordering the matrix prior to computing the incomplete factorization, and other variations. Given the variety of ILU preconditioners that exist, comparisons are necessary from a user's point of view. Unfortunately, comparing different preconditioners can be difficult in part because of inherent trade-offs between the speed of convergence, memory requirements, and other attributes. Furthermore, different applications stress different criteria, with perhaps the only consensus being that preconditioned solvers that never converge to the correct solution, and preconditioned solvers that require both more space and more time than a direct method, are useless.

Currently, when new ILU preconditioners are proposed, or existing ones evaluated, authors often describe their behavior in tables where each row is a single matrix and each column is a measure such as the amount of space needed, the number of iterations, the accuracy to which the solution is computed, and so on (e.g., [4, 6]). While these tables present complete data, they are inherently limited by the fact that users are unlikely to want to study a table spanning several pages. In addition, users remain responsible for determining what portions of the table are most relevant to their system. Other papers use graphs to visually display performance data (e.g., [8, 18]). Although graphs can present data more compactly, they can also be difficult to interpret unless the author can point to clear trends. A generalizable method for presenting comparison data more compactly than tables, while still revealing essential differences along a number of dimensions, could help users in choosing a preconditioner.

We consider using performance profiles, introduced in [14], for this purpose. A performance profile comparing, say, the time needed to solve a set of systems S using any of a set of preconditioned solvers P, begins with the time taken for each preconditioned solver p ∈ P to solve each system s ∈ S. From this timing data we compute $r_{p,s}$, which is the ratio of the time taken to solve s using the preconditioned solver p to the fastest time taken by any preconditioner to solve s. (Note that $r_{p,s} = 1.0$ if and only if p is the best solver for s, and $r_{p,s} = \infty$ if and only if s cannot be solved using p, i.e., the iterative solver did not converge.) The performance profile for p over the sets P and S is a plot of the function $\rho_p(\tau)$, where

$$\rho_p(\tau) = \frac{|\{s \in S \ \text{s.t.}\ r_{p,s} \le \tau\}|}{|S|} \qquad (1)$$

In other words, $\rho_p(\tau)$ is the probability that the solver p solves a problem in S in no more than τ times the minimum time taken by any solver in P. Although performance profiles present less data than tables, the plots are potentially easier for users to interpret. In addition, performance profiles are sufficiently flexible that they could potentially be used as a standard method of comparison. However, this flexibility brings with it the potential for confusion: the appearance of a plot depends on the problems in S as well as on the solvers plotted in that graph. We ask to what extent these drawbacks impact the viability of using performance profiles as a general tool for evaluating preconditioners.
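As a concrete illustration of definition (1), here is a small sketch that computes ρ_p(τ) from a table of solve times (rows are solvers, columns are systems, and np.inf marks failures); the function name and example data are ours.

```python
import numpy as np

def performance_profile(times, taus):
    """times: (num_solvers, num_systems) array; failures marked np.inf.
    Returns rho[p, t] = fraction of systems solver p handles within
    taus[t] times the best time for that system (definition (1))."""
    best = times.min(axis=0)                 # fastest time per system
    ratios = times / best                    # r_{p,s}
    return np.array([[np.mean(row <= tau) for tau in taus] for row in ratios])

# Example: three solvers on four systems, one failure.
times = np.array([[1.0, 2.0, 3.0, np.inf],
                  [1.5, 1.0, 6.0, 4.0],
                  [2.0, 4.0, 1.0, 2.0]])
rho = performance_profile(times, taus=np.linspace(1, 10, 10))
```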
3 Methodology
We use a set of 115 nonsymmetric, real matrices taken primarily from the University of Florida sparse matrix collection [10]. The exact solution to the system Ax = b was always assumed to be a vector of 1s, and the vector b was calculated accordingly. The iterative solver was GMRES(50) [22] with an upper bound of 500 on the number of iterations and a tolerance of 10^-8, choices which are also used in studies such as [7, 9]. We used the variant of ILU known as ILUTP Mem [8], with values of lfil varying from 0 to 5, values of pivtol varying from 0 to 1, and values of droptol varying from 0 to .1. For each matrix and each combination of parameters we used the natural ordering as well as two fill-reducing orderings: RCM [20] and COLAMD [11]. We also experimented with MC64-based ordering and scaling for stability [16, 17]. To enable comparison to a direct solver, we ran SuperLU [13] on all the systems. Overall we ran several thousand test cases.

Because our focus in this paper is on understanding the kind of information that can be read from performance profiles, we discuss only a subset of our data. In particular, we always use pivtol = 1.0 and droptol = 0.0. We always use MC64(5) with scaling for stability, and any subsequent fill-reducing ordering is always applied symmetrically. Furthermore, although we collected a wide range of information about each run, in this paper we use only the overall time taken by the preconditioned solver (including the time to compute and apply the preconditioner, to run the iterative solver, and to recover the solution to the original problem) and the space used by the preconditioner.

We plot performance profiles comparing space and time, as these are two aspects of interest to most users. The profiles are plotted for values of τ up to 3 for space and up to 10 for time. The difference reflects the observation that users can typically wait longer for a solution if necessary, but that excessive space requirements are insurmountable. Finally, in the legend for each performance profile we note the total percentage of problems that can be solved with each solver. In this way the focus of the graphs remains on small values of τ, which represent the regime where the solvers are most competitive. However, by giving the overall percentage of problems solved, we also reveal whether a relatively inefficient solver might be particularly robust.
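The experimental pipeline can be approximated with SciPy's built-in threshold ILU and GMRES; this is a loose stand-in (SciPy's spilu is not the ILUTP Mem code used in the paper, and the parameter names below map only roughly onto lfil and droptol):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spilu, gmres, LinearOperator

def solve_with_ilu(A, fill_factor=3.0, drop_tol=0.0):
    """GMRES(50) with an incomplete-LU preconditioner, tolerance 1e-8."""
    n = A.shape[0]
    b = A @ np.ones(n)                       # exact solution is a vector of 1s
    ilu = spilu(A.tocsc(), drop_tol=drop_tol, fill_factor=fill_factor)
    M = LinearOperator((n, n), matvec=ilu.solve)
    # Note: newer SciPy versions rename the tol keyword to rtol.
    x, info = gmres(A, b, M=M, restart=50, maxiter=500, tol=1e-8)
    return x, info                           # info == 0 means converged

A = sp.random(200, 200, density=0.05, format="csr") + 10 * sp.eye(200)
x, info = solve_with_ilu(A)
```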
4 Examples and Analysis
In this section we use performance profiles to address three questions of interest when evaluating ILU preconditioners. As noted in [15], and as will become evident here, performance profiles can be confusing when too many, or very different, solvers are plotted in a single graph. Hence we restrict each profile in this section to a relatively small set of preconditioned iterative solvers.

4.1 Comparing Fill-Reducing Orderings

Just as fill-reducing orderings are used to reduce the fill in complete LU factorizations, they can also be used with ILU factorizations to compute more accurate preconditioners in less space. A general rule of thumb that emerges from previous work is that minimum degree methods are better than other fill-reducing orderings for value-based ILU preconditioners (e.g., [2, 3, 7, 12, 18]). We use performance profiles to reevaluate this claim. Figure 1 compares the natural, RCM, and COLAMD orderings on their time and space requirements in the graphs on the left and right, respectively. We use an lfil value of 3 for these examples based on results in [8] that suggest higher levels of fill do not significantly increase the likelihood of convergence for the ILUTP Mem preconditioner.
Fig. 1. Performance profiles comparing time (left) and space (right) requirements for ILUTP Mem preconditioned GMRES(50) using three different orderings: natural, RCM, and COLAMD. (Legend, total of 115 problems: rcm 3: 47%, col 3: 46%, nat 3: 36% solved.)
The two graphs show that performance is similar using either the RCM or COLAMD orderings, and that both outperform the natural ordering. The percentages in the legends tell us that RCM solves more problems overall. Furthermore, the fact that the lines for RCM are always above those for COLAMD shows that the former solves more problems within any given percentage of the overall best time or space. However, the steep slope between τ = 1 and τ = 2 for COLAMD in the time plot indicates that COLAMD rarely takes more than twice as much time as the best of the three orderings. Clearly COLAMD is not significantly better than RCM as a fill-reducing ordering for use with this particular ILU preconditioner and these parameter settings. Further experiments might reveal whether COLAMD is, as suggested by previous work, notably better in other contexts.

4.2 Choosing the Value of lfil
Value-based ILU preconditioners often provide parameters for controlling the trade-off between space and time. Typically, the more space used by a preconditioner, the fewer iterations (and hopefully less overall time) needed for the subsequent iterative solver to converge. In ILUT [21] and its variants, the lfil parameter gives one way to control this trade-off. (The original meaning of the lfil parameter in ILUT is found to be potentially misleading in [8]; we use the modified definition of the lfil parameter they suggest.) To better understand the effect of the value of lfil, Figure 2 plots performance profiles for time and space for lfil values ranging from 0 to 5. Based on the results in the previous section, the RCM ordering is used.
Fig. 2. Performance profiles comparing time and space requirements as a function of different values of lfil. (Legend, total of 115 problems: rcm 0: 20%, rcm 1: 32%, rcm 2: 38%, rcm 3: 47%, rcm 4: 52%, rcm 5: 53% solved.)
The fact that these profiles are more complex than those in the previous section reinforces the observation that plotting more than a few solvers on one plot is inadvisable. The plot on the left seems to indicate that a higher lfil value leads to a greater likelihood of convergence within any small multiple of the optimal time. However, since many more problems can be solved using larger values of lfil, as indicated by the percentages in the legend, this observation is not very insightful. At most we can say that once τ is larger than about 1.5, higher lfil values seem to be generally better, though the difference between lfil = 4 and 5 seems small. In other words, higher lfil values are generally better than lower values if our primary concern is with having a solver that is no more than 50% slower than optimal over lfil values between 0 and 5.

The performance profile for space in Figure 2 is also difficult to interpret. As expected, lower lfil values generally use less space, but the magnitude and consistency of the advantage are unclear. This is due to the fact that, in general, if a system can be solved with lfil = k, it can also be solved with lfil > k, albeit using more space. This, and the fact that the ILUTP Mem preconditioner uses about twice as much space with lfil = 2 as it does with lfil = 1, explains the jumps at integer values of τ. In particular, the fact that a problem which can be solved with lfil = 1 can also be solved with lfil = 2 is not reflected until τ = 2, when we plot all solvers that can solve with up to twice as much space as optimal. While the above explains some of the trends exhibited in the plots in Figure 2, it does not provide clear insight into how to set lfil to optimize any meaningful measure of performance.

4.3 Evaluating Robustness
Finally we address a question that is sometimes overlooked when evaluating preconditioners for iterative solvers: when, if ever, are they competitive with direct solvers? Figure 3 shows performance profiles for time and space comparing the direct solver SuperLU [13] and RCM-ordered ILUTP Mem preconditioned GMRES(50), with lfil = 3.
Fig. 3. Performance profiles comparing time and space for RCM-ordered ILUTP Mem and SuperLU on the full set of 115 matrices. (Legend: rcm 3: 47%, SuperLU: 88% of problems solved.)
Fig. 4. Performance profiles comparing time and space for RCM-ordered ILUTP Mem and SuperLU on the subset of 53 systems which they can both solve. (Legend: rcm 3: 100%, SuperLU: 100% of the 53 problems solved.)
Initially SuperLU looks far superior. However, as previously noted, the fact that it converges on many more systems complicates the interpretation of the performance profiles. If we note that SuperLU is clearly more robust and then replot the performance profiles only on the subset of the problems that are solved by both methods, a different picture emerges. In particular, Figure 4 shows that the preconditioned iterative solver uses less space, and that it never uses more than about 50% more space than that used by the more space-efficient of the two solvers. Furthermore, the preconditioned iterative solver is rarely more than three times slower than the faster of the two solvers. From this we conclude that if one has reason to believe a particular system can be solved using an iterative method, then the potential benefits in both time and space are well worth considering.
5 Conclusion
In this paper we discuss the potential use of performance profiles in evaluating ILU preconditioned iterative solvers. Regarding the use of performance profiles for this purpose, we find that they can be a useful way of presenting a summary of results over a large number of systems. However, they can be difficult to interpret when used to compare too many, or very different, solvers in a single plot. We also observe that plotting these profiles only for small values of τ, and additionally noting the total percentage of problems on which each solver converges to give a sense of behavior at large values of τ, makes these plots easier to interpret. From a user perspective, we find that if a user believes their system can be solved by a preconditioned iterative method, then that solver will likely be both space and time competitive with direct methods. This suggests further work into understanding what makes a system solvable by preconditioned iterative methods.
Taken as a whole, the above suggests that a useful next step would be a tool that allowed users to specify the exact solvers and the exact systems that they wanted to compare. While large tables of data such as those in some existing papers on preconditioners technically allow a user to do this, a standardized graph-based format might simplify the task of looking through data for insight into the behavior of different solvers on different classes of systems. While performance profiles might be useful in this role, a user would need to be given hints on how to meaningfully interpret these plots. We are still evaluating the strengths and weaknesses of performance profiles, and we are also interested in evaluating other methods for presenting comparison data.
Acknowledgements

This work was funded in part by the National Science Foundation under grant #CCF-0446604. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
References
1. M. Benzi. Preconditioning techniques for large linear systems: A survey. J. of Comp. Physics, 182(2):418–477, November 2002.
2. M. Benzi, W. D. Joubert, and G. Mateescu. Numerical experiments with parallel orderings for ILU preconditioners. Electronic Transactions on Numerical Analysis, pages 88–114, 1999.
3. M. Benzi, D. B. Szyld, and A. van Duin. Orderings for incomplete factorization preconditioning of nonsymmetric problems. SIAM J. Sci. Comput., 20(5):1652–1670, 1999.
4. M. Benzi and M. Tuma. A sparse approximate inverse preconditioner for nonsymmetric linear systems. SIAM Journal on Scientific Computing, 19:968–994, 1998.
5. T. F. Chan and H. A. van der Vorst. Approximate and incomplete factorizations. In D. E. Keyes, A. Samed, and V. Venkatakrishnan, editors, Parallel numerical algorithms, volume 4 of ICASE/LaRC Interdisciplinary Series in Science and Engineering, pages 167–202. Kluwer Academic, Dordrecht, 1997.
6. A. Chapman, Y. Saad, and L. Wigton. High-order ILU preconditioners for CFD problems. Int. J. Numer. Meth. Fluids, 33:767–788, 2000.
7. T.-Y. Chen. Preconditioning sparse matrices for computing eigenvalues and solving linear systems of equations. PhD thesis, University of California at Berkeley, December 2001.
8. T.-Y. Chen. ILUTP Mem: A space-efficient incomplete LU preconditioner. In A. Laganà, M. L. Gavrilova, V. Kumar, Y. Mun, C. J. K. Tan, and O. Gervasi, editors, Proceedings of the 2004 International Conference on Computational Science and its Applications, volume 3046 of LNCS, pages 31–39, 2004.
9. E. Chow and Y. Saad. Experimental study of ILU preconditioners for indefinite matrices. J. Comp. and Appl. Math., 86:387–414, 1997.
10. T. Davis. University of Florida sparse matrix collection. NA Digest, v.92, n.42, Oct. 16, 1994; NA Digest, v.96, n.28, Jul. 23, 1996; and NA Digest, v.97, n.23, June 7, 1997. Available at: http://www.cise.ufl.edu/research/sparse/matrices/.
11. T. Davis, J. Gilbert, S. Larimore, and E. Ng. A column approximate minimum degree ordering algorithm. ACM Trans. on Math. Softw., 30(3):353–376, September 2004.
12. E. F. D'Azevedo, P. A. Forsyth, and W.-P. Tang. Ordering methods for preconditioned conjugate gradient methods applied to unstructured grid problems. SIAM J. Matrix Anal. Appl., 13(3):944–961, July 1992.
13. J. W. Demmel, J. R. Gilbert, and X. S. Li. SuperLU users' guide, September 1999. Available at: http://www.nersc.gov/~xiaoye/SuperLU/.
14. E. Dolan and J. Moré. Benchmarking optimization software with performance profiles. Mathematical Programming, 91:201–213, 2002.
15. E. Dolan, J. Moré, and T. Munson. Optimality measures for performance profiles. Preprint ANL/MCS-P1155-0504, 2004.
16. I. S. Duff and J. Koster. The design and use of algorithms for permuting large entries to the diagonal of sparse matrices. SIAM J. Matrix Anal. Appl., 20(4):889–901, 1999.
17. I. S. Duff and J. Koster. On algorithms for permuting large entries to the diagonal of a sparse matrix. SIAM J. Matrix Anal. Appl., 22(4):973–996, 2001.
18. J. R. Gilbert and S. Toledo. An assessment of incomplete-LU preconditioners for nonsymmetric linear systems. Informatica, 24:409–425, 2000.
19. C. Khompatraporn, J. D. Pinter, and Z. B. Zabinsky. Comparative assessment of algorithms and software for global optimization. J. Global Opt., 31(4):613–633, 2005.
20. W.-H. Liu and A. H. Sherman. Comparative analysis of the Cuthill-McKee and the reverse Cuthill-McKee ordering algorithms for sparse matrices. SIAM J. Num. Anal., 13(2):198–213, April 1976.
21. Y. Saad. ILUT: A dual threshold incomplete LU factorization. Numer. Linear Algebra Appl., 4:387–402, 1994.
22. Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 7(3):856–869, July 1986.
23. Y. Saad and H. A. van der Vorst. Iterative solution of linear systems in the 20th century. J. of Comp. and Appl. Math., 123:1–33, November 2000.
24. S. Xu and J. Zhang. Solvability prediction of sparse matrices with matrix structure-based preconditioners. In Proceedings of Preconditioning 2005, Atlanta, Georgia, May 2005.
Multicast ω-Trees Based on Statistical Analysis

Moonseong Kim¹, Young-Cheol Bang², and Hyunseung Choo¹

¹ School of Information and Communication Engineering, Sungkyunkwan University, 440-746, Suwon, Korea. Tel.: +82-31-290-7145. {moonseong, choo}@ece.skku.ac.kr
² Department of Computer Engineering, Korea Polytechnic University, 429-793, Gyeonggi-Do, Korea. Tel.: +82-31-496-8292. [email protected]
Abstract. In this paper, we study the efficient multicast routing tree problem with QoS requirements. A new multicast weight parameter is proposed by efficiently combining two independent measures, the link cost and delay. The weight ω ∈ [0, 1] plays an important role in combining the two measures. As ω approaches 0, the tree delay decreases; on the contrary, as it approaches 1, the tree cost decreases. Therefore, once ω is decided, an efficient multicast tree can be found. A case study shows the various multicast trees obtained for each ω. When network users have various QoS requirements, the proposed multicast weight parameter is very informative for them.
1 Introduction
Multicast services have been increasingly used by various continuous media applications. For example, the multicast backbone (MBONE) of the Internet has been used to transport real time audio and video for news, entertainment, and distance learning. In multicast communications, messages are sent to multiple destinations that belong to the same multicast group. These group applications demand a certain amount of reserved resources to satisfy their Quality of Service (QoS) requirements such as end-to-end delay, delay jitter, loss, cost, and throughput. Since resources for a multicast tree are reserved along a given path to each destination in the tree, the construction may fail to guarantee the required QoS if a single link cannot support the required resources. Thus an efficient solution for multicast communications includes the construction of a multicast tree that has the best chance to satisfy the resource requirements [1, 2, 8, 9, 10, 11, 17, 19, 21].
This research was supported by the Ministry of Information and Communication, Korea under the Information Technology Research Center support program supervised by the Institute of Information Technology Assessment, IITA-2005-(C10900501-0019) and the Ministry of Commerce, Industry and Energy under NextGeneration Growth Engine Industry. Dr. Choo is the corresponding author and Dr. Bang is the co-corresponding author.
Previous optimization techniques for multicast routing have considered two optimization goals, delay optimization and cost optimization, but as distinct problems. The optimal delay solution is such that the sum of delays on the links along the path from the source to each destination is minimum. Dijkstra's shortest path algorithm [4] can be used to generate the shortest paths from the source to the destination nodes in O(n²) time in a graph with n nodes. This provides the optimal solution for delay optimization. A cost optimized multicast route is a tree spanning the destinations such that the sum of the costs on the links of the tree is minimum. This problem is also known as the Steiner tree problem [9], and is known to be NP-complete [7]. However, some heuristics for the Steiner tree problem have been developed that take polynomial time [11, 17] and produce near optimal results.

In this paper, we consider multicast routing as a source routing problem, with each node having full knowledge of the network and its status. We associate a link cost and a link delay with each link in the network. The problem is to construct a tree spanning the destination nodes such that it satisfies QoS requirements such as the minimum cost tree, minimum delay tree, cost-constrained minimum delay tree, and delay-bounded minimum cost tree problems. The cost of the optimal delay solution is relatively more expensive than the cost of the cost optimized multicast route, and moreover, the delay of the cost optimized multicast route is relatively higher than the delay of the optimal delay solution. The negotiation between the tree cost and the tree delay is therefore important. Hence, we introduce a new multicast parameter that regulates both the cost and the delay at the same time. We use the TM algorithm [17], a well-known Steiner tree algorithm, with the proposed parameter, and can adjust the trade-off by the weight ω ∈ [0, 1].

The rest of the paper is organized as follows. In Section 2, we describe the network model and the well known TM algorithm. Section 3 presents details of the new parameter and illustrates it with an example. Then we analyze and evaluate the performance of the proposed parameter by simulation in Section 4. Section 5 concludes this paper.
2 Preliminaries

2.1 Network Model
We consider a computer network represented by a directed graph G = (V, E) with n nodes and l links, where V is the set of nodes and E is the set of links. Each link e = (i, j) ∈ E is associated with two parameters, namely link cost c(e) ≥ 0 and link delay d(e) ≥ 0. The delay of a link, d(e), is the sum of the perceived queueing delay, transmission delay, and propagation delay. We define a path as a sequence of links (u, i), (i, j), …, (k, v) belonging to E. Let P(u, v) = {(u, i), (i, j), …, (k, v)} denote the path from node u to node v. If all nodes u, i, j, …, k, v are distinct, then we say that it is a simple directed path. We define the length of the path P(u, v), denoted by n(P(u, v)), as the number of links in P(u, v). For a given source node s ∈ V and a destination node d ∈ V, $(2^{s\to d}, \infty)$ is the set of all possible paths from s to d:
$$(2^{s\to d}, \infty) = \{\, P_k(s,d) \mid \text{all possible paths from } s \text{ to } d,\ \forall s, d \in V,\ \forall k \in \Lambda \,\}$$

where Λ is an index set. The path cost of $P_k$ is given by $\phi_C(P_k) = \sum_{e \in P_k} c(e)$ and the path delay of $P_k$ is given by $\phi_D(P_k) = \sum_{e \in P_k} d(e)$, $\forall P_k \in (2^{s\to d}, \infty)$. For multicast communications, messages need to be delivered to all receivers in the set $M \subseteq V \setminus \{s\}$, which is called the multicast group, where |M| = m. The path traversed by messages from the source s to a multicast receiver $m_i$ is given by $P(s, m_i)$. Thus the multicast routing tree can be defined as $T(s, M) = \bigcup_{m_i \in M} P(s, m_i)$, and the messages are sent from s to M through T(s, M). The tree cost of the tree T(s, M) is given by $\phi_C(T(s,M)) = \sum_{e \in T} c(e)$ and the tree delay is $\phi_D(T(s,M)) = \max\{\phi_D(P(s,m_i)) \mid \forall P(s,m_i) \subseteq T,\ \forall m_i \in M\}$.

2.2 TM Algorithm
There is a well known approach for constructing a multicast tree with minimum cost. The algorithm TM due to Takahashi and Matsuyama [17] is a shortest path based algorithm and works on asymmetric directed networks; it was further studied and generalized by Ramanathan [15]. The TM algorithm is very similar to the shortest path based Prim's minimum spanning tree algorithm [14], and works in the following three steps.

1. Construct a subtree, T1, of network G, where T1 consists of the source node s only. Let i = 1 and Mi = {s}.
2. Find the closest node, mi ∈ (M \ Mi), to Ti (ties broken arbitrarily). Construct a new subtree, Ti+1, by adding all links on the minimum cost path from Ti to mi. Set Mi = Mi ∪ {mi} and i = i + 1.
3. If |Mi| < |M| then go to step 2; otherwise return the final Ti.

Fig. 1 (a) shows a given network topology with link costs specified on each link. Fig. 1 (b) represents the ultimate multicast tree obtained by the TM algorithm; the cost of this tree is 13. In the worst case, the cost of the tree produced by TM is no worse than $2(1 - 1/|M|)\,\phi_C(T)$, where T is the optimal tree [17].
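A compact sketch of the three steps, for a graph stored as a cost dictionary; Dijkstra distances are recomputed from the growing tree for clarity rather than speed, all names here are illustrative, and every destination is assumed reachable.

```python
import heapq

def tm_tree(nodes, cost, s, M):
    """Takahashi-Matsuyama heuristic: grow a tree from s toward M.
    cost: dict mapping (u, v) -> link cost on a directed graph."""
    tree_nodes, tree_links, remaining = {s}, set(), set(M)
    while remaining:
        # Multi-source Dijkstra from the current tree (step 2).
        dist = {v: 0 if v in tree_nodes else float("inf") for v in nodes}
        prev = {}
        heap = [(0, v) for v in tree_nodes]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            for v in nodes:
                w = cost.get((u, v))
                if w is not None and d + w < dist[v]:
                    dist[v], prev[v] = d + w, u
                    heapq.heappush(heap, (dist[v], v))
        m = min(remaining, key=lambda v: dist[v])   # closest destination
        while m not in tree_nodes:                  # splice in the path
            tree_links.add((prev[m], m))
            tree_nodes.add(m)
            m = prev[m]
        remaining -= tree_nodes
    return tree_links
```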
Fig. 1. Given a network (a), a multicast tree based on TM is shown in (b)
3 Proposed Multicast ω-Trees

3.1 The Negotiation Between Cost and Delay
We compute two paths, the Least Delay path $P_{LD}(s, m_k)$ and the Least Cost path $P_{LC}(s, m_k)$, from the source s to each destination $m_k$. Since only link delays are considered to find $P_{LD}(s, m_k)$, $\phi_C(P_{LD})$ is always greater than or equal to $\phi_C(P_{LC})$. If the path cost $\phi_C(P_{LD})$ is decreased by $100\big(1 - \frac{\phi_C(P_{LC})}{\phi_C(P_{LD})}\big)\%$, the decreased value is obviously equal to $\phi_C(P_{LC})$. The sample set, $P_{LC}$ or $P_{LD}$ from $(2^{s\to d}, \infty)$, with a non-normal distribution is approximately normally distributed by the Central Limit Theorem (CLT) [13]. The $100(1-\alpha)\%$ confidence interval for the sample mean of an arbitrary sample set is the "Confidence Interval" shown in Fig. 2. Let $\bar{C}$ be the average of the link costs along $P_{LD}$ with $(i,j) \in P_{LD}$; then $\bar{C} = \phi_C(P_{LD})/n(P_{LD})$. We employ the Gaussian distribution by the Central Limit Theorem. We consider the confidence interval $2 \times 100\big(1 - \frac{\phi_C(P_{LC})}{\phi_C(P_{LD})}\big)\%$ in order to decrease the cost by $100\big(1 - \frac{\phi_C(P_{LC})}{\phi_C(P_{LD})}\big)\%$, and we should calculate its percentile. Since the Gaussian density function is symmetric about the mean $\bar{C}$, if the value to be decreased is greater than or equal to 50% then we interpret this value as the 99.9% confidence interval for simplicity.
Fig. 2. The detection point $post_{LD}$ on the (approximately normal) distribution of link costs along the LD path; the confidence interval of width $2 \times 100\big(1 - \phi_C(P_{LC})/\phi_C(P_{LD})\big)\%$ is centered at the mean $\bar{C}$
As shown in Fig. 2, $post_{LD}$ is the detection point used to change the path cost from s to $m_k$. So, it is essential to find the percentile $z_{\alpha/2}$. In order to obtain it, we can use the cumulative distribution function (CDF). Let the CDF be $F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\,dy$. Then the percentile $z_{\alpha/2}$ is a solution of the equation $F(z_{\alpha/2}) - \frac{1}{2} = 1 - \frac{\phi_C(P_{LC})}{\phi_C(P_{LD})}$ if $100\big(1 - \frac{\phi_C(P_{LC})}{\phi_C(P_{LD})}\big)\% < 50\%$.

After calculating the percentile, we compute $post_{LD} = \bar{C} - z_{\alpha/2}\,\frac{S_{LD}}{\sqrt{n(P_{LD})}}$, where $S_{LD} = \big[\sum_{e \in P_{LD}} (c(e) - \bar{C})^2 / (n(P_{LD}) - 1)\big]^{1/2}$ is the sample standard deviation. If $n(P_{LD}) = 1$, then $S_{LD} = 0$. The new cost value of each link for $m_k$ is as follows:

$$Cfct(e) = \max\Big\{\, 1,\ 1 + (c(e) - post_{LD})\,\frac{\omega}{0.5} \,\Big\}, \quad 0 \le \omega \le 1.$$
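A small sketch of this calculation under the stated convention (values of 50% or more mapped to the 99.9% interval); SciPy's norm.ppf plays the role of inverting F, and the helper name and example values are ours, not the paper's exact data.

```python
import numpy as np
from scipy.stats import norm

def post_point(path_costs, ratio):
    """post = mean - z * s / sqrt(n), where z solves F(z) - 1/2 = ratio.
    ratio = 1 - phi_C(P_LC) / phi_C(P_LD) (or the delay analogue)."""
    costs = np.asarray(path_costs, dtype=float)
    n = len(costs)
    s = costs.std(ddof=1) if n > 1 else 0.0
    tail = min(ratio, 0.4995)            # >= 50% treated as the 99.9% interval
    z = norm.ppf(0.5 + tail)             # F(z) = 0.5 + tail
    return costs.mean() - z * s / np.sqrt(n)

# Illustrative link costs for an LD path whose total cost is 19.
print(post_point([5, 2, 3, 9], 1 - 10 / 19))
```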
Meanwhile, $P_{LC}(s, m_k)$ is computed by taking only the link cost into account. So, $\phi_D(P_{LC})$ is always greater than or equal to $\phi_D(P_{LD})$. If $\phi_D(P_{LC})$ is decreased by $100\big(1 - \frac{\phi_D(P_{LD})}{\phi_D(P_{LC})}\big)\%$, then the decreased value becomes $\phi_D(P_{LD})$. The new delay value of each link for $m_k$ can be derived in the same manner as in the case of $P_{LD}$:

$$Dfct(e) = \max\Big\{\, 1,\ 1 + (d(e) - post_{LC})\,\frac{1-\omega}{0.5} \,\Big\}, \quad 0 \le \omega \le 1.$$

Once $Cfct(e)$ and $Dfct(e)$ are computed, we calculate the new parameter values with weight ω, $Cfct(e) \times Dfct(e)$, for each link $e \in E$ of G for $m_k \in M$. Let $X^{m_k} = (Cfct \cdot Dfct)_{n\times n}$ be an adjacency matrix for $\forall m_k \in M$, and normalize $X^{m_k} = (x_{ij}^{m_k})_{n\times n}$ to obtain $N^{m_k}$, i.e.,

$$N^{m_k} = (x_{ij}^{m_k})_{n\times n} \,/\, \max\{\, x_{ij}^{m_k} \mid 1 \le i, j \le n \,\}, \quad \forall m_k \in M.$$

Finally, we obtain the new parameter $N = \sum_{m_k \in M} N^{m_k}$, which regulates both the tree cost and the tree delay at the same time through the weight ω ∈ [0, 1]. We use the TM algorithm [17] with the new weight parameter N. The weight ω plays an important role in combining the two measures, the delay and the cost. If ω approaches 0, the tree delay is reduced; if it approaches 1, the tree cost is reduced. Therefore, once ω is decided, an efficient routing tree can be found.
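Putting the pieces together, a hedged sketch of how the combined weight matrix N might be assembled (reusing the hypothetical post_point helper sketched earlier; the LD/LC path-finding itself is elided):

```python
import numpy as np

def omega_weights(C, D, paths_ld, paths_lc, omega):
    """C, D: n x n link cost/delay matrices (np.inf where no link).
    paths_ld / paths_lc: per-destination LD and LC paths as link lists."""
    N = np.zeros_like(C)
    for m_k in paths_ld:
        post_ld = post_point([C[u, v] for u, v in paths_ld[m_k]],
                             1 - sum(C[u, v] for u, v in paths_lc[m_k]) /
                                 sum(C[u, v] for u, v in paths_ld[m_k]))
        post_lc = post_point([D[u, v] for u, v in paths_lc[m_k]],
                             1 - sum(D[u, v] for u, v in paths_ld[m_k]) /
                                 sum(D[u, v] for u, v in paths_lc[m_k]))
        cfct = np.maximum(1.0, 1.0 + (C - post_ld) * omega / 0.5)
        dfct = np.maximum(1.0, 1.0 + (D - post_lc) * (1 - omega) / 0.5)
        x = cfct * dfct                     # X^{m_k}
        x[~np.isfinite(x)] = 0.0            # mask non-links before normalizing
        N += x / x.max()                    # N^{m_k}, accumulated into N
    return N                                # feed N into the TM algorithm
```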
3.2 A Case Study
Fig. 3 shows an example of the trees obtained by different routing algorithms to span the destinations. Fig. 3 (a) shows a given network topology G; link costs and link delays are shown on each link as a pair (cost, delay). To construct a tree rooted at $v_0$ that spans the destination set $M = \{v_2, v_3, v_4\}$, we consider either link cost or link delay. Fig. 3 (b) and (f) show the optimal delay solution using Dijkstra's algorithm and the cost optimized multicast route using the TM algorithm, respectively. Fig. 3 (c)-(e) are good examples to illustrate multicast ω-Trees. In particular, we consider ω = 0.5 for the description. The following steps explain the process of obtaining the new parameter N.

First, we consider the destination node $v_2$. $P_{LD}(v_0, v_2) = \{(v_0,v_1), (v_1,v_4), (v_4,v_3), (v_3,v_2)\}$ and $P_{LC}(v_0, v_2) = \{(v_0,v_1), (v_1,v_2)\}$. $\phi_C(P_{LD}) = 19$, $\phi_C(P_{LC}) = 10$, $\phi_D(P_{LC}) = 16$, and $\phi_D(P_{LD}) = 13$. $\bar{C} = 19/4 = 4.75$ and $\bar{D} = 16/2 = 8$. $S_{LD} = \sqrt{14.92} = 3.86$ and $S_{LC} = \sqrt{8} = 2.83$. $100(1 - \frac{10}{19}) = 47.37\%$, so $z_{\alpha/2} = 1.88$; and $100(1 - \frac{13}{16}) = 18.75\%$, so $z_{\alpha/2} = 0.50$. $post_{LD} = 4.75 - 1.88\,\frac{3.86}{\sqrt{4}} = 1.12$ and $post_{LC} = 8 - 0.50\,\frac{2.83}{\sqrt{2}} = 7.00$. For the link $(v_0, v_1)$ in Fig. 3 (d), we calculate $Cfct(v_0,v_1) = \max\{1, 1 + (5 - 1.12)\frac{0.5}{0.5}\} = 4.88$ and $Dfct(v_0,v_1) = \max\{1, 1 + (6 - 7)\frac{1-0.5}{0.5}\} = 1$, so $Cfct(v_0,v_1) \times Dfct(v_0,v_1) = 4.88$. In the same manner, we obtain all new values in the network G for the destination node $v_2$.
$$C^{v_2} = \begin{pmatrix} \cdot & 4.88 & \cdot & \cdot & \cdot & 6.88 \\ 4.88 & \cdot & 4.88 & 5.88 & 1.00 & 6.88 \\ \cdot & 4.88 & \cdot & 2.88 & \cdot & \cdot \\ \cdot & 5.88 & 2.88 & \cdot & 9.88 & \cdot \\ \cdot & 1.00 & \cdot & 9.88 & \cdot & 9.88 \\ 6.88 & 6.88 & \cdot & \cdot & 9.88 & \cdot \end{pmatrix}_{6\times 6} \quad D^{v_2} = \begin{pmatrix} \cdot & 1.00 & \cdot & \cdot & \cdot & 4.00 \\ 1.00 & \cdot & 4.00 & 1.00 & 1.00 & 1.00 \\ \cdot & 4.00 & \cdot & 1.00 & \cdot & \cdot \\ \cdot & 1.00 & 1.00 & \cdot & 1.00 & \cdot \\ \cdot & 1.00 & \cdot & 1.00 & \cdot & 1.00 \\ 4.00 & 1.00 & \cdot & \cdot & 1.00 & \cdot \end{pmatrix}_{6\times 6}$$

$$X^{v_2} = \big(Cfct(e) \cdot Dfct(e)\big)_{6\times 6} = \begin{pmatrix} \cdot & 4.88 & \cdot & \cdot & \cdot & 27.52 \\ 4.88 & \cdot & 19.52 & 5.88 & 1.00 & 6.88 \\ \cdot & 19.52 & \cdot & 2.88 & \cdot & \cdot \\ \cdot & 5.88 & 2.88 & \cdot & 9.88 & \cdot \\ \cdot & 1.00 & \cdot & 9.88 & \cdot & 9.88 \\ 27.52 & 6.88 & \cdot & \cdot & 9.88 & \cdot \end{pmatrix}_{6\times 6}$$

$$N^{v_2} = \max\{x_{ij}^{v_2}\}^{-1}\big(x_{ij}^{v_2}\big)_{6\times 6} = \begin{pmatrix} \cdot & 0.18 & \cdot & \cdot & \cdot & 1.00 \\ 0.18 & \cdot & 0.71 & 0.21 & 0.04 & 0.25 \\ \cdot & 0.71 & \cdot & 0.10 & \cdot & \cdot \\ \cdot & 0.21 & 0.10 & \cdot & 0.36 & \cdot \\ \cdot & 0.04 & \cdot & 0.36 & \cdot & 0.36 \\ 1.00 & 0.25 & \cdot & \cdot & 0.36 & \cdot \end{pmatrix}_{6\times 6}$$

Moreover, we consider the other destinations $v_3$ and $v_4$ in the same way:

$$N^{v_3} = \begin{pmatrix} \cdot & 0.17 & \cdot & \cdot & \cdot & 1.00 \\ 0.17 & \cdot & 0.60 & 0.14 & 0.04 & 0.47 \\ \cdot & 0.60 & \cdot & 0.04 & \cdot & \cdot \\ \cdot & 0.14 & 0.04 & \cdot & 0.29 & \cdot \\ \cdot & 0.04 & \cdot & 0.29 & \cdot & 0.29 \\ 1.00 & 0.47 & \cdot & \cdot & 0.29 & \cdot \end{pmatrix}_{6\times 6} \quad N^{v_4} = \begin{pmatrix} \cdot & 0.26 & \cdot & \cdot & \cdot & 1.00 \\ 0.26 & \cdot & 0.60 & 0.23 & 0.03 & 0.57 \\ \cdot & 0.60 & \cdot & 0.03 & \cdot & \cdot \\ \cdot & 0.23 & 0.03 & \cdot & 0.23 & \cdot \\ \cdot & 0.03 & \cdot & 0.23 & \cdot & 0.46 \\ 1.00 & 0.57 & \cdot & \cdot & 0.46 & \cdot \end{pmatrix}_{6\times 6}$$

Therefore, we obtain the new parameter N for the multicast tree $T_{\omega:0.5}$:

$$N = \sum_{m_k \in M} N^{m_k} = \begin{pmatrix} \cdot & 0.61 & \cdot & \cdot & \cdot & 3.00 \\ 0.61 & \cdot & 1.91 & 0.58 & 0.10 & 1.29 \\ \cdot & 1.91 & \cdot & 0.17 & \cdot & \cdot \\ \cdot & 0.58 & 0.17 & \cdot & 0.87 & \cdot \\ \cdot & 0.10 & \cdot & 0.87 & \cdot & 1.10 \\ 3.00 & 1.29 & \cdot & \cdot & 1.10 & \cdot \end{pmatrix}_{6\times 6}$$
Fig. 3. Some algorithms, and the ω-Trees for each ω =0.0, 0.5, and 1.0
We use the TM algorithm with N and construct $T_{\omega:0.5}$ as shown in Fig. 3 (d). Fig. 3 (c)-(e) show the trees constructed with the new parameter for each weight ω. As indicated in Table 1, the tree cost sequence is $\phi_C(T_{TM}) \le \phi_C(T_{\omega:1.0}) \le \phi_C(T_{\omega:0.5}) \le \phi_C(T_{\omega:0.0}) \le \phi_C(T_{Dijkstra})$ and the tree delay sequence is $\phi_D(T_{Dijkstra}) \le \phi_D(T_{\omega:0.0}) \le \phi_D(T_{\omega:0.5}) \le \phi_D(T_{\omega:1.0}) \le \phi_D(T_{TM})$. Therefore, our method behaves much like a k-th shortest tree algorithm.
Table 1. The comparison with example results

           T_Dijkstra   T_ω:0.0      T_ω:0.5      T_ω:1.0      T_TM
           Fig. 3 (b)   Fig. 3 (c)   Fig. 3 (d)   Fig. 3 (e)   Fig. 3 (f)
  φC(T)    19           19           15           14           14
  φD(T)    13           13           14           19           19
4 Performance Evaluation

4.1 Random Real Network Topology for the Simulation
Random graphs of the acknowledged model represent different kinds of networks, communication networks in particular. There are many algorithms and programs for generating them, but speed is usually the main goal, not the statistical properties. In the last decade the problem was discussed, for example, by B. M. Waxman (1988) [19], M. Doar (1993, 1996) [5, 6], C.-K. Toh (1993) [18], E. W. Zegura, K. L. Calvert, and S. Bhattacharjee (1996) [20], K. L. Calvert, M. Doar, and M. Doar (1997) [3], and R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal (2000) [12]. They have presented fast algorithms that allow the generation of random graphs with different properties, in particular ones similar to real communication networks. However, none of them have discussed the stochastic properties of the generated random graphs. A. S. Rodionov and H. Choo [16] have formulated two major demands for generators of random graphs: attainability of all graphs with the required properties, and uniformity of distribution. If the second demand is sometimes difficult to prove theoretically, it is possible to check the distribution statistically. A method for generating random real network topologies is proposed by A. S. Rodionov and H. Choo [16], and our evaluation and simulation results are based on network topologies generated this way. The method uses the parameter $P_e$, the probability of link existence between any node pair. We use the method by Rodionov and Choo.
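As a rough stand-in for such a generator (this is a plain G(n, Pe) sampler with a connectivity retry, not the actual Rodionov-Choo algorithm, whose attainability and uniformity guarantees are the point of [16]):

```python
import random

def random_connected_graph(n, pe, seed=None, max_tries=1000):
    """Sample undirected edges with probability pe until the graph is connected."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        edges = [(u, v) for u in range(n) for v in range(u + 1, n)
                 if rng.random() < pe]
        # Connectivity check via depth-first search from node 0.
        adj = {v: [] for v in range(n)}
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)
        seen, stack = {0}, [0]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if len(seen) == n:
            return edges
    raise RuntimeError("no connected sample found; increase pe or max_tries")
```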
4.2 Simulation Results
We now describe some numerical results, comparing the tree costs and the tree delays of the algorithms. The algorithms are the optimal delay solution with Dijkstra's algorithm, the cost optimized multicast route with the TM algorithm, and the ω-Trees.
Fig. 4. |M| : 15% of |V| = 200. (a) Tree cost and (b) tree delay as functions of the weight ω, with Dijkstra (ω = 0.0) and TM (ω = 1.0) marked at the ends of the ω axis.
The proposed method is implemented in C. Ten different network environments are generated, each with 200 nodes and Pe = 0.3. A source s and the destination nodes M, with |M| = 15% of |V| = 200, are randomly selected in the network topology. We simulate 100 times for each network topology (10 × 100 = 1000 runs in total). Fig. 4 shows the simulation results for our method. The average tree cost decreases as ω approaches 1; on the contrary, the average tree delay increases as ω approaches 1. Therefore, ω plays an important role in combining the two independent measures, the cost and the delay. If a delay bound is given, then we may find the tree with minimum tree cost among those with acceptable tree delay. Since the new parameter takes both the cost and the delay into consideration at the same time, it seems reasonable to use the new weight parameter.
5 Conclusion
We studied the efficient routing problem, a Steiner tree problem with QoS requirements, and formulated a new weight parameter that takes both the cost and the delay into consideration at the same time. The cost of the optimal delay solution is relatively more expensive than the cost of the cost optimized multicast route, and moreover, the delay of the cost optimized multicast route is relatively higher than the delay of the optimal delay solution. The weight ω plays an important role in combining the two measures: if ω is nearly 0, then the tree delay is low; otherwise the tree cost is low. Therefore, once we decide ω, we find the efficient multicast routing tree. When network users have various QoS requirements, the proposed multicast weight parameter is very informative for them.
References
1. Y.-C. Bang and H. Choo, "On multicasting with minimum costs for the Internet topology," Springer-Verlag Lecture Notes in Computer Science, vol. 2400, pp. 736-744, August 2002.
2. K. Bharath-Kumar and J. M. Jaffe, "Routing to multiple destinations in computer networks," IEEE Trans. Commun., vol. COMM-31, no. 3, pp. 343-351, March 1983.
3. K. L. Calvert, M. Doar, and M. Doar, "Modelling Internet Topology," IEEE Communications Magazine, pp. 160-163, June 1997.
4. E. Dijkstra, "A note on two problems in connexion with graphs," Numerische Mathematik, vol. 1, pp. 269-271, 1959.
5. M. Doar, Multicast in the ATM environment. PhD thesis, Cambridge Univ., Computer Lab., September 1993.
6. M. Doar, "A Better Mode for Generating Test Networks," IEEE Proc. GLOBECOM'96, pp. 86-93, 1996.
7. M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Co., San Francisco, 1979.
8. E. N. Gilbert and H. O. Pollak, "Steiner minimal tree," SIAM J. Appl. Math., vol. 16, 1968.
9. S. L. Hakimi, "Steiner's problem in graphs and its implication," Networks, vol. 1, pp. 113-133, 1971.
10. V. P. Kompella, J. C. Pasquale, and G. C. Polyzos, "Multicast routing for multimedia communication," IEEE/ACM Trans. Networking, vol. 1, no. 3, pp. 286-292, June 1993.
11. L. Kou, G. Markowsky, and L. Berman, "A fast algorithm for Steiner trees," Acta Informatica, vol. 15, pp. 141-145, 1981.
12. R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal, "Stochastic models for the Web graph," Proc. 41st Annual Symposium on Foundations of Computer Science, pp. 57-65, 2000.
13. A. Papoulis and S. U. Pillai, Probability, Random Variables, and Stochastic Processes, 4th ed. McGraw-Hill, 2002.
14. R. C. Prim, "Shortest Connection Networks And Some Generalizations," Bell System Techn. J. 36, pp. 1389-1401, 1957.
15. S. Ramanathan, "Multicast tree generation in networks with asymmetric links," IEEE/ACM Transactions on Networking, vol. 4, no. 4, pp. 558-568, 1996.
16. A. S. Rodionov and H. Choo, "On Generating Random Network Structures: Connected Graphs," Springer-Verlag Lecture Notes in Computer Science, vol. 3090, pp. 483-491, September 2004.
17. H. Takahashi and A. Matsuyama, "An approximate solution for the Steiner problem in graphs," Mathematica Japonica, vol. 24, no. 6, pp. 573-577, 1980.
18. C.-K. Toh, "Performance Evaluation of Crossover Switch Discovery Algorithms for Wireless ATM LANs," IEEE Proc. INFOCOM'96, pp. 1380-1387, 1993.
19. B. W. Waxman, "Routing of multipoint connections," IEEE J-SAC, vol. 6, no. 9, pp. 1617-1622, December 1988.
20. E. W. Zegura, K. L. Calvert, and S. Bhattacharjee, "How to model an Internetwork," Proc. INFOCOM'96, pp. 594-602, 1996.
21. Q. Zhu, M. Parsa, and J. J. Garcia-Luna-Aceves, "A source-based algorithm for near-optimum delay-constrained multicasting," Proc. IEEE INFOCOM'95, pp. 377-385, March 1995.
The Gateways Location and Topology Assignment Problem in Hierarchical Wide Area Networks: Algorithms and Computational Results

Przemyslaw Ryba and Andrzej Kasprzak
Wroclaw University of Technology, Chair of Systems and Computer Networks, Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland
[email protected], [email protected]
Abstract. This paper studies the problem of designing a two-level hierarchical wide area network. The goal is to select the gateway locations, the 2nd level network topology, channel capacities and flow routes in order to minimize the total average delay per packet in the hierarchical network subject to a budget constraint. The problem is NP-complete. Therefore, the branch and bound method is used to construct an exact algorithm. A heuristic algorithm is also proposed, and some computational results are reported. Based on computational experiments, several properties of the considered problem, important from a practical point of view, are formulated.
1 Introduction

Designing huge wide area networks (WAN) containing hundreds of hosts and communication links is a very difficult and demanding task. Design procedures (algorithms) suitable for small and moderate-sized networks, when applied directly to large networks, become very costly (from the computational point of view) and sometimes infeasible. The design of huge wide area networks requires decomposition of the design process. A common way to decompose the design of a huge WAN is to introduce a hierarchical network topology [1]. In a hierarchical WAN, nodes are grouped in clusters on the 1st level of hierarchy, which in turn are grouped into a 2nd level cluster, and so on. The communication network of each cluster can be designed separately. In each cluster, special communication nodes called "gateways", which provide communication between nodes from different 1st level networks, are selected. Traffic between nodes in the same cluster uses paths restricted to the local communication network. Traffic between nodes in different 1st level networks is first sent to the local gateway, then via the 2nd level network of gateways to a gateway located in the destination 1st level network, to finally reach the destination node. An example of the structure of a hierarchical wide area network is presented in Fig. 1.

In this paper, exact and heuristic algorithms for simultaneous gateway location, network topology, channel capacity and flow assignment in a two-level hierarchical wide area network are presented. The considered problem is formulated as follows:
Fig. 1. Structure of the hierarchical wide area network: 1st level clusters of nodes connected by 1st level channels, gateways serving as 2nd level nodes, and 2nd level channels linking the gateways into a 2nd level cluster
given: the topology of the 1st level networks, potential gateway locations, the set of potential 2nd level network channels and their possible capacities and costs (i.e., the cost-capacity function), traffic requirements, the budget of the network,
minimize: the total average delay per packet given by Kleinrock's formula [2],
over: gateway locations, 2nd level network topology, channel capacities, multicommodity flow (i.e., routing),
subject to: multicommodity flow constraints, channel capacity constraints, the budget constraint.

The discrete cost-capacity function considered here is the most important from the practical point of view, as channel capacities are chosen from the sequence defined by ITU-T recommendations. The problem formulated above is NP-complete; in Section 3 of the paper the NP-completeness of the considered problem is proven. Some algorithms for hierarchical network design can be found in [5], [7]. However, they are limited to tree topologies of the hierarchical network. In the paper [6] an algorithm for router location, which includes simplified topology assignment without allocating capacities to channels, is presented. Also, in [5] an algorithm for interconnecting two WANs is presented. In the paper [7] it was noticed that it is necessary to design the network topology and gateway locations jointly. However, the problem presented in [7] was limited to joint gateway location and capacity assignment in a tree topology of the hierarchical network, and only a heuristic algorithm was proposed. This paper joins the problem of locating gateways in a hierarchical wide area network with the topology and capacity assignment problem, i.e., gateway locations, the topology of the 2nd level network and the capacities of the channels of the 2nd level network are selected simultaneously. Thus, the problem presented in this paper is more general and more important from the practical point of view than the problems considered in the literature.
2 Problem Formulation

Consider a hierarchical wide area network consisting of K networks on the 1st level of hierarchy, each denoted by $S_1^l$, l = 1,…,K, and one 2nd level network denoted by $S_2$. Let $N_1^l$ be the set of nodes and $L_1^l$ the set of channels of the 1st level network $S_1^l$. Let n be the
total number of nodes in the hierarchical network. The set of nodes $N_2$ consists of the selected gateways and $L_2$ is the set of channels connecting them. Let m be the number of potential channels in the 2nd level of the hierarchical WAN and p the total number of channels. For each potential channel i there is the set $C^i = \{c_1^i, \ldots, c_{s(i)-1}^i\}$ of alternative values of capacities from which exactly one must be chosen if the channel is chosen to build the 2nd level network. Let $d_k^i$ be the cost of leasing capacity $c_k^i$ [$/month]. Let $c_{s(i)}^i = 0$ for i = 1,…,m. Then $\bar{C}^i = C^i \cup \{c_{s(i)}^i\}$ is the set of alternative capacities from among which exactly one must be used for channel i. If the capacity $c_{s(i)}^i$ is chosen, then channel i is not used to build the network. Let $x_k^i$ be the discrete variable for choosing one of the available capacities for channel i, defined as follows: $x_k^i = 1$ if the capacity $c_k^i$ is assigned to channel i, and $x_k^i = 0$ otherwise. Since exactly one capacity from the set $\bar{C}^i$ must be chosen for channel i, the following condition must be satisfied:

$$\sum_{k=1}^{s(i)} x_k^i = 1 \quad \text{for } i = 1, \ldots, m \qquad (1)$$
Let $X^i = \{x_1^i, \ldots, x_{s(i)}^i\}$ be the set of variables $x_k^i$ corresponding to the i-th channel. Let $X_r'$ be a permutation of the values of the variables $x_k^i$, i = 1,…,m, satisfying condition (1), and let $X_r$ be the set of variables which are equal to 1 in $X_r'$. Let $H^l$ denote the set of gateways to place in network $S_1^l$ and $J_g$ the set of possible locations for gateway g. Let $y_a^g$ be the discrete variable for choosing one of the available locations for gateway g, defined as follows: $y_a^g = 1$ if gateway g is located in node a, and $y_a^g = 0$ otherwise. Each gateway must be placed in exactly one node; thus, the following condition must be satisfied:
∑ y ag = 1,
a∈J g
g ∈ H l , l = 1,..., K
(2)
Let Yr' be the permutation of values of all variables y ag for which condition (2) is satisfied and let Yr be the set of variables which are equal to 1 in Yr' . The pair of sets (Xr,Yr) is called a selection. Each selection (Xr,Yr) determines locations of gateways and channels capacities in the 2nd level of hierarchical wide area network. Let ℜ be the family of all selections. Let rij be the average packet rate transmitted from node i to node j in the hierarchical network. It is assumed that rii = 0 for i = 1,...,n. Let T(Xr, Yr) be the minimal average delay per packet in the hierarchical network, in which values of channels capacities are given by Xr and locations of gateways are
The Gateways Location and Topology Assignment Problem in Hierarchical WAN
1103
given by Yr. T(Xr, Yr) can be obtained by solving a multicommodity flow problem in the network [2], [3]: T ( X r , Yr ) = min f
f
∑ c i −i f γ 1
i∈L
(3) i
subject to: f is a multicommodity flow satisfying the requirements rij i, j = 1, ..., n and fi ≤ ci for every channel i = 1,...,p, where f = [ f 1 ,..., f p ] is the vector of multi-
commodity flow, fi is the total average bit rate on channel i, and γ is the total packet arrival rate from external sources at all nodes of the wide area network. Then, the considered gateway location, topology assignment problem in hierarchical wide area network can be formulated as follows: min T ( X r , Yr )
(4)
( X r , Yr ) ∈ ℜ
(5)
( X r ,Yr )
subject to: d ( X r , Yr ) =
∑x d i k
i k
≤B
x ik ∈X r
(6)
where B denotes the budget of the hierarchical wide area network.
3 The Branch and Bound Algorithm Assuming that ⎟H l⎜ = 1 for l = 1,...,K, and C i = C i for i = 1,...,p, the problem (4-6) is resolved itself into the “topology, capacity and flow assignment problem” which is known as a NP-complete [3]. Since the problem (4-6) is more general it is also NP-complete. Then, the branch and bound method can be used to construct the exact algorithm for solving the considered problem. The detailed description of the calculation scheme of the branch and bound method may be found in the papers [8], [9]. The branch and bound method involves constructing two important operations specific for considered here problem: branching rules and lower bound. These operations are presented in following sections. 3.1 Branching Rules
The purposes of branching rules is to find the normal variable from the selection (Xr, Yr) for complementing and generating a successor (Xs, Ys) of the selection (Xr, Yr) with the least possible value of the criterion function (3). We can choose a variable
xki or a variable y ag . For fixed variables y ag the problem (4-6) may be resolved into classical topology design problem. Then we can use the choice criterion Δirkj on variables x ki presented in the paper [4].
1104
P. Ryba and A. Kasprzak
To formulate the choice criterion on variables y ag we use the following theorem: Theorem 1. Let ( X r , Yr ) ∈ ℜ . If the selection ( X s , Ys ) is obtained from the selection
( X r , Yr ) by complementing the variable y ag ∈ Yr by the variable ybg ∈ X s then T ( X s , Ys ) ≤ T ( X r , Yr ) − δ abgr , where
δ abgr
~ ⎧1 ⎛ fi fi f i ⎞⎟ ~ ⎪ ⎜ if f i < c i for i ∈ Ll1 − + ~ i i i ⎪ γ ⎜ i∈L c − f i i∈Ll c − f i i∈L − Ll c − f i ⎟ 1 1 ⎠ =⎨ ⎝ ⎪ ⎪∞ otherwise ⎩
∑
∑
∑
(7)
~ f i = f ir − f ia' + f ia'' : f ia'' corresponds to the packets flow between the nodes of network
S1l and nodes of the remaining 1st level networks via the gateway located at node a, f ia' corresponds to the packets exchanged between the nodes of network S1l and nodes of the remaining 1st level networks via the gateway located at node b after reallocating gateway from node a to node b. Let E r = ( X r ∪ Yr ) − Fr , and let Gr be the set of all reverse variables of normal variables, which belong to the set Er, where Fr is the set of variables constantly fixed in the r-th iteration of branch and bound algorithm. We want to choose a normal variable the complementing of which generates a successor with the possible least value of total average delay per packet. We should choose such pairs y ag , ybg : y ag ∈ E r , ybg ∈ Gr or x ki , x ij : xki ∈ E r , x ij ∈ Gr , for which the value of
{(
)
criterion
}
gr δ ab
or Δ
ir kj
{(
)
}
is maximal.
3.2 Lower Bound
The lower bound LBr of the criterion function (3) for every possible successor (Xs,Ys) generated from the selection (Xr,Yr) may be obtained by relaxing some constraints in the problem (4-6). To find the lower bound LBr we reformulate the problem (4-6) in the following way: -
we assume that the variables x ki and y ag are continuous variables, we approximate the discrete cost-capacity curves (given by the set Ci) with the lower linear envelope [2]. Then, the constraint (6) may be relaxed by the constraint Σ d i c i ≤ B , where d i = min (d ki / c ki ) and c i is the capacity of the chanxki ∈X i
nel i (continuous variable). The solution of such reformulated problem can be found using the method proposed and described in the paper [10].
The Gateways Location and Topology Assignment Problem in Hierarchical WAN
1105
4 Heuristic Algorithm The presented exact algorithm involves the initial selection (X1,Y1) ∈ ℜ for which the constraints (5) and (6) are satisfied [8]. Moreover, the initial selection should be the near-optimal solution of the problem (4-6). To find the initial selection the following heuristic algorithm is proposed. This heuristic algorithm may be also used to design of the hierarchical WAN when the optimal solution is not necessary. Step 1. For each gateway g, choose initial location from the set Jg such that cost of 2nd level network is minimal. Next, solve the classical topology assignment problem [4]. If this problem has no solution then the algorithm terminates – the problem (4-6) has no solution. Otherwise, perform T* = T, where T is average packet delay in the network obtained by solving the topology assignment problem. gr Step 2. Choose the pair of gateway locations with maximal value δ ab given by expression (7). Change location of gateway on this indicated by that expression. Step 3. Solve the topology assignment problem. If the obtained value T is lower than T* then perform T* = T and go to step 2. Otherwise, the algorithm terminates. The feasible solution is found. The network topology and gateways locations associated with the current T* is heuristic solution of the problem (4-6).
5 Computational Results The presented exact and heuristic algorithms were implemented in C++ code. Extensive numerical experiments have been performed with these algorithms for many different hierarchical network topologies and for many possible gateway locations. The experiments were conducted with two main purposes in mind: first, to examine the impact of various parameters on solutions to find properties of the problem (4-6) important from practical point of view and second, to test the computational efficiency of the algorithms. The dependence of the optimal average delay per packet T on the budget B has been examined. In the Fig. 2 the typical dependence of T on the budget B is presented for different values of the average packet rates from external sources transmitted between each pair of nodes of the hierarchical network. 0,040
T opt
rk = 0,1 rk = 0,2
0,035
rk = 0,3 0,030 0,025 100
B [∗102 ] 200
300
400
500
600
700
Fig. 2. The dependence of the criterion function T on the budget B
1106
P. Ryba and A. Kasprzak
It follows from the Fig. 2 that there exists such budget B*, that the problem (4-6) has the same solution for each B greater or equal to B*. It means that the optimal solution of the problem (4-6) is on the budget constraint (6) for B ≤ B* and it is inside the set of feasible solutions for B > B*. Conclusion 1. In the problem (4-6), for fixed average packets rate from external sources, there exists such value B* of the budget, that for each B ≥ B* we obtain the same optimal solution. It means that for B ≥ B* the constraint (6) may be substituted by the constraint d(Xr,Yr) ≤ B* Moreover, it follows from results obtained for many considered networks, that the optimal delay per packet in terms of budget B is the function of the following class: T=
u1 + u3 B + u2
(8)
where u1, u2 and u3 are constant coefficients. To find the function (8) for some hierarchical network it is enough to know only the values of u1, u2 and u3, which can be computed by solving the identification problem, e.g. using the Least Squares Method. 0,03 T1
T
T2
0,02 0,01
B [∗102 ]
0,00 90
190
290
390
490
590
Fig. 3. Dependences of T1 and T2 on budget B
Let T1 be the average delay per packet in any 1st level network and T2 be the average delay per packet in 2nd level network obtained by solving the problem (4-6). The typical dependences of T1 and T2 on budget B are presented in the Fig. 3. The observations following from the computer experiments may be formulated in the form of the conclusion below. Conclusion 2. In the problem (4-6) there exists such value B′ that for every B > B′, in each 1st level network, we obtain the same average delay per packet. Moreover, B′ < B*. The observations presented as the conclusions 1 and 2 are very important from practical point of view. It shows that the influence of the investment cost (budget) on the optimal solution of the considered problem is limited, especially for greater values of budget B. Let G be the number of gateways, which must be allocated in some 1st level network of the hierarchical network. The dependence of the optimal value of average
The Gateways Location and Topology Assignment Problem in Hierarchical WAN
1107
delay per packet T on number of gateways G has been examined. Fig. 4 shows the typical dependence of the delay T on number of gateways G.
0,035
T opt
B = 125 B = 175
0,033
B = 250 0,031
G
0,029 1
2
3
4
5
6
Fig. 4. The dependence of T opt on number of gateways G
It follows from the numerical experiments and from the Fig. 4 that the function T(G) is convex. Moreover, there exists the minimum of the function T(G). The observations following from the computer experiments may be formulated in the form of the conclusions below. Conclusion 3. The performance quality of the hierarchical wide area networks depends on the number of gateways in the 1st level networks. Conclusion 4. There exists such value of the number of gateways G, in some 1st level network of the hierarchical network, for which the function T(G) is minimal. Let Gmin be the number of gateways in some 1st level network such that T (Gmin ) = min T (G ). Conclusion 5. The number of gateways Gmin in the hierarchical network, for which the value of the function T(G) is minimal, depends on the budget B. Because the algorithm for the gateways location and topology assignment problem assumes that the numbers of gateway for each 1st level network are given, then the properties described in conclusion 4 and 5 allow to simplify the design process of the hierarchical network. First, we may calculate the number of gateways for each 1st level network and next, we may solve the gateways location and topology assignment problem using the presented exact or heuristic algorithm. In order to evaluate the computational efficiency of the exact algorithm many networks were considered. For each network the number of iteration of the algorithm was recorded and compared for all considered networks. Let Dmax be the maximal building cost of the network, and let Dmin be the minimal building cost of the network; the problem (4-6) has no solution for B < Dmin. To compare the results obtained for different hierarchical wide area networks topologies we introduce the normalized budget B = ((B − Dmin ) / (Dmax − Dmin )) ⋅ 100%.
1108
P. Ryba and A. Kasprzak
( )
Moreover, let Φ i B be the number of iterations of the branch and bound algorithm to obtain the optimal value of T for normalized budget equal to B for i-th considered network topology. Let
Φ(u, v ) =
1 Z
Z
⎛
∑ ⎜⎜ ∑ Φ i ( B ) i =1 ⎝ B∈[u ,v ]
⎞
∑ Φ i ( B ) ⎟⎟ ⋅ 100%
B∈[ 0,100 ]
⎠
be the arithmetic mean of the relative number of iterations for B ∈[u,v] calculated for all considered network topologies and for different number of gateways, where Z is the number of considered wide area networks. Fig. 5 shows the dependency of Φ on divisions [0%, 10%), [10%, 20%), ..., [90%, 100%] of normalized budget B . It follows from Fig. 4 that the exact algorithm is especially effective from computational point of view for B ∈[30%, 100%].
Fig. 5. The dependence of Φ on normalized budget
B
Fig. 6. The difference between heuristic and optimal solutions
The distance between heuristic and optimal solution has been also examined. Let T heur be the solution obtained by heuristic algorithm and T opt be the optimal value obtained by the exact algorithm for the problem (4-6). Let χ denote the distance between heuristic and optimal solutions: χ = T heur − T opt
T opt ⋅ 100% . The value χ
shows how the results obtained using the heuristic algorithm are worse than the optimal solution. Let
The Gateways Location and Topology Assignment Problem in Hierarchical WAN
1109
number of solutions for which χ ∈ [a, b] ⋅ 100% number of all solutions denote number of solutions obtained from heuristic algorithm (in percentage) which are greater than optimal solutions more than a% and less than b%. Fig. 6 shows the dependence Θ on divisions [0%−1%), [1%−5%), ..., [80%−100%).
Θ[a, b] =
6 Conclusion The exact and heuristic algorithms for solving the gateway location and network topology assignment problem in hierarchical network are presented. The considered problem is more general than the similar problems presented in the literature. It follows from computational experiments (Fig. 6) that more than 50% approximate solutions differ from optimal solutions at most 1%. It is necessary to stress that over 90% approximate solutions differ from optimal solutions at most 10%. Moreover, we are of opinion that the hierarchical network properties formulated as conclusions 4 and 5 are important from practical point of view. They say that there is the influence of the gateways number on performance quality of the hierarchical network.
Acknowledgment. This work was supported by a research project of The Polish State Committee for Scientific Research in 2005-2007.
References 1. Kleinrock L., Kamoun F.: Optimal Clustering Structures for Hierarchical Topological Network Design of Large Computer Networks. Networks 10 (1980) 221-248 2. Fratta, L., Gerla, M., Kleinrock, L.: The Flow Deviation Method: an Approach to Storeand-Forward Communication Network Design. Networks 3 (1973) 97-133 3. Kasprzak, A.: Topological Design of the Wide Area Networks. Wroclaw University of Technology Press, Wroclaw (2001) 4. Markowski M., Kasprzak A.: The Web Replica Allocation and Topology Assignment Problem in Wide Area Networks: Algorithms and Computational Results. Lecture Notes in Computer Science 3483 (2005) 772-781 5. Liang S.C., Yee J.R.: Locating Internet Gateways to Minimize Nonlinear Congestion Costs. IEEE Transactions On Communications 42, (1994) 2740-50 6. Liu Z., Gu Y., Medhi D.: On Optimal Location of Switches/Routers and Interconnection. Technical Report, University of Missouri-Kansas City (1998) 7. Saha D. and Mukherjee A.: On the multidensity gateway location problem for multilevel high speed network. Computer Communications 20 (1997) 576-585 8. Wolsey, L.A.: Integer Programming. Wiley-Interscience, New York (1998) 9. Walkowiak K.: A Branch and Bound Algorithm for Primary Routes Assignment in Survivable Connection Oriented Networks. Computational Optimization and Applications 27, Kluwer Academic Publishers (2004) 149-171 10. Markowski M., Kasprzak A.: An exact algorithm for host allocation, capacity and flow assignment problem in WAN. Internet Technologies. Applications and Societal Impact, Kluwer Academic Publishers, Boston (2002) 73-82
Developing an Intelligent Supplier Chain System Collaborating with Customer Relationship Management Gye Hang Hong1 and Sung Ho Ha2 1
Hyundai Information Technology Research Institute, Yongin-Si, Gyoungki-do, 449-910, Korea 2 School of Business Administration, Kyungpook National University, Daegu, 702-701, Korea [email protected]
Abstract. We propose an intelligent supply chain system collaborating with customer relationship management system in order to assess change in a supply partner’s capability over a period of time. The system embeds machine learning methods and is designed to evaluate a partner’s supply capability that can change over time and to satisfy different procurement conditions across time periods. We apply the system to the procurement and management of the agricultural industry.
1 Introduction Nowadays the business environment is changing from the product-centered stage to customer-centered stage. Supply capacity of companies exceeds customer demands because of continuous improvement of production technologies. In addition, the globalization of trade and prevalence of communication networks make companies more compete with competitors. As a result, producers are not a market leader any more. Companies can not survive from the global competition without understanding customers. They must understand who their customers are, what they buy and when they buy, and even predict their behavior. Customer-centered markets convert the traditional production environment to new one which requires quick responses to changes in customer needs. Companies have integrated various internal resources located inside the organization and constructed electric document interchange systems to exchange production information with their partners. They begin to invite production partners into their projects or businesses, organize a valuable supply chain network, and collaborate with them in many business areas. Therefore, the success of business depends on not competition among companies but competition among supply chain networks. In this business environment, constructing a valuable supply chain determines the competitive power of companies: Quick response to customer needs, and effective selection and management of good supply partners. Therefore, we develop a decision support system for constructing an intelligent supply chain collaborating with the customer relationship management. In this system, we explain how manufacturers identify their valuable customers and how they establish competitive supply chains to select and manage their good supply partners. M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 1110 – 1118, 2006. © Springer-Verlag Berlin Heidelberg 2006
Developing an Intelligent Supplier Chain System
1111
2 Current Issues in Supply Chain Management 2.1 Evaluating Changes in Supply Capability Conditions over Time Supply capability conditions of partners can change over time. It is difficult for supply partners to maintain the same capability conditions for all supply periods because of changes in delivery conditions, inventory levels, and the overall market environment [10]. In particular, it is hard when suppliers are selected for products which have seasonal demands and capability can fluctuate over time (e.g., agricultural products). Customer consumption patterns also vary over time in industries producing seasonal products. Therefore manufacturers must adopt different procurement strategies and evaluate their suppliers according to the changing consumption. However, most supplier selection models did not consider that supply capabilities and procurement conditions could change over time. Recent studies have considered only comprehensive supply capability conditions during the total analyzed periods [2],[6],[8],[11],[12]. In this case, suppliers may not be selected due to the impact of the low quality of the last season, regardless of the fact that they were in good standing for this period. After a supplier has been chosen, a manufacturer is not able to take necessary action even though product quality could decrease after a certain period of time. Thus, it is important that a supply chain system divide all analyzed periods into meaningful period units (PFA), evaluate the supply conditions of each unit, and compile the results. 2.2 Evaluating Supply Partners with Multiple Criteria A supply chain system has to consider multi-criteria, including quantitative and qualitative aspects, in order to evaluate a supply partner’s capability. In an early study on supplier selection criteria, Dickson [3] identified 23 criteria considered by purchasing managers when selecting various partners. Since the Dickson study, many researchers have identified important criteria that have varied according to industry and purchasing situations. In his portfolio approach, Kraljic identified purchasing in terms of profit impact and supply risk [7]. Profit impact includes such elements as the expected monetary volume, which is tied with goods or services to be purchased and the impact on future product quality. Indicators of supply risk may include the availability of goods or services under consideration and the number of potential partners. Therefore, in this study, we choose such criteria as price, delivery, quality, and order quantity to evaluate the profit impact. In determining supply risk, we use the criteria of reputation and position, warranty and claim strategies, and the share of information. 2.3 Selecting Partners Who Satisfy the Procurement Conditions Supply partners may be able to meet some, but not all of the procurement conditions. One partner may be able to meet price and quality demands, but it may not be able to satisfy quantity or delivery conditions. Another partner can guarantee quantity and delivery, but not price requirement. Due to different conditions among partners, manufacturers want to select supply partners that can satisfy all their needs as well as maximize their revenue. Optimal partners must be selected who not only maximize the revenue of a purchaser, but also manage a level of supply risk. Supply risk changes over a period of time. In order to appropriately manage the level of risk, it needs to be
1112
G.H. Hong and S.H. Ha
measured by period. Then, partners can be qualified, either rigorously or less, according to the level of risk. A supply chain system has to qualify partners rigorously in the case of high risk and qualify partners less rigorously for the case of low risk.
3 Intelligent Supply Chain System (ISCS) The ISCS consists of five main functions, as shown in Fig. 1. They include defining multiple criteria, dividing the total contract period into periods for assessment (PFA), establishing ideal purchasing conditions, segmenting supply partners by PFA, and selecting the optimal partners by PFA.
Fig. 1. An intelligent supply chain system collaborating with a customer relationship management system
The CRM system conjoins customer profiles and sales history, measures customer lifetime values or RFM (Recency, Frequency, and Monetary) values, and distinguishes valuable customers. The system informs the ISCS of valuable customers and their predicted demand. On the other hand, the ISCS defines important criteria with regard to assessing supply partners. The ISCS, then, divides the overall purchasingcontract period into PFAs from the viewpoint of supply market risk. The level of supply risk is defined as the difference (gap) between supply and demand for each period. The derived PFAs feed back to the CRM system. The CRM system identifies buying behaviors of the valuable customers in each PFA. The system finds out important factors which customers consider when they make a purchase an item in each PFA. Customers may consider that quality is an important factor in a period; however, they can change their mind and think that price is more important in other periods. Therefore, a manufacturer must know important factors which have much effect on the buying behaviors of the customers. In addition, the manufacturer should know how much the change in customer satisfaction is sensitive to the change in values of the important factors.
Developing an Intelligent Supplier Chain System
1113
Then, the ISCS searches the available supply partners and qualifies them in order to only select a small number of partners who have similar supply capability. In doing so, the ISCS segments all available partners into several groups having similar characteristics, and evaluates each group to choose qualified ones. The chosen partners have the same characteristics of supply capability, since the ISCS selects them from the same qualified groups. To sum up, the CRM system is helpful in predicting customer demand, identifying behaviors of the valuable customers in each PFA, and discovering the purchasing factors of importance. With this useful information in hand, the ISCS can track changes in supply capability over time, select optimal supply partners whenever appropriate.
4 Application of ISCS to Seasonal Products 4.1 Distinguishing Valuable Customers The CRM system identifies valuable customers in terms of RFM. It extracts RFM values from sales data and divides all the customers into several customer segments which have a similar RFM patterns. The RFM clustering is one of the methods for discovering customer segments [1], [4]. RFM is defined as follows: Recency is the time period of last purchase during the analyzed time period; Frequency is the number of purchases during the analyzed time period; Monetary is the amount of spent money during the analyzed time period. The CRM system uses a self-organizing map (SOM) with three input nodes and three by three output nodes. Each input (i.e., recency, frequency, and monetary values) can fall into one of the following categories: above the average or below the average. Table 1 summarizes the segmentation results. Table 1. Clustering customers into four segments in terms of RFM. The sign ↑ means above the average and ↓ means below the average.
Segment 1 2 3 4 Average
R(ecency) R↑ (0.66) R↓ (0.38) R↓ (0.22) R↑ (0.76) 0.51
F(requency) F↑ (0.42) F↑ (0.88) F↓ (0.17) F↓ (0.09) 0.39
M(onetary) M↑ (0.54) M↑ (0.72) M↓ (0.08) M↓ (0.05) 0.35
No. of customers 20 4 46 70
As shown in Table 1, segment 1 is better than others in terms of RFM. Segment 2 is also superior to others in terms of FM. It implies that segments 1 and 2 contain valuable customers. However, segments 3 and 4 are less valuable groups, since they make a purchase small quantity of items during a short period. 4.2 Diving the Total Contract Period into Periods for Assessment In order to assess the capability of a supply partner dynamically, the ISCS divides all periods into periods for assessment (PFA), in which the risk of the supply market is similar. A large difference between supply and demand indicates a low-market risk
1114
G.H. Hong and S.H. Ha
because the manufacturer can find alternatives easily and is able to pay a low switching cost, when a partner does not deliver a product or service. A small difference between supply and demand entails a high-market risk. The ISCS measures the market risk by period and then groups similar market-risk periods through a genetic algorithm and a linear regression model (See Eq. 1). N
Fitness function = α ∑ Ri2 wi + β F ( N )
(1)
i =1
where Ri2 = max{Ri2,exp , Ri2,linear , Ri2,log } , wi =
no. of periods in the ith interval , no. of whole periods
N = no. of intervals , Ri2 = R 2 (residual error) of the ith interval , F ( N ) ∝ 1/ n, 0 ≤ α ≤ 1, 0 ≤ β ≤ 1, α + β = 1
When setting the total analyzed period to a single year, the ISCS obtains four meaningful periods for assessment. 4.3 Identifying Behaviors of the Valuable Customers in Each PFA After the ISCS divides the overall period into PFAs, the CRM system identifies the important factors which have much influence on buying behavior patterns of valuable customers in each PFA. In doing so, it employs a neural network technique (see Fig. 2) to discover two kinds of important knowledge: IF (Important Factor) and DCR (Degree of Contribution/Risk) of each IF.
Fig. 2. A neural network model for discovering the knowledge regarding customer buying behaviors
Developing an Intelligent Supplier Chain System
1115
The NN has five output nodes: three nodes to classify the customer level, one node to predict the revenue, and one node to predict the number of back orders. In order to represent the customer level, the customer segment number is used. That is, (1, 0, 0) denotes segment 1, (0, 1, 0) denotes segment 2, (0, 0, 1) represents segments 3 and 4. The NN has one hidden layer, six input nodes, and one bias. Six inputs include quality, average price, variance of price, average quantity, variance of quantity, and order frequency. To discover IF, the NN uses a sensitivity method, a feature weighting method [9]. The sensitivity of input node is calculated by removing the input node from the trained neural network (See Eq. 2). (∑L wi =
p0 - pi ) p0 n
(2)
where p0 is the normal prediction value for each training instance after training, and pi is the modified prediction value when the input node i is removed. L is the set of training data, and n is the number of training data. The NN also uses an activity method to measure the DCR of each IF. The activity method measures the variance of activation level of an input node (see Eq. 3). n
6
j =1
i =1
Ri = ∑((w(ji1) ) 2 × ( w(j2 ) ) 2 × var(sigm(∑w(ji1) × xi ))
(3)
Table 2 summarizes several IFs and their DCRs by each PFA. Table 2. Derived key factors of valuable customers over periods in time. Note that P denotes Prediction and C for Classification.
PFA Factor Quality
Level IF DCR FreLevel quency IF DCR Price Level IF DCR QuanLevel tity IF DCR Derived key factors
t1 P
C 1st – 2nd level 0.652 0.176 0.329 High 0.072 0.206 0.010 Above average 0.432 0.176 0.044 High 0.334 0.059 0.254 Quality, Quantity
t2 P
C 4th – 7th level 0.473 0.085 0.426 Average 0.024 0.026 0.042 Average 0.517 0.341 0.506 Average 0.023 0.286 0.001 Quality, Price
t3 P
C 1st – 2nd level 0.345 0.042 0.289 Average 0.002 0.116 0.042 Above average 0.365 0.304 0.427 Average 0.195 0.019 0.174 Quality, Price
t4 P C 1st level 0.486 0.112 0.019 Above average 0.574 0.418 0.356 High 0.045 0.074 0.001 Below average 0.462 0.007 0.316 Frequency, Quantity
Fig. 3 illustrates correlation relationships between IF and DCR in the different PFA. The larger values both IF and DCR have (upper right position in each graph), the more important IF is.
1116
G.H. Hong and S.H. Ha
Fig. 3. Correlation relationships between IF and DCR in the different PFA
Fig. 4. The SOM network using various types of input data. The learning function is w j (n + 1) = w j (n) + lr (n)h j ,i ( x ) (n)[ x (n) − w j (n)] where the numerical value is ( x − w j ) , and the Boolean or categorical values are 1, x = w j , otherwise, 0, x ≠ w j .
Developing an Intelligent Supplier Chain System
1117
4.4 Segmenting Supply Partners by PFA The ISCS segments supply partners into several groups which have similar supply conditions by using a SOM network–a special neural network [5]. Since the criteria for assessing partners consist of both quantitative and qualitative variables, the input data types for a SOM are different. Warranty and claim strategy criterion uses Boolean data, ‘yes (1)’ or ‘no (0)’. Reputation and position, and information sharing criteria use categorical data that include the following scale: outstanding (5), very good (4), good (3), average (2), and poor (1). The other criteria use numerical data. Therefore, in order to use the SOM, each data type is appropriately modified. Fig. 4 shows a SOM network with input variables. 4.5 Selecting the Optimal Supply Partners After clustering the supply partners, the ISCS evaluates the supply conditions of the supplier groups and qualifies them as follows: 1. In the case of a high-supply risk, among the supplier groups which have a higherthan-average value with regard to all qualitative criteria (quality, reputation and position, warranty and claim strategy, and sharing of information), the ISCS selects the candidate groups which are closest to the ideal purchasing conditions; 2. In the case of a low-supply risk, among all supplier groups, the ISCS selects the candidate groups which are nearest to the ideal purchasing conditions; 3. The ISCS conducts this process for all PFAs and obtains the qualified partner groups and qualified suppliers. Table 3 shows the supply conditions of the qualified groups in each PFA. Table 3. Supply conditions of the qualified groups. Note that QN stands for quantity, F for frequency, P for price, QL for quality, R for reputation and position, W for warranty and claim, and I for information sharing.
PFA
Group
QN
F
P
QL
R
W
I
t1
1 4 7 Avg 2 4 5 Avg 2 4 5 Avg 2 5 Avg
0.34 0.37 0.45 0.35 0.16 0.38 0.22 0.30 0.45 0.31 0.29 0.22 0.35 0.20 0.12
0.36 0.38 0.14 0.16 0.09 0.31 0.32 0.10 0.46 0.23 0.25 0.15 0.35 0.28 0.11
0.52 0.42 0.53 0.62 0.32 0.48 0.31 0.52 0.68 0.32 0.19 0.41 0.59 0.42 0.28
0.98 0.87 0.97 0.74 0.80 0.97 0.75 0.76 0.97 0.80 1.0 0.79 0.97 0.89 0.67
4 3 5 4 5 2 5 4 5 5 5 -
1 1 1 1 1 0 1 1 1 1 1 -
3 3 4 3 4 3 4 3 4 4 4 -
t2
t3
t4
# of partners 11 3 6 47 4 15 4 98 6 1 3 30 17 11 89
1118
G.H. Hong and S.H. Ha
5 Conclusion In this paper, we suggested an intelligent supply chain system collaborating with a customer relationship management system. The intelligent SCS adopted the following methods to solve the current issues on supply chains: 1. The ISCS divided the total analyzed period into meaningful PFAs, segmented suppliers into several clusters which have similar supply conditions within each PFA, and evaluated the clusters. 2. The system handled with the multiple criteria of quantitative and qualitative types with regard to a partner’s risks and profits. Then, the system chose supply partners who were deemed to be the most valuable from the selected clusters. It was able to solve the complexity of the problem by using those criteria. We applied this solution to the agricultural industry and illustrated the results.
References 1. Bult, J.R., Wansbeek, T.J.: Optimal Selection for Direct Mail. Marketing Science 14 (1995) 378–394 2. De Boer, L., Labro, E., Morlacchi, P.: A Review of Methods Supporting Supplier Selection. European Journal of Purchasing and Supply Management 7 (2001) 75-89 3. Dickson, G.W.: An Analysis of Vendor Selection Systems and Decisions. Journal of Purchasing 2 (1966) 5-17 4. Ha, S.H., Park, S.C.: Application of Data Mining Tools to Hotel Data Mart on the Intranet for Database Marketing. Expert System with Applications 15 (1998) 1-31 5. Han, J., Kamber, M.: Data Mining–Concepts and Techniques. Morgan Kaufmann, CA (2001) 6. Holt, G.D.: Which Contractor Selection Methodology?. International Journal of Project Management 16 (1998) 153-164 7. Kraljic, P.: Purchasing Must Become Supply Management. Harvard Business Review 61 (1983) 109-117 8. Lee, E.K., Ha, S., Kim, S.K.: Supplier Selection and Management System Considering Relationships in Supply Chain Management. IEEE Transactions on Engineering Management 48 (2001) 307-318 9. Shin, C.K., Yun, U.T, Kim, H.K., Park, S.C.: A Hybrid Approach of Neural Network and Memory-Based Learning to Data Mining. IEEE Trans. on Neural Networks 11 (2000) 637-646 10. Talluri, S., Sarkis, J.: A Model for Performance Monitoring of Suppliers. International Journal of Production Research 40 (2002) 4257-4269 11. Weber, C.A., Current, J.R., Desai, A.: Non-cooperative Negotiation Strategies for Vendor Selection. European Journal of Operational Research 108 (1998) 208-223 12. Weber, C.A., Desai, A.: Determination of Path to Vendor Market Efficiency Using Parallel Coordinates Representation: a Negotiation Tool for Buyers. European Journal of Operational Research 90 (1996) 142-155
The Three-Criteria Servers Replication and Topology Assignment Problem in Wide Area Networks Marcin Markowski and Andrzej Kasprzak Wroclaw University of Technology, Chair of Systems and Computer Networks, Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland [email protected], [email protected]
Abstract. The designing of wide area networks is usually an optimization process with accurately selected optimization criterion. The most utilized criterions are the quality of service in the network (indicated by average packet delay), the capacity cost (channel capacity leasing cost) and server costs (cost of connecting servers or replicas at nodes). This paper studies the problem of designing wide area networks with taking into account those three criteria. Then, the goal is select servers replica allocation at nodes, network topology, channel capacities and flow routes in order to minimize the linear combination of average delay per packet, capacity cost and server cost. The problem is NP-complete. An exact algorithm, based on the branch and bound method is proposed. Some computational results are reported and several properties of the considered problem are formulated.
1 Introduction During the Wide Area Network (WAN) designing process, different optimization criterions may be taken into account. In the literature the one-criteria problems are often studied [1]. The very often criterion is the quality of service in the network [2] and the different kind of investment costs. There are also some algorithms for solving two-criteria problems, proposed in [3]. In fact, there are two basic kinds of the investment costs in the wide area networks. First of them, the network support cost must be borne regularly, for example once a month or once a year. The typical example is the capacity cost (leasing cost of channels). Second kind is disposable cost, borne once when the network is built or expanded. The example of disposable cost is server cost (connecting cost of servers at nodes). In the literature there are no solutions with two different cost criteria. Such approach may be very useful, because WAN designers often can not compare server cost with capacity cost. Then we consider the three-criteria design problem in wide area networks, where the linear combination of the quality of service, capacity cost and server cost are incorporated as the optimization criterion. Designing of network consists of topology, capacity and flow assignment and resource allocation. Resources often have to be replicated in order to satisfy all users demands [4], [5]. In the paper an exact algorithm for simultaneous assignment of M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 1119 – 1128, 2006. © Springer-Verlag Berlin Heidelberg 2006
1120
M. Markowski and A. Kasprzak
server’s replica allocation, network topology, channels capacity and flow assignment is proposed. We use the combined criterion composed of the three indicators, representing quality of the network and different network costs. Moreover, we denote, that the capacity cost is limited. The considered problem is formulated as follows: given:
user allocation at nodes, for every server the set of nodes to which server may be connected, number of replicas, budget of the network, traffic requirements user-user and user-server, set of potential channels and their possible capacities and costs (i.e. cost-capacity function) minimize: linear combination of the total average delay per packet, the capacity cost and the server cost over: network topology, channel capacities, multicommodity flow (i.e. routing), replica allocation subject to: multicommodity flow constraints, channel capacity constraints, budget (maximal capacity cost) constraint, replica allocation constraints. We assume that channels’ capacities can be chosen from discrete sequence defined by international ITU-T recommendations. Then, the formulated above three-criteria servers replication and topology assignment problem is NP-complete [1], [6]. Some solutions for the wide area network optimization problems can be found in the literature. The topology assignment problem and capacity and flow assignment problem are considered in papers [1], [6], [7] and resource allocation and replication in [3], [4], [5]. In the paper [3] the two-criteria web replica allocation and topology assignment problem is considered. Problem presented in this paper is much more general than problem presented in [3], because we consider three criteria: two criteria the same as in the paper [3] and a new one − the capacity cost as the third criterion. Then, the three-criteria problem is more complicated and much more useful for wide area network designers. Such problem, with different costs optimization criteria have been not considered in the literature yet.
2 Problem Formulation Consider a WAN consisting of N nodes and b potential channels which may be used to build the network. For each potential channel i there is the set
C i = {c1i ,..., c si (i )−1} of alternative values of capacities from which exactly one must be chosen if the i -th channel was chosen to build the WAN. Let d ij be the cost of leasing capacity c ij [€€ /month]. Let c si (i ) = 0 for i = 1,..., b. Then C i = C i ∪ {csi (i ) } be the set of alternative capacities from among which exactly one must be used to channel i. If the capacity c si (i ) is chosen then the i -th channel is not used to build the wide area network. Let x ij be the decision variable, which is equal to one if the capacity c ij is assigned to channel i and x ij is equal to zero otherwise. Since exactly
The Three-Criteria Servers Replication and Topology Assignment Problem in WAN
1121
one capacity from the set C i must be chosen for channel i, then the following condition must be satisfied: s (i )
∑ x ij
= 1 for i = 1,..., b.
(1)
j =1
Let W i = {x1i ,..., x si (i ) } be the set of variables x ij , which correspond to the i-th channel. Let X r' be the permutation of values of all variables x ij for which the condition (1) is satisfied, and let X r be the set of variables, which are equal to one in X r' . Let K denotes the total number of servers, which must be allocated in WAN and let LK k denotes the number of replicas of k -th server. Let M k be the set of nodes to which k -th server (or replica of k -th server) may be connected, and let e(k ) be the number of all possible allocation for k -th server. Since only one replica of server may be allocated in one node then the following condition must be satisfied
LK k ≤ e(k ) for k = 1,..., K .
(2)
Let y kh be the decision binary variable for k -th server allocation; y kh is equal to one if the replica of k -th server is connected to node h , and equal to zero otherwise. Since LK k replicas of k -th server must be allocated in the network then the following condition must be satisfied
∑ y kh = LK k
for k = 1,..., K .
(3)
h∈M k
Let Yr be the set of all variables y kh , which are equal to one. The pair of sets ( X r , Yr ) is called a selection. Let ℜ be the family of all selections. X r determines the network topology and capacities of channels and Yr determines the replicas allocation at nodes of WAN. Let T ( X r , Yr ) be the minimal average delay per packet in WAN in which values of channel capacities are given by X r and traffic requirements are given by Yr (depending on server replica allocation). T ( X r , Yr ) can be obtained by solving a multicommodity flow problem in the network [8]. Let U (Yr ) be the server cost and let d ( X r ) be the capacity cost. Let Q( X r , Yr ) be linear combination of the total average delay per packet, the server cost and the capacity cost
Q( X r , Yr ) = αT ( X r , Yr ) + βU (Yr ) + ρ d ( X r )
(4)
where α , β and ρ are the positive coefficients; α , β , ρ ∈ [0,1], α + β + ρ = 1. Let B be the budget (maximal feasible leasing capacity cost of channels) of WAN. Then, the considered server replication and topology assignment problem in WAN can be formulated as follows.
1122
M. Markowski and A. Kasprzak
min Q( X r , Yr )
(5)
( X r , Yr ) ∈ ℜ
(6)
∑
(7)
( X r ,Yr )
subject to
d(X r ) =
xij ∈X r
x ij d ij ≤ B
3 Calculation Scheme of the Branch and Bound Algorithm Assuming that LK k = 1 for k=1,...,K and C i = C i for i=1,...,b, the problem (5-7) is resolved itself into the “host allocation, capacity and flow assignment problem”. Since the host allocation, capacity and flow assignment problem is NP-complete [6], [8] then the problem (5-7) is also NP-complete as more general. Then, the branch and bound method can be used to construct the exact algorithm. Starting with the selection ( X 1 , Y1 ) ∈ ℜ we generate a sequence of selections ( X s , Ys ) . Each selection ( X s , Ys ) is obtained from a certain selections ( X r , Yr ) of the sequence by complementing one
variable x ij (or y kh ) by another variable from W i (or {y km : m ∈ M k and m ≠ h} ).
So, for each selection ( X r , Yr ) we constantly fix a subset Fr ∈ ( X r , Yr ) and momentarily fix a set Frt . The variables in Fr are constantly fixed and represent the path from the initial selection ( X 1 , Y1 ) to the selection ( X r , Yr ) . Each momentarily fixed variable in Frt is the variable abandoned during the backtracking process. Variables, which do not belong to Fr or Frt are called free in ( X r , Yr ) . There are two important elements in branch and bound method: testing operation (lower bound of the criterion function) and branching rules. Then, in the next section of the paper, the testing operation and choice operation are proposed. The lower bound LBr and branching rules are calculated for each selection ( X r , Yr ) . The lower bound is calculated to check if the “better” solution (selection ( X s , Ys ) ) may be found. If the testing is negative, we abandon the considered selection ( X r , Yr ) and backtrack to the selection ( X p , Y p ) from which selection ( X r , Yr )
was generated. The basic task of the branching rules is to find the variables for complementing to generate a new selection with the least possible value of the criterion function. The detailed description of the calculation scheme of branch and bound method may be found in the paper [9].
4 Lower Bound To find the lower bound LBr of the criterion function (4) we reformulate the problem (5-7) in the following way. We assume that the variables x ij and y kh are continuos
The Three-Criteria Servers Replication and Topology Assignment Problem in WAN
1123
variables, we omit the constraint (2) and approximate the discrete cost-capacity curves (given by the set C i ) with the lower linear envelope. In this case, the constraint (7) may be relaxed by the constraint Σ d i c i ≤ B , where d i = min (d ij / c ij ) xij ∈W i
and c i is the capacity of the channel i (continuous variable). To easy find the lower bound we create the model of the WAN in the following way. We add to the considered network 2 K new artificial nodes, numbered from n + 1 to n + 2 K . The artificial nodes n + k and n + K + k correspond to the k -th host. Moreover we add to the network directed artificial channels n + k , m , m, n + K + k , n + K + k , n + k ,
such that m ∈ M k and y kh ∈ ZY ' ∪ ZY ' '. The capacities of the new artificial channels are following: c(n + k , m ) = ∞ , c(m, n + K + k ) = ∞ , c(n + K + k , n + k ) =
n
∑ u kh .
h=1
The leasing costs of all artificial channels are equal to zero. Then, the lower bound LBr of minimal value of the criterion function Q( X s , Ys )
for every possible successor ( X s , Ys ) generated from ( X r , Yr ) may be obtained by solving the following optimization problem: ⎛ ⎜ α LBr = min ⎜ 2 γ f ⎜ ⎝
+ρ
∑
x ij d ij + β
xij ∈Fr
∑
d i fi +
i: xij ∈X r − Fr
α γ
∑
fi
i i xij ∈Fr x j c j
− fi
f (n, N + K + k ) + f (N + k , n )
∑
' +r' ) ∑ (rkm km N
ykn∈Yr − Fr
m =1
u kn
+ρ
∑ d i fi +
i: xij ∈X r − Fr
⎞ ⎟ ⎟ + β ∑ u kn y kn ⎟ ⎟ ykn∈Fr ⎟ ⎠
(8)
subject to
N +K
∑
e =1
f
nm
f i ≤ x ij c i for x ij ∈ X r − Fr
(9)
f i ≤ x ij c ij for x ij ∈ Fr
(10)
ir f i ≤ c max for each x ij ∈ X r − Fr
(11)
N +K
(a,e) − ∑
e =1
⎧r~nm ⎪− r~ ⎪ nm nm f (e, a ) = ⎨ ' r ⎪ pm ⎪r ' ⎩ pm
if a = n and n, m ≤ N if a = m and n, m ≤ N if a = m and e = N + p if a = N + p and e = m
(12)
1124
M. Markowski and A. Kasprzak
' where ~ rnm is the average flow rate directed from node n to node m, r pm is the aver' is the average flow rate age flow rate directed from node m to p -th server, r pm ir directed from p -th server to node m and c max = i max c ij . i t x j ∈W − Fr
The solution of problem (8−12) gives the lower bound LBr. To solve the above reformulated problem we can use an efficient Flow Deviation method [6], [8].
5 Branching Rules The purposes of branching rules is to find the normal variable from the selection ( X r , Yr ) for complementing and generating a successor ( X s , Ys ) of the selection
( X r , Yr )
with the least possible value of the criterion function (4). We can choose a
variable x ij or a variable y kh . Complementing variables x ij causes the capacity change in channel i. Then, the values of T ( X r , Yr ) and d ( X r ) changes, and the value of U (Yr ) does not change. Then, in this case we can use the criterion given in the form of the following theorem. Theorem 1. Let ( X r , Yr ) ∈ ℜ. If the selection ( X s , Ys ) is obtained from the selection ( X r , Yr ) by complementing the variable x ij ∈ X r by the variable xli ∈ X s then Q( X s , Ys ) ≤ Δi jl , where
Δi jl
⎧ ⎪Q( X r , Yr ) − α ⎪ γ ⎪ =⎨ ⎪ α ⎪Q( X r , Yr ) − γ ⎪⎩
⎛ ⎜ ⎜ ci ⎝ j ⎛ ⎜ ⎜ ci ⎝ j
⎞ ⎟ − ρ x i d i − x i d i if x i c i > f i j j j j l l − f i cli − f i ⎟ ⎠ ⎞ fi fi ⎟ − ρ x i d i − x i d i otherwise − j j l l ir − f i c max − f i ⎟ ⎠ fi
−
fi
(
)
(
)
(13)
f ri is the flow in the i -th channel obtained by solving the multicommodity flow problem for network topology and channel capacities given by the selection X r and γ is the total average packet rate from external sources. To formulate the choice criterion on variables y kh we use the following theorem. Theorem 2. Let ( X r , Yr ) ∈ ℜ. If the selection ( X s , Ys ) is obtained from the selection
( X r , Yr )
by complementing the variable y kh ∈ Yr by the variable y km ∈ Ys then
k Q( X s , Ys ) ≤ δ hm , where
The Three-Criteria Servers Replication and Topology Assignment Problem in WAN
k δ hm
⎧α ⎪ ⎪γ =⎨ ⎪ ⎪∞ ⎩
∑
1125
~i f
xij ∈X r
~ + β (U (Yr ) − u kh + u km ) + ρ d ( X r ) x ij c ij − f i ~ if f i < x ij c ij for x ij ∈ X r otherwise
(14)
~i f = f ri − f ik' + f ik'' , f ik' and f ik'' are parts of flow in i-th channel; f ik' corresponds to the packets sent from all users to those replica of k -th server which is allocated at node h (before complementing) and from this replica to all users. f ik'' corresponds to the packets exchanged between all users and replica after reallocating replica from node h to node m; u kh is the cost of connecting the server k at node h. Let E r = ( X r ∪ Yr ) − Fr , and let Gr be the set of all reverse variables of normal variables, which belong to the set E r . We want to choose a normal variable the complementing of which generates a successor with the possible least value of criterion (4). We should choose such pairs {( y kh , y km ) : y kh ∈ E r , y km ∈ G r } or
{(x ij , xli ) : x ij ∈ Er , xli ∈ Gr }, for which the value of criterion δ hmk or Δ
i jl
is minimal.
6 Computational Results The presented exact algorithm was implemented in C++ code. Extensive numerical experiments have been performed with this algorithm for many different network topologies and for many sets of possible server’s replica locations and connecting costs. The experiments were conducted with two main purposes in mind: first, to test the computational efficiency of an algorithm and second, to examine the impact of various parameters on solutions. Let NB = ((B − Dmin ) / (Dmax − Dmin )) ⋅ 100% be the normalized budget in percentage. Dmax is the maximal capacity cost of the network and Dmin is the minimal
Fig. 1. The dependence of P on normalized budget NB
1126
M. Markowski and A. Kasprzak
capacity cost - problem (5-7) has no solution for B < Dmin . Normalized budget let us compare the results obtained for different wide area network topologies and for different replica locations. Let P(u, v) , in percentage, be the arithmetic mean of the relative number of iterations for NB∈[u,v] calculated for all considered network topologies and for different replica locations. Fig. 1 shows the dependency of P on divisions [0%,10%), [10%,20%), ..., [90%,100%] of normalized budget NB. It follows from Fig. 1 that the exact algorithm is especially effective from computational point of view for NB∈[0%,20%] ∪ [70%, 100%]. The typical dependence of the optimal value of Q on budget B for different values of coefficients α , β , ρ is presented in the Fig. 2. It follows from Fig. 2 that there exists such budget B*, that the problem (5-7) has the same solution for each B greater than or equal to B*.
Conclusion 1. In the problem (5-7), for B ≥ B* the constraint (7) may be substituted by constraint d(Xr) ≤ B*. This observation shows that the influence of the building cost (budget) on the optimal solution of the problem (5-7) is very limited for greater values of budget B. Let optimal value of Q obtained by solving the problem (5-7) is following:
Q = α T opt + β U opt + ρ d opt . Then, very interesting problem is to study the dependence of T opt , U opt , d opt on coefficients α , β , ρ . The dependence of the average delay per packet T opt on coefficient α has been examined. In the Fig. 3 the typical dependence of T opt [microsecond] on α is presented for different values of budget B. The typical dependence of the optimal value of criterion U opt on coefficient β for different values of the network budget B is presented in the Fig. 4, and the typical dependence of the optimal value d opt on the coefficient ρ for different values of B is presented in the Fig. 5. α = 0,6, β = 0,2, ρ = 0,2 α = 0,5, β = 0,2, ρ = 0,3 α = 0,4, β = 0,2, ρ = 0,4
Q 80 000 60 000 40 000 32 000
52 000
72 000
92 000
112 000
132 000
B
Fig. 2. Dependence of Q on the budget B for different values of coefficients α , β , ρ
The Three-Criteria Servers Replication and Topology Assignment Problem in WAN
T
1127
Β = 70 000 Β = 80 000 Β = 90 000
opt
27 000 23 000 19 000 15 000 0,00
0,20
0,40
0,60
0,80
α
Fig. 3. The typical dependence of T opt on coefficient α for different values of B
It follows from the numerical experiments and from the Fig. 3, 4, 5 that the functions T opt (α ), U opt ( β ) and d opt ( ρ ) are decreasing.
Conclusion 2. Values of coefficients α , β , ρ have the impact on the values of T opt , U opt and d opt in the optimal value of criterion Q, obtained by solving the problem (5-7). The conclusions formulated above are important from practical point of view. They allow to specify the designing assumptions on coefficients α , β and ρ . During the design process, one of the criteria may be more important and the other little less important for designer. When we want one of the three criteria to be the least we have to determine the biggest coefficient for this criterion. Network designers can indicate which part of the criterion Q (i.e. T opt or U opt or d opt ) is the most important. It follows from Conclusion 2 that it may be done by appropriate determine the value of coefficient α , β and ρ .
U
opt
% % %
E
Fig. 4. The typical dependence of U opt on coefficient β for different values of B
1128
M. Markowski and A. Kasprzak
d
Β = 90 000 Β = 72 000 Β = 55 000
opt
77 000 61 000 45 000 0,00
0,20
0,40
0,60
0,80
ρ
Fig. 5. The typical dependence of d opt on coefficient ρ for different values of B
7 Conclusion In the paper an exact algorithm for solving the three-criteria servers replication and network topology assignment problem in WAN is presented. The considered problem is far more general than the similar problems presented in the literature. Considering two different kinds of cost (capacity cost and server costs) is very important from practical point of view. Some properties of the considered problem have been discovered and formulated as conclusions. In our opinion, in practice the most important is conclusion 2, because it helps to make designing assumptions. This work was supported by a research project of The Polish State Committee for Scientific Research in 2005-2007.
References 1. Pioro, M., Medhi, D.: Routing, Flow, and Capacity Design in Communication and Computer Networks, Elsevier, Morgan Kaufmann Publishers, San Francisco (2004) 2. Walkowiak, K.: QoS Dynamic Routing in Content Delivery Network, Lectures Notes in Computer Science, Vol. 3462 (2005), 1120-1132 3. Markowski, M., Kasprzak, A.: The web replica allocation and topology assignment problem in wide area networks: algorithms and computational results, Lecture Notes in Computer Science, Vol. 3483 (2005), 772-781 4. Radoslavov, P., Govindan, R., Estrin, D.: Topology-Informed Internet Replica Placement, Computer Communications, Volume: 25, Issue: 4 (2002), 384-392 5. Qiu, L., Padmanabhan, V. N., Voelker, G. M.: On the Placement of Web Server Replicas, Proc. of 20th IEEE INFOCOM, Anchorage, USA (2001), 1587 –1596 6. Kasprzak, A.: Topological Design of the Wide Area Networks. Wroclaw University of Technology Press, Wroclaw (2001) 7. Kang, C. G., Tan, H. H.: Combined channel allocation and routing algorithms in packed switched networks, Computer Communications, vol. 20 (1997), 1175-1190 8. Fratta, L., Gerla, M., Kleinrock, L.: The Flow Deviation Method: an Approach to Store-andForward Communication Network Design. Networks 3 (1973), 97-133 9. Wolsey, L.A.: Integer Programming. Wiley-Interscience, New York (1998)
An Efficient Multicast Tree with Delay and Delay Variation Constraints Moonseong Kim1 , Young-Cheol Bang2 , Jong S. Yang3 , and Hyunseung Choo1 1
3
School of Information and Communication Engineering, Sungkyunkwan University, 440-746, Suwon, Korea Tel.: +82-31-290-7145 {moonseong, choo}@ece.skku.ac.kr 2 Department of Computer Engineering, Korea Polytechnic University,429-793, Gyeonggi-Do, Korea Tel.: +82-31-496-8292 [email protected] Korea Institute of Industrial Technology Evaluation and Planning, Seoul, Korea Tel.: +82-42-860-6333 [email protected]
Abstract. With the rapid evolution of real time multimedia applications like audio/video conferencing, interactive distributed games and real time remote control system, a certain Quality of Service (QoS) needs to be guaranteed in underlying networks. Multicast routing algorithms should support the required QoS. There are two important QoS parameters, bounded delay and delay variation, that need to be guaranteed in order to support the real time multimedia applications. Here we solve Delay and delay Variation Bounded Multicast Tree (DVBMT) problem which has been proved to NP-complete. In this paper, we propose an efficient algorithm for DVBMT. The performance enhancement is up to about 21.7% in terms of delay variation as compared to the well-known algorithm, KBC [9].
1
Introduction
New communication services involving multicast communications and real time multimedia applications are becoming prevalent. The general problem of multicasting is well studied in the area of computer networks and algorithmic network theory. In multicast communications, messages are sent to multiple destinations that belong to the same multicast group. These group applications demand a certain amount of reserved resources to satisfy their Quality of Service (QoS) requirements such as end-to-end delay, delay variation, cost, loss, throughput, etc.
This research was supported by the Ministry of Information and Communication, Korea under the Information Technology Research Center support program supervised by the Institute of Information Technology Assessment, IITA-2005(C1090-0501-0019) and the Ministry of Commerce, Industry and Energy under NextGeneration Growth Engine Industry. Dr. Choo is the corresponding author and Dr. Bang is the co-corresponding author.
M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 1129–1136, 2006. c Springer-Verlag Berlin Heidelberg 2006
1130
M. Kim et al.
The multicast tree problem can be modelled as the Steiner tree problem which is NP-complete. A lot of heuristics [1] [2] [3] that construct low-cost multicast routing based on the problem are proposed. Not only the tree cost as a measure of bandwidth efficiency is one of the important factors, but also networks supporting real-time traffic are required to receive messages from source node in a limited amount of time. If the messages cannot reach to the requested destination nodes within the time, they may be useless information. Therefore, it can be an important factor to guarantee an upper bound on the end-to-end delay from the source to each destination. This is called the multicast end-to-end delay problem [4]. There is another important factor to consider real-time application in the multicast network. During video-conferencing, the person who is talking has to be heard by all participants at the same time. Otherwise, the communication may lose the feeling of an interactive screen-to-screen discussion. Moreover, the situation of the on-line video gaming may have the problem that the characters who are in the game must be moved simultaneously. All these problems can be explained that the delay variation has to be kept within a restricted time. They are all related to the multicast delay variation problem [5] [6]. In this paper, we study the delay variation problem under the upper bound on the multicast end-to-end delay. We propose an efficient algorithm in comparison with current algorithms known as the best algorithms so far. The proposed algorithm has the better performance than current algorithms have in terms of the multicast delay variation. The rest of the paper is organized as follows. In Section 2, we state the network model for the multicast routing, the problem formulations, and the previous algorithms. Section 3 presents the details of the proposed algorithms. Then, we evaluate the proposed algorithms by the computer simulation, in Section 4. Finally, Section 5 concludes this paper.
2 2.1
Preliminaries Network Model
We consider that a computer network is represented by a directed graph G = (V, E) with n nodes and l links, where V is a set of nodes and E is a set of links, respectively. Each link e = (i, j) ∈ E is associated with two parameters, namely link cost c(e) ≥ 0 and link delay d(e) ≥ 0. The delay of a link, d(e), is the sum of the perceived queueing delay, transmission delay, and propagation delay. We define a path as sequence of links such that (u, i), (i, j), . . ., (k, v), belongs to E. Let P (u, v) = {(u, i), (i, j), . . . , (k, v)} denote the path from node u to node v. For given a source node s ∈ V and a destination node d ∈ V , (2s→d , ∞) is the set of all possible paths from s to d. (2s→d , ∞) = { Pk (s, d) | all possible paths from s to d, ∀ s, d ∈ V, ∀ k ∈ Λ } (1) where Λ is an index set. The path cost of P is given by φC (P ) = e∈P c(e) and the path delay of P is given by φD (P ) = e∈P d(e). (2s→d , Δ) is the set
An Efficient Multicast Tree with Delay and Delay Variation Constraints
1131
of paths from s to d for which the end-to-end delay is bounded by Δ. Therefore (2s→d , Δ) ⊆ (2s→d , ∞). For the multicast communications, messages need to be delivered to all receivers in the set M ⊆ V \ {s} which is called the multicast group, where |M | = m. The path traversed by messages from the source s to a multicast receiver, mi , is given by P (s,mi ). Thus multicast routing tree can be defined as T (s, M ), the Steiner tree of mi ∈M P (s, mi ), and the messages are sent from s to M through T (s, M ). The tree delay is φD (T ) = max{ φD (P (s, mi )) | ∀ P (s, mi ) ⊆ T, ∀ mi ∈ M }. The multicast delay variation, φδ (T ), is the maximum difference between the end-to-end delays along the paths from the source to any two destination nodes. φδ (T ) = max{|φD (P (s, mi ))−φD (P (s, mj ))|, ∀P ⊆ T,∀ mi , mj ∈ M, i = j} (2) The issue defined and discussed in [5], initially, is to minimize multicast delay variation under multicast end-to-end delay constraint. The authors referred to this problem as Delay- and delay Variation-Bounded Multicast Tree (DVBMT) problem. The DVBMT problem is to find the tree that satisfies min{ φδ (Tα ) | ∀ P (s, mi ) ∈ (2s→mi , Δ),∀ P (s, mi ) ⊆ Tα ,∀ mi ∈ M,∀ α ∈ Λ } (3) where Tα denotes any multicast tree spanning M ∪ {s}, and is known to be NP-complete [5]. 2.2
Previous Algorithms
The DVBMT problem has been introduced in [5]. Rouskas and Baldine proposed Delay Variation Multicast Algorithm (DVMA), finds a Multicast Tree spanning the set of multicast nodes. DVMA works on the principal of finding the k th shortest paths to the concerned nodes. If these paths do not satisfy a delay variation bound δ, longer paths are found. The complexity of DVMA is O(klmn4 ), where k and l are the largest value among the numbers of the all appropriate paths between any two nodes under Δ, |M | = m, and |V | = n. DVMA is a high time complexity does not fit in modern high speed computer network environment. Sheu and Chen proposed Delay and Delay Variation Constraint Algorithm (DDVCA) [7] based on Core Based Trees (CBT) [8]. Since DDVCA is meant to search as much as possible for a multicast tree with a smaller multicast delay variation under the multicast end-to-end delay constraint Δ, DDVCA in picking a central node has to inspect whether that central node will violate the multicast end-to-end delay constraint Δ. In spite of DVMA’s smart performance in terms of the multicast delay variation, its time complexity is very high. But DDVCA has a much lower time complexity O(mn2 ) and has a satisfactory performance. Kim, et al. have recently proposed a heuristic algorithm based on CBT like DDVCA [9], their algorithm hereafter referred to as KBC. DDVCA overlooked a portion of the core selection. The selection of a core node over several candidates (possible core nodes) is randomly selected among candidate nodes. Meanwhile
1132
M. Kim et al.
KBC investigates candidate nodes to select the better node with the same time complexity of DDVCA. KBC obtains the better minimum multicast delay variation than DDVCA. 2.3
Weighted Factor Algorithm (WFA)
Kim, et al. have recently proposed a unicast routing algorithm, their algorithm hereafter referred to as WFA [10], which is probabilistic combination of the link cost and delay and its time complexity is O(W l + W nlogn), where {ωα } is set of weights and |{ωα }| = W . The authors investigated the efficiency routing problem in point-to-point connection-oriented networks with a QoS. They formulated the new weight parameter that simultaneously took into account both the link cost and delay. The Least Delay path (PLD ) cost is relatively more expensive than the Least Cost path (PLC ) cost, and moreover, φD (PLC ) is relatively higher than φD (PLD ). The unicast routing algorithm, WFA, is quite likely a performance of k th shortest path algorithm. The weight ω plays on important role in combining the two the independent measures. If the ω is nearly to 0, then the path delay is low as in Fig. 1. Otherwise the path cost is low. Thus, the efficient routing path can be determined once ω is selected. Our proposed algorithm uses WFA for construction of multicast tree. Since a multicast tree is union of paths for every multicast member, we can select suitable weight for every multicast member.
PATH DELAY is increasing
PATH COST is decreasing
The w approaches to 1
Fig. 1. Relationship between path delay and path cost from [10] (Pe : 0.3, |V |: 100)
3
The Proposed Algorithm
We now present algorithm to construct a multicast tree satisfying constraints (3) for the given value of the path delay Δ. We assume that complete information
An Efficient Multicast Tree with Delay and Delay Variation Constraints
1133
Fig. 2. Weight selection in proposed algorithm
Fig. 3. Scenario illustrating loop creation in G∗ = P (s, m1 ) ∪ P (s, m2 )
regarding the network topology is stored locally at source node s, making it possible to determine the multicast tree at the source node itself. This information may be collected and updated using an existing topology broadcast algorithm. Our objective is to obtain the feasible tree of minimum delay variation for the given value of Δ. We introduce a heuristic algorithm for DVBMT. WFA can check various paths with path delay for each ω as Fig. 1. For each multicast member, the proposed algorithm uses WFA and finds a pertinence weight with Δ. Let G = (V, E) be a given network and M = { mα , mβ , mγ } ⊂ V be a multicast group. As indicated in Fig. 2, we select ω ˆ i∗ such that max{ φD (Pωˆ i ) | ∀ Pωi ∈ s→mi ∀ (2 , Δ) ωi ∈ W }. If the difference between φD (Pωˆ α∗ ) and φD (Pωˆ γ ∗ ) is big, ˆ Let Δ∗ be a virthen a potential delay variation δˆ is large. So we have to reduce δ. tual delay boundary in place of Δ. Hence we have Δ∗ = mi ∈M φD (Pωˆ i∗ )/|M | as the arithmetic mean for φD (Pωˆ i∗ ). And we select ω ˜ i∗ such that min{ | φD (Pωi )− ∗ ∀ s→mi Δ |, Pωi ∈ (2 , Δ) }. Intuitively, a potential delay variation δ˜ may be ˆ ˜ ˆ smaller than δ(i.e., δ ≤ δ). ∗ Let G = mi ∈M Pω˜ i (s, mi ) be a connected subgraph of G. See Fig. 3, G∗ must be not tree. We have to find the minimal spanning tree, T ∗ , of G∗ . If
1134
M. Kim et al.
there are several minimal spanning trees, pick an arbitrary one. And then, the proposed algorithm constructs a multicast tree, T (s, M ), from T ∗ by deleting links in T ∗ , if necessary, so that all the leaves in T (s, M ) are multicast members.
4 4.1
Performance Evaluation Random Real Network Topology for the Simulation
Random graphs of the acknowledged model represent different kinds of networks, communication networks in particular. There are many algorithms and programs, but the speed is usually the main goal, not the statistical properties. In the last decade the problem was discussed, for examples, by B. M. Waxman (1993) [11], M. Doar (1993, 1996) [12][13], C.-K. Toh (1993) [14], E. W. Zegura, K. L. Calvert, and S. Bhattacharjee (1996) [15], K. L. Calvert, M. Doar, and M. Doar (1997) [16], R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal (2000) [17]. They have presented fast algorithms that allow the generation of random graphs with different properties, in particular, these are similar to real communication networks. However, none of them have discussed the stochastic properties of generated random graphs. A. S. Rodionov and H. Choo [18] have formulated two major demands for the generators of random graph: attainability of all graphs with required properties and uniformity of distribution. If the second demand is sometimes difficult to prove theoretically, it is possible to check the distribution statistically. The method uses parameter Pe , the probability of link existence between any node pair. We use the method by Rodionov and Choo. 4.2
Simulation Results
We now describe some numerical results with which we compare the performance of the proposed schemes. We generate 100 different networks for each size of 50, 100, and 200. Each node in network has the node average degree 4 or Pe = 0.3. The proposed algorithm is implemented in C. We randomly select a source node. The destination nodes are picked uniformly from the set of nodes in the network topology (excluding the nodes already selected for the destination). Moreover, the destination nodes in the multicast group, M , are occupied 10% ∼ 80% of the overall nodes on the network. The delay bound Δ value in our computer experiment is set to be 1.5 times the minimum delay between the source node and the farthest destination node [7]. We simulate 1000 times (10 × 100 = 1000) for each |V |. For the performance comparison, we implement DDVCA and KBC in the same simulation environments. As indicated in Fig. 4, it is easily noticed that the proposed algorithm is always better than others. KBC is a little better than DDVCA. Since our algorithm specially regards the weight selection, its delay
An Efficient Multicast Tree with Delay and Delay Variation Constraints
1135
(a) |V | = 50 with average degree 4
(b) |V | = 100 with average degree 4
(c) |V | = 100 with Pe = 0.3
(d) |V | = 200 with average degree 4
Fig. 4. Tree delay variations
variation cannot help being good. The enhancement is up to about 16.1% ∼ 21.7% (|V | = 200) in terms of delay variation for KBC.
5
Conclusion
We studied the problem of constructing minimum delay variation multicast tree that satisfies the end-to-end delay bound and delay variation bound, which is called as DVBMT, and has been proved to be NP-complete. We proposed algorithm using expected multiple paths. The expected multiple paths are obtained by WFA which is introduced in [10]. WFA is efficiently combining two independent measures, the link cost and delay. The weight ω plays on important role in combining the two measures. Our algorithm finds suitable weight ω for each destination member, and they construct the multicast tree for DVBMT. The efficiency of our algorithm is verified through the performance evaluations and the enhancements are 16.1% ∼ 21.7% in terms of the multicast delay variation. Also, the time complexity is O(W ml + W mnlogn) which is comparable to one of previous works.
1136
M. Kim et al.
References 1. Y.-C. Bang and H. Choo, “On multicasting with minimum costs for the Internet topology,” Springer-Verlag Lecture Notes in Computer Science, vol. 2400, pp. 736744, August 2002. 2. L. Kou, G. Markowsky, and L. Berman, “A fast algorithm for Steiner trees,” Acta Informatica, vol. 15, pp. 141-145, 1981. 3. H. Takahashi and A. Matsuyama, “An Approximate Solution for the Steiner Problem in Graphs,” Mathematica Japonica, vol. 24, no. 6, pp. 573-577, 1980. 4. V. P. Kompella, J. C. Pasquale, and G. C. Polyzos, “Multicast routing for multimedia communication,” IEEE/ACM Trans. Networking, vol. 1, no. 3, pp. 286-292, June 1993. 5. G. N. Rouskas and I. Baldine, “Multicast routing with end-to-end delay and delay variation constraints,” IEEE JSAC, vol. 15, no. 3, pp. 346-356, April 1997. 6. M. Kim, Y.-C. Bang, and H. Choo, “On Estimation for Reducing Multicast Delay Variation,” Springer-Verlag Lecture Notes in Computer Science, HPCC 2005, vol. 3726, pp. 117-122, September 2005. 7. P.-R. Sheu and S.-T. Chen, “A Fast and Efficient Heuristic Algorithm for the Delay- and Delay Variation-Bounded Multicast Tree Problem,” Computer Communications, vol. 25, no. 8, pp. 825-833, 2002. 8. A. Ballardie, B. Cain, Z. Zhang, “Core Based Trees (CBT version 3) Multicast Routing protocol specification,” Internet Draft, IETF, August 1998. 9. M. Kim, Y.-C. Bang, and H. Choo, “Efficient Algorithm for Reducing Delay Variation on Bounded Multicast Trees,” Springer-Verlag Lecture Notes in Computer Science, Networking 2004, vol. 3090, pp. 440-450, September 2004. 10. M. Kim, Y.-C. Bang, and H. Choo, “On Algorithm for Efficiently Combining Two Independent Measures in Routing Paths,” Springer-Verlag Lecture Notes in Computer Science, ICCSA 2005, vol. 3483, pp. 989-998, May 2005. 11. B. W. Waxman, “Routing of multipoint connections,” IEEE JSAC, vol. 6, no. 9, pp. 1617-1622, December 1988. 12. M. Doar, “Multicast in the ATM environment,” Ph.D dissertation, Cambridge University, Computer Lab., September 1993. 13. M. Doar, “A Better Mode for Generating Test Networks,” IEEE Proc. GLOBECOM’96, pp. 86-93, 1996. 14. C.-K. Toh, “Performance Evaluation of Crossover Switch Discovery Algorithms for Wireless ATM LANs,” IEEE Proc. INFOCOM96, pp. 1380-1387, 1996. 15. E. W. Zegura, K. L. Calvert, and S. Bhattacharjee, “How to model an Internetwork,” IEEE Proc. INFOCOM96, pp. 594-602, 1996. 16. K. L. Calvert, M. Doar, and M. Doar, “Modelling Internet Topology,” IEEE Communications Magazine, pp. 160-163, June 1997. 17. R. Kumar, P. Raghavan, S. Rajagopalan, D Sivakumar, A. Tomkins, and E. Upfal, “Stochastic models for the Web graph,” Proc. 41st Annual Symposium on Foundations of Computer Science, pp. 57-65, 2000. 18. A. S. Rodionov and H. Choo, “On Generating Random Network Structures: Connected Graphs,” Springer-Verlag Lecture Notes in Computer Science, vol. 3090, pp. 483-491, August 2004.
Algorithms on Extended (δ, γ)-Matching Inbok Lee1 , Rapha¨el Clifford2 , and Sung-Ryul Kim3, 1
3
King’s College London, Department of Computer Science, London WC2R 2LS, UK [email protected] 2 University of Bristol, Department of Computer Science, UK [email protected] Konkuk University, Division of Internet & Media and CAESIT, Seoul, Republic of Korea [email protected]
Abstract. Approximate pattern matching plays an important role in various applications, such as bioinformatics, computer-aided music analysis and computer vision. We focus on (δ, γ)-matching. Given a text T of length n, a pattern P of length m,and two parameters δ and γ, the aim is to find all the substring T [i, i + m − 1] such that (a) ∀1 ≤ j ≤ m, |T [i+j−1]−P [j]| ≤ δ (δ-matching) , and (b) 1≤j≤m |T [i+j−1]−P [j]| ≤ γ (γ-matching). In this paper we consider three variations of (δ, γ)matching: amplified matching, transposition-invariant matching, and amplified transposition-invariant matching. For each problem we propose a simple and efficient algorithm which is easy to implement.
1
Introduction
Approximate pattern matching plays an important role in various applications, such as bioinformatics, computer-aided music analysis and computer vision. In real world, patterns rarely appear exactly in the text due to various reason. Hence we are interested in finding approximate occurrence of the pattern from the text. When we handle numeric data, approximate occurrences mean small differences between the text and the pattern. We focus on (δ, γ)-matching. Informally (δ, γ)-matching refers to the problem of finding all the substrings of the text where each character differs at most δ and the sum of the differences is equal to or smaller than γ. Even though (δ, γ)matching is one way of modelling approximate matching, we need to handle another approximation in addition to that. We consider three variations of (δ, γ)matching. Informally, our problems are as follows: – [Amplified matching] refers to finding all the substrings in T where each character of P is multiplied by an arbitrary number.
This research was supported by the MIC(Ministry of Information and Communication), Korea, under the ITRC(Information Technology Research Center) support program supervised by the IITA(Institute of Information Technology Assessment). Contact author.
M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 1137–1142, 2006. c Springer-Verlag Berlin Heidelberg 2006
1138
I. Lee, R. Clifford, and S.-R. Kim
– [Transposition-invariant matching] refers to finding all the substrings in T where each character of P is added by an arbitrary number. – [Amplified Transposition-invariant matching] refers to the combination of two. We want to find all the substrings in T where each character of P is first multiplied as in amplified matching, then added by an arbitrary number. Another way to explain these problems is to consider a linear transformation y = αx + β. If α = 0, it is transposition-invariant matching. If β = 0, then it is amplified matching. Otherwise, it is amplified transposition-invariant matching. These problems happen in real world applications. In music analysis, a simple motive can be transformed into different variations, changing frequencies or duration of notes. If we represent musical data in numerical form, the original pattern is amplified or added by a constant number. J.S. Bach’s Goldberg Variations and L. Beethoven’s Diabelli Variations are famous examples of extending a simple motive into a great masterpiece. Hence finding the occurrence of the simple motive in varied forms plays an important role in music analysis. Figure 1 shows an example. The pattern is presented in (a). In (b), each character in P is multiplied by two. They are represented by dashed circles. Now assuming δ = 1 and γ = 2, there is an occurrence of amplified matching in the text (black dots). In (c), each character in P is added by one. The dashed circles and black dots are the same as in (b) and it shows an occurrence of transposition-invariant matching. Finally, each character in P is first multiplied by two, and added by one. There have been works on efficient algorithms for (δ, γ)-matching. Recent works on (δ, γ)-matching include suffix automata using bit-parallel technique [4] and fast Fourier transform [2]. For the transposition-invariant matching, several algorithms were proposed recently [5, 6, 7]. The problem with these previous approaches is that they are based on sparse dynamic programming, which is not easy to understand and implement. character
multiplied by 2
(a)
pattern
character
(b)
text
character multiplied by 2
transposed by 1
transposed by 1
(c)
text
(d)
text
Fig. 1. In (a), the pattern is presented. (b),(c), and (d) are examples of amplified, transposition-invariant, and amplified transposition-invariant matching.
Algorithms on Extended (δ, γ)-Matching
2
1139
Preliminaries
Let T and P be strings over an integer alphabet Σ. T [i] denotes the i-th character of T . T [i, j] denotes the substring T [i]T [i + 1] · · · T [j]. |T | = n and |P | = m. The basic problem, (δ, γ)-matching is defined as follows. Definition 1. Given a text T = T [1, n], a pattern P = P [1, m], and two parameters δ and γ, (δ, γ)-matching is is to find all the substring T [i, i + m − 1] such that (a) ∀1 ≤ j ≤ m, |T [i + j − 1] − P [j]| ≤ δ (δ-matching) , and (b) 1≤j≤m |T [i + j − 1] − P [j]| ≤ γ (γ-matching). Note that this problem makes sense if δ ≤ γ ≤ δm. Usually δ is quite small in applications. Clifford et al. [2] showed how to use the Fast Fourier Transform (FFT) in (δ, γ)-matching. We briefly introduce their methods. The most important property of the FFT is that all the inner-products P [1, m] · T [i, i + m − 1] =
m
P [j]T [i + j − 1], 1 ≤ i ≤ n − m + 1.
j=1
can be calculated in O(n log m) time (see [3, Chap 32] for more details). If there is an exact match between P and T [i, i + m − 1], m
(P [j] − T [i + j − 1])2 =
j=1
m
P [j]2 − 2P [1, m] · T [i, i + m − 1] +
j=1
m
T [i + j − 1]2
j=1
should be zero. The first and last terms can be computed in O(n + m) time and the second term can be computed in O(n log m) time using the FFT. If there is an δ-matching between P and T [i, i + m − 1], each pair of P [j] and T [i + j − 1] δ should satisfy =−δ (P [j] − T [i + j − 1] + )2 = 0. Hence m δ
(P [j] − T [i + j − 1] + )2 = 0.
j=1 =−δ
The total time complexity for δ-matching is O(δn log m). For (δ, γ)-matching, they used another complex tricks to achieve the same time complexity. We do not mention more details here. For those interested, refer to [2]. Unfortunately O(δn log m) time for (δ, γ)-matching cannot applied to amplified matching because they require the original pattern (not the amplified one).
3
Algorithms
Now we define the problems formally and show an efficient algorithm for each problem. Our algorithms are based on Clifford et al.’s technique [2]. We first find the candidates for amplified, transposition-invariant, and amplified transpositioninvariant matching. Then we verify the candidates whether they are real matches
1140
I. Lee, R. Clifford, and S.-R. Kim
or not. If T [i, i + m − 1] is a candidate, it is easy to show that the verification takes O(m) time for amplified matching and O(δm) time for transposition-invariant and amplified transposition-invariant matching. Using the technique in [7], both verifications can be done in O(m) time. 3.1
Amplified (δ, γ)-Matching
Definition 2. Given a text T = T [1, n], a pattern P = P [1, m], and two parameters δ and γ, the amplified (δ, γ)-matching is is to find all the substring T [i+m−1] such that for an integer α, (a) ∀1 ≤ j ≤ m, |T [i+j−1]−α×P [i]| ≤ δ, and (b) 1≤j≤m |T [i + j − 1] − α × P [i]| ≤ γ. The need to determine α. Once α is known, m difference lies in the fact 2that 2we m 2 (α × P [j] − T [i + j − 1]) = α × j=1 j=1 P [j] − 2α × P [1, m] · T [i, i + m − m 1] + j=1 T [i + j − 1]2 . And for δ-matching, m δ
(α × P [j] − T [i + j − 1] + )2 = 0
j=1 =−δ
can be computed in O(δn log m) time. To store α, we use an integer array α[1, m]. First we select a base element P [k]. For simplicity, assume that P [k] is the greatest in P . Then α[i] = [T [i + k − 1]/P [k]], 1 ≤ i ≤ n − k + 1. While computing the FFT to find a match between T [i, i + m − 1] and P , we use α[i]. The base element P [k] should meet one condition. For any character T [i], α[i] − 0.5 ≤ T [i]/P [k] < α[i] + 0.5. If there is an occurrence of (δ, γ)-matching at position i, |T [i] − α[i] × P [k]| ≤ δ. By replacing T [i] with α[i] × P [k] − δ and α[i] × P [k] + δ, we get two inequalities, α[i] − 0.5 ≤
α[i] × P [k] − δ < α[i] + 0.5 and P [k]
α[i] − 0.5 ≤
α[i] × P [k] + δ < α[i] + 0.5. P [k]
After some tedious computation using the fact δ ≥ 0, we get P [k] > 2δ. Note that it doesn’t mean that all the characters in P should meet this condition. Just one character is enough. Fortunately this condition can be met easily in applications. For example, A=440Hz in music analysis and human being cannot hear lower than 20Hz. Hence we can use δ up to 10, which is enough. Note that the answers from the FFT is just considering δ-matching. For (δ, γ)-matching, we need the verification mentioned above. Theorem 1. The amplified (δ, γ)-matching can be solved in O(δn log m + occ × m) time, where occ is the number of candidates. Proof. Computing the array D takes O(n) time. The FFT runs in O(δn log m) time. After finding occ candidates, each requires O(m) time verification.
Algorithms on Extended (δ, γ)-Matching
3.2
1141
Transposition-Invariant Matching
Definition 3. Given a text T = T [1, n], a pattern P = P [1, m], and two parameters δ and γ, the transposition-invariant (δ, γ)-matching is is to find all the substring T [i, i + m − 1] such thatfor an integer β, (a) ∀1 ≤ j ≤ m, |T [i + j − 1] − (P [j] + β)| ≤ δ, and (b) 1≤j≤m |T [i + j − 1] − (P [j] + β)| ≤ γ. Instead of using sparse-dynamic programming, we use a simpler method. We create two new strings T = T [1, n − 1] and P = P [1, m − 1] such that T [i] = T [i + 1] − T [i] and P [i] = P [i + 1] − P [i]. Then the following simple lemma holds. Lemma 1. If there is a (δ, γ)-matching of P at position i of T , then there is a (2δ, 2γ)-matching of P at position i of T . Proof. We first begin proving 2δ-matching. If there is an occurrence of δ-matching at position i of T , it is evident that −δ ≤ T [i + j − 1] − P [j] ≤ δ and −δ ≤ T [i + j] − P [j + 1] ≤ δ for 1 ≤ j ≤ m − 1. From these equations, it follows that −2δ ≤ (T [i + j] − T [i + j − 1]) − (P [j + 1] − P [j]) ≤ δ. Since T [i + j − 1] = T [i + j] − T [i + j − 1] and P [j] = P [j + 1] − P [j], −2δ ≤ T [i + j − 1] − P [j] ≤ 2δ. Now we prove 2γ-matching. Now we consider about gamma-matching. If there is γ-matching of P in T at position i, it should be m j=1 |T [i + j − 1] − P [j]| ≤ γ. Now we use a simple fact |A| + |B| ≥ |A + B|. m−1
|T [i + j − 1] − P [j]| =
j=1
m−1
|(T [i + j] − T [i + j − 1]) − (P [i + 1] − P [i])|
j=1
=
m−1
|(T [i + j] − P [i + 1]) + (P [i] − T [i + j − 1])|
j=1
≤
m−1
(|(T [i + j] − P [i + 1])| + |P [i] − T [i + j − 1])|)
j=1
≤2×
m
|T [i + j − 1] − P [i]|
i=1
≤ 2γ. Using this fact, we find occurrences of (2δ, 2γ)-matching of P from T . The results are candidates for (δ, γ)-matching of P from T . Then we check whether they are real occurrences of (δ, γ)-matching or not. Theorem 2. The transposition-invariant (δ, γ)-matching can be solved in O(δn log m + occ × m) time, where occ is the number of candidates. Proof. Computing T and P takes in O(m + n) time. The FFT runs in O(δn log m) time. Each verification runs in O(m) time.
1142
I. Lee, R. Clifford, and S.-R. Kim
Now we consider how large occ is. If T and P are drawn randomly from Σ, it is easy to show that the probability that T [i] and P [j] can have a 2δ-matching is (4δ + 1)/|Σ|. Hence, the probability is ((4δ + 1)/|Σ|)m−1 . The expected number of candidates is n((4δ + 1)/|Σ|)m−1 , which is small when δ is quite small and |Σ| is large. 3.3
Amplified Transposition-Invariant Matching
We simply explain the outline of amplified transposition-invariant matching. The first observation is that if there is an occurrence of amplified transpositioninvariant matching of P in T , then there is an occurrence of amplified matching of P in T (we can get P and T as we did in transposition-invariant matching). Therefore, we first create P and T , then we do amplified matching. Then we verify the results as we did in transposition-invariant matching, using α array obtained during amplified matching. It is easy to show that the time complexity is O(δn log m + occ × m).
4
Conclusion
We showed simple O(δn log m + occ × m) time algorithms for amplified, transposition-invariant, and amplified transposition-invariant matching . Further research includes parameterised version of the problems discussed in this paper, which means finding a mapping Σ → Σ For exact scaled matching, there is an algorithm for the parameterised version [1]. Another interesting problem is to find occurrences of (δ, γ)-matching by more complex transforms.
References 1. A. Amir, A. Butman, and M. Lewenstein. Real scaled matching. Information Processing Letters, 70(4):185–190, 1999. 2. P. Clifford, R. Clifford, and C. S. Iliopoulos. Faster Algorithms for (δ, γ)-Matching and Related Problems. In Proc. of 16th Combinatorial Pattern Matching (CPM ’05), pages 68–78. 2005. 3. T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, 1990. 4. M. Crochemore, C. S. Iliopoulos, G. Navarro, Y. Pinz´ on, and A. Salinger. Bit-parallel (δ, γ)-matching suffix automata. Journal of Discrete Algorithms, 3(2-4):198-214, 2004. 5. H. Hyrr¨ o. Restricted Transposition Invariant Approximate String Matching. In it Proc. of 12th String Processing and Information Retrieval (SPIRE ’05), pages 257–267, 2005. 6. K. Lemstr¨ om, G. Navarro, and Y. Pinz´ on. Practical Algorithms for TranspositionInvariant String-Matching. Journal of Discrete Algorithms, 3(2-4):267-292, 2005 7. V. M¨ akinen, G. Navarro, and E. Ukkonen. Transposition Invariant String Matching. Journal of Algorithms, 56(2):124-153, 2005.
SOM and Neural Gas as Graduated Nonconvexity Algorithms* Ana I. González, Alicia D’Anjou, M. Teresa García-Sebastian, and Manuel Graña Grupo de Inteligencia Computacional, Facultad de Informática, UPV/EHU, Apdo. 649, 20080 San Sebastián, España [email protected] Abstract. Convergence of the Self-Organizing Map (SOM) and Neural Gas (NG) is usually contemplated from the point of view of stochastic gradient descent. This class of algorithms is characterized by a very slow convergence rate. However we have found empirically that One-Pass realizations of SOM and NG provide good results or even improve over the slower realizations, when the performance measure is the distortion. One-Pass realizations use each data sample item only once, imposing a very fast reduction of the learning parameters that does not conform to the convergence requirements of stochastic gradient descent. That empirical evidence leads us to propose that the appropriate setting for the convergence analysis of SOM, NG and similar competitive clustering algorithms is the field of Graduated Nonconvexity algorithms. We show they can easily be put in this framework.
1 Introduction In this paper we focus on two well known competitive artificial neural networks: Self Organising Map (SOM) [3, 14, 15] and the Neural Gas (NG) [17]. They can be used for Vector Quantization [1, 9, 10, 12], which is a technique that maps a set of input vectors into a finite collection of predetermined codevectors. The set of all codevectors is called the codebook. In designing a vector quantizer, the goal is to construct a codebook for which the expected distortion of approximating any input vector by a codevector is minimized. Therefore, the distortion computed over the sample data is the natural performance feature for VQ algorithms. It is important to note this, because for the SOM there is a body of work devoted to its convergence to organized states (i.e.[22]), however for the NG there is no meaning for such a concept. Organized states are related to the nonlinear dimension reduction ability of the SOM based on topological preservation properties of the organized states. The starting assumption in this paper is that both SOM and NG are to be used as VQ design algorithms. Some works [3] have already pointed that SOM may be viewed as a robust initialization step for the fine-tuning of the Simple Competitive Learning (SCL) to perform VQ. Both SOM and NG algorithms have the appearance of stochastic gradient descent (online) algorithms [8] in their original definitions, that is, whenever an input vector is *
The work is partially supported by MEC grants DPI2003-06972 and VIMS-2003-20088-c0404, and UPV/EHU grant UE03A07.
M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 1143 – 1152, 2006. © Springer-Verlag Berlin Heidelberg 2006
1144
A.I. González et al.
presented, a learning (adaptation) step occurs. It has been shown that an online version of the NG algorithm can find better local solutions than the online SOM [17]. Online realizations are very lengthy due to the slow convergence rate of the stochastic gradient descent. To speed up computations, there are batch versions for both algorithms. Batch realizations correspond to deterministic gradient descent algorithms. The parameter estimation is performed using statistics computed over the whole data sample. The batch version of SOM was already proposed in [14] as a reasonable speed-up of the online SOM, with minor solution quality degradation. In the empirical analysis reported in [6], the main drawbacks for the batch SOM are its sensitivity to initial conditions and the bad organization of the final class representatives that may be due to poor topological preservation. Good initialisation instances of the batch SOM may improve the solutions obtained by the online SOM. On the other hand, the online SOM is robust against bad initializations and provides good topological ordering, if the adaptation schedule is smooth enough. The batch version of the NG algorithm has been studied in [18] as an algorithm for clustering data. It has been proposed as a convenient speed-up of the online NG. Both the online and batch algorithm versions imply the iteration over the whole sample several times. On the contrary, One-Pass realizations visit only once the sample data. This adaptation framework is not very common in the neural networks literature; in fact, the only related reference that we have found is [4]. The effective scheduled sequences of the learning parameters applied to meet the fast adaptation requirement fall far from the theoretical convergence conditions. However, as we shall see, in some cases the distortion results are competitive with the conventional SOM and NG online and batch versions. If we take into account the computation time, the One-Pass realization superiority becomes spectacular. These results lead us to think that may be there are more critical phenomena working in the convergence other than the learning rate. We postulate that both SOM and NG are instances of the Graduated Nonconvexity (GNC) algorithms [19, 20], which are related to the parameter continuation methods [2]. GNC algorithms try to solve the minimization of a non-convex objective function by the progressive search of the minima of a sequence of functions that depending of a parameter are morphed from a convex function up to the non-convex original function. Continuation methods perform the search for roots of highly nonlinear systems in a similar way. For independent verification, the Matlab code of the experiments described in this paper is available in the following web address: www.sc.ehu.es/acwgoaca/ under the heading “proyectos”. Section 2 presents the formal definition of the algorithms. Section 3 gives the experimental results. Section 4 discusses the formulation of the SOM and NG as GNC algorithms. Section 5 is devoted to conclusions and discussion.
2 Algorithm Definitions X = {x 1, ,x n } the input data sample real valued vectors and Y = {y 1 ,..,y c } the set of real valued codevectors (codebook). The design of the codebook is performed minimizing the error/distortion function E:
Let it be
SOM and Neural Gas as Graduated Nonconvexity Algorithms n
2
E = ∑ x i − y k(i) ; i=1
⎧ k(i ) = argmin⎨ x i − y j j=1,..,c ⎩
1145
2⎫
⎬ ⎭
(1)
Each algorithm described below has some control parameters, like the learning ratio, the neighbourhood size and shape, or the temperature. The online realizations usually modify their values following each input data presentation and adaptation of the codebook. The batch realizations modify their values after each presentation of the whole input data sample. Both online and batch realizations imply that the input data set is presented several times. On the contrary, the One-Pass realizations imply that each input data is presented at most once for adaptation, and that the control parameters are modified after each presentation. 2.1 One-Pass Version of Self-Organizing Map and Neural Gas The SOM is a particular case of the general Competitive Neural Network algorithm: y i (t +1) = y i (t) + α (t) H i (x(t), Y(t))(x(t) − y i (t))
(2)
Where t is the order of presentation of sample vectors; the size of the sample is n. We denote by H i (x,Y ) the so-called neighbouring function, and by α i (t ) the (local) learning rate. In the case of a conventional online realization, t corresponds to the iteration number over the sample, and the learning rate and neighbour value is fixed during iteration. In the case of One-Pass realization, t corresponds to the input vector presentation number, and the learning rate and neighbour value is updated during iteration. In their general statement, Competitive Neural Networks are designed to perform stochastic gradient minimisation of a distortion-like function similar to that in equation (1). In order to guarantee theoretical convergence, the learning rate must comply with the following conditions:
lim α (t ) = 0, ∑
t→∞
∞ α t=0
∞
(t ) = ∞, ∑ t=0 α 2 (t ) < ∞
(3)
However, these conditions imply very lengthy adaptation processes, for finite samples they usually force walking over the sample several times. The idea of performing a One-Pass realization of the minimization process violates these conditions. Besides, it imposes strong schedules of he algorithm control parameters. In the experiments, the learning rate follows the expression [5]: t
α (t ) = α 0 (α n α 0 ) n
(4)
Where α 0 and α n are the initial and final value of the learning rate, respectively. Therefore after n presentations the learning rate reaches its final value. In the case of the SOM, the neighbouring function is defined over the space of the neuron (codevector) indices. In the experiments reported in this paper, we assume a 1D topology of the codevector indices. The neighbourhoods considered decay exponentially following the expression:
1146
A.I. González et al.
⎧⎪ 1 H i (x, Y ) = ⎨ ⎪⎩0
{
⎡
w - i ≤ h0 (hn h0 )
w = argmin x − y k
8t n
⎤−1;
otherwise 2
}
(5)
k = 1,..,c ;1 ≤ i ≤ c
The initial and final neighbourhood radius are h0 and hn , respectively. The expression ensures that the neighbouring function reduces to the simple competitive case (null neighbourhood) after the presentation of the first 1 8 inputs of the sample. With this neighbourhood reduction rate, we obtain, after a quick initial ordering of the codevectors, a slow local fine-tuning. We proposed this scheduling in [11] to approach real-time constraints and other authors have worked with this idea [3] in the context of conventional online realizations. The NG introduced in [17] shares with the SOM the structure shown in equation (2) it is characterized by the following neighbouring function:
H i (x, Y ) = exp(− ranking(i,x,Y ) λ )
(6)
The ranking function returns the position {0,...,c −1} of the codevector yi in the set of codevectors ordered by their distances to the input x. All codevectors are updated, there are not properly defined neighbours, but the temperature parameter λ decays exponentially according to the following expression: t
λ (t ) = λ 0 (λ n λ 0 ) n
(7)
Where λ 0 and λ n are its initial and final value. The expression ensures that the neighbouring function reduces to the simple competitive case (null neighbourhood) as happens with SOM. In the case of his online version, t would correspond to the input vector presentation number, and the temperature parameter value would be fixed for all the input samples during a complete presentation of the input data set. 2.2 Batch Version of Self-Organizing Map and Neural Gas Kononen’s Batch Map [14, 15] defined the batch version of the SOM algorithm. Among its advantages, there is no learning rate parameter and the computation is faster than the conventional online realization. This algorithm can be viewed as the LBG algorithm [16] plus a neighbouring function. When the input data set is completely presented, each input sample is classified in the Voronoi region defined by the winner codevector y w (t ) : ∀i ,1 ≤ i ≤ c , x (t ) − y w (t ) ≤ x (t ) − y i (t )
(8)
Where t corresponds to the iteration number over the sample and learning parameter values are fixed during each iteration. To recalculate the codebook, each codevector is computed as the centroid of its Voronoi region and the Voronoi regions of his neighbour codevectors as follows:
SOM and Neural Gas as Graduated Nonconvexity Algorithms
y i (t) = ∑
x(t)∈U i
1147
(9)
x (t ) n(U i )
Where Ui is the union of Voronoi regions corresponding to the codevectors that lie up to a certain radius h(t ) from codevector i, in the topology of the codevector indices. And n(Ui ) means the number of samples x(t) that belong to Ui. To determine the radius of the neighbourhood we applied the following function: t⎤ ⎡ h(t ) = ⎢ h0 (hn h0 )n ⎥ −1 ⎢ ⎥
(10)
The expression ensures that the neighbouring function reduces to hn after n iterations. In the experiments, it takes value 0.1, which implies that its operation is equivalent to LBG or the k-means: the codevector is the arithmetic mean of the input sample that lies in the unique Voronoi region associated with the codevector. In the Batch SOM, all neighbours have the same contribution to the centroid calculation, as in the SOM online realization. A definition of Batch NG arises from the idea of changing the contribution to the codebook in function of neighbour-region distances, imitating the online realization of NG. We produce this effect applying a weighting mean as follows: y i (t) = ∑
x(t)
x(t)w x(t)
∑ wx(t)
(11)
Where w x(t) is the weighting term for the input samples in the Voronoi region, defined by the codevector y i , given by the following expression:
(
)
w x( t ) = exp − ranking(i ,x(t ),Y (t )) λ (t )
(12)
The ranking function and temperature parameter λ are equal to those in the OnePass case. The neighbour-region contribution decays exponentially due to the evolution of λ in equation (6). As with the Batch SOM, the Batch NG converges to the LBG algorithm: only the region corresponding to the codebook contributes to its calculus.
3 Experimental Results The results presented here are continuation of the ones reported in [21], we have applied the algorithms to higher dimension data. The figure 1 shows two 3D data sets with some SOM VQ solution. The 2D versions have been used in [5,7] for evaluation of clustering and VQ algorithms. Figure 1(a) is three-level Cantor set distribution on 3D space. Figure 1(b) is a mixture of Gaussians in 3D space. The three-level Cantor set (2048 data points) is uniformly distributed on a fractal; it is constructed by starting with a unit interval, removing the middle third, and then recursively repeating the procedure on the two portions of the interval that are left. And the third data set is a collection of data points generated by a mixture of ten Gaussian distributions. The codebook initialization used in this paper and reflected in the results of figure 2 is a random selection of input sample data. The codebook size is set to c = 16
1148
A.I. González et al.
codevectors. The maximum number of sample presentations has been established at n = 50 for conventional online and batch realizations of the algorithms. Nevertheless, we introduce a stopping criterion on the relative decrement of the distortion; the process will stop if it is not greater than ξ = 0.001 .
(a)
(b)
Fig. 1. The 3D benchmark data sets (a) Cantor set and (b) mixture of Gaussian distributions, with sample VQ solutions obtained by the SOM
For SOM algorithms the neighbourhood initial and final parameter values have been set to: h0 = c / 2 + 1; hn = 0.1, and for NG algorithms they have been set to λ 0 = c / 2; λ n = 0.01. In both One-Pass version algorithms the learning rate values are α 0 = 0.5 and α 0 = 0.005. We have executed 100 times each algorithm. In figures 2a, 2b the y-axis corresponds to the mean and 0.99 confidence interval of the product of the final distortion and the computation time as measured by Matlab. The algorithms tested are: the online conventional realizations of SOM and NG, the batch versions (BSOM and BNG) and the online One-Pass realizations (SOMOP and NGOP). The inspection of the figure 2 reveals that the relative efficiencies of the algorithms measured by the distortion depend on the nature of the data. From our point of view, the most salient feature of the distortion plots is that the One-Pass realizations are competitive with the batch and online realizations. Although this is not the main concern of this paper, the distortion results show that the NG improves the SOM most of the times, confirming the results in the literature [17]. The batch realization sometimes improves the online realization (Cantor), sometimes not (Gaussian). When we take into account the computation time in the plots of figures 2a, 2b, the improvement of the One-Pass realization over the batch and online conventional realizations is spectacular. It can also be appreciated the improvement of the batch realization over the conventional online realization. We have used the product of time and distortion instead of the ratio distortion/time in order to maintain the qualitative interpretation of the plots: the smallest are the best. The figure 3 shows the evolution of the error for some instances of the algorihms. It can be appreciated that the OnePass realizations achieve really fast the error values equivalent to those of the conventional realizations.
SOM and Neural Gas as Graduated Nonconvexity Algorithms
(a)
1149
(b)
Fig. 2. Results on the three benchmark algorithms. The product of distortion and the computational time for (a) Gaussian data, (b) cantor data.
(a)
(b)
Fig. 3. Sample plots of the evolution of the quantization distortion along the training process over the (a) Cantor set, (b) mixture of Gaussian data.
(a)
(b)
Fig. 4. (a) the product of the distortion and the computation time for the Indian Pines image data, (b) samples of the error trajectories during training
1150
A.I. González et al.
The last experimental data is an AVIRIS hyperspectral image, the so called Indian Pines image, used and described, for instance in [23]. The image is a 145x145 pixel image, where each pixel has 220 bands. The figure 4 shows the average efficiency of the algorithms realizations and a sample trajectory of the error during the training process. Again, in this high dimensional data, the One-Pass realization is much more efficient and gives distortion results comparable with the Batch and conventional online realizations.
4 SOM and NG as GNC The previous results cannot be understood in the framework of the convergence of the stochastic gradient algorithms. The basic formulation of the GNC approach [19, 20] is that the function to be minimized is the MAP estimate of a sampled surface corrupted by additive noise M (x ) = D(x ) + N (x) . This MAP estimate p(R = D M )takes the form E[R] = − log p(M R = D)− log(D = R) = Ed [R] + Es [R] ,
(13)
where Ed [R] is the data term and Es [R] is the smoothness term. The data term is quadratic under the usual Gaussian noise assumption, and the smoothness term express any a priori information about the surface. In [20] the smoothness term is formulated over the surface gradient. The GNC function general formulation is: E[R] =
∑ (M (x) − R(x)) + Es [R] , 2
(14)
x
where the smoothness term depends on some parameter Es [R] = fσ (R). The key of GNC methods is that the function to be minimized E[R] is embedded in a oneparameter functional family Eσ [R] so that the initial functional Eσ 0 [R] is convex,
and the final functional is equivalent to the original function E0 [R] ≡ E[R] . The minimization is performed tracking the local minimum of Eσ [R] from the initial to the final functional. As the initial functional is convex, the algorithm becomes independent of the initial conditions. It must be noted that one of the properties that the SOM and NG show over the bare SCL algorithms is the robustness against bad initial conditions [3]. The NG was proposed [17] as the minimization of the following functional, Eng (w, λ ) =
1 N ∑ ∫ d D vP(v)hλ (ki (v, w))(v − w i )2 , 2C (λ ) i =1
(15)
that we discretize here, assuming sample data {v1 , v 2 ,..., v M } , Eng (w, λ ) =
2 ( ( ))(v j − wi ) ,
N M 1 ∑ ∑ hλ ki v j ,w 2C (λ ) i=1 j=1
(16)
SOM and Neural Gas as Graduated Nonconvexity Algorithms
(
1151
)
where ki v j ,w is the ranking function. Note that we can reorganize it as follows: N
N M
2 ( ( ))(v j − wi )
Eng (w, λ ) = ∑ hλ (ki (v i , w))(v i − w i ) + ∑ ∑ hλ k i v j ,w 2
i=1 j=1 j≠i
i=1
(17)
if hλ (k i (v i ,w)) is constant and equal to one, the first term in equation (17) is equivalent to the data term in equation (14). This may be so if there is a specific ordering of the weights so that they match the first N data points. If, additionally, the value of hλ (k i (v i ,w)) is one for the best matching unit, then the coincidence is exact. In the NG this function is an exponential function that performs a role like the focusing Gaussians in [20]. The second term in equation (17) corresponds to the smoothing term in equation (14). One key problem in GNC is to ensure that the initial functional is convex [20]. Other problem is to ensure that there are no bifurcations or other effects that may affect the continuation process. It seems that for NG and SOM it is very easy to ensure the convexity of the initial functional and that the continuation is also an easy process. In the case of the SOM, when the neighbourhood function is the one used in the experiments it is assumed that the functional to be minimized is the extended distortion: N M
(
)(
)
2
ESOM (Y, λ ) = ∑ ∑ H i x j ,Y x j − y i , i=1 j=1
⎧1 w - i ≤ λ H i (x, Y ) = ⎨ ; ⎩0 otherwise
{
2
}
(18)
w = argmin x − y k k = 1,..,c ;1 ≤ i ≤ c
Again it is easy to decompose the functional in a structure similar to that of equation (14). ESOM (Y , λ ) =
M
∑( j=1
x j − yw
N M
) ∑ ∑ H i (x j , Y )(x j − y i ) , 2
+
i=1 j=1 i≠w
2
(19)
Therefore, the SOM can be assimilated to a GNC algorithm. It is very easy to ensure the convexity of its functional for large neighbourhoods, and the continuation process seems to be very robust.
5 Discussion and Conclussions The paradoxical results found in [23] and extended here, showing that the One-Pass realization of the SOM and NG can give competitive performance in terms of distortion, and much better than the “conventional” batch and online realizations in terms of computational efficiency (time x distortion) leads us to the idea that these algorithm’s performance is more sensitive to the neighbourhood parameters than to the learning gain parameter. We think that the context of continuation methods [2] and GNC [19,20] is an adequate context to analyze the convergence properties of the
1152
A.I. González et al.
algorithms. We have shown that the SOM and NG minimized functional can be easily seen to be like the ones considered in GNC. We will devote our future efforts to deepen in the analysis of these algorithms from this point of view.
References [1] S.C. Ahalt, A.K. Krishnamurthy, P. Chen, D.E. Melton (1990), Competitive Learning Algorithms for Vector Quantization, Neural Networks, vol. 3, p.277-290. [2] E. L. Allgower and K. Georg, Numerical Continuation Methods. An Introduction, Vol. 13, Springer Series in Computational Mathematics. Berlin/Heidelberg, Germany: Springer-Verlag, 1990. [3] E. Bodt, M. Cottrell, P. Letremy, M. Verleysen (2004), On the use of self-organizing maps to accelerate vector quantization, Neurocomputing, vol 56, p. 187-203. [4] C. Chan, M. Vetterli (1995), Lossy Compression of Individual Signals Based on String Matching and One-Pass Codebook Design, ICASSP'95, Detroit, MI. [5] C. Chinrungrueng, C. Séquin (1995), Optimal Adaptive K-Means Algorithm with Dynamic Adjustment of Learning Rate, IEEE Trans. on Neural Networks, vol. 6(1), p.157-169. [6] Fort J.C., Letrémy P., Cottrell M. (2002), Advantages and Drawbacks of the Batch Kohonen Algorithm, , in M. Verleysen (ed), Proc. of ESANN'2002,Brugge, Editions D Facto, Bruxelles, p. 223-230. [7] B. Fritzke (1997), The LBG-U method for vector quantization - an improvement over LBG inspired from neural networks, Neural Processing Letters, vol. 5(1) p. 35-45. [8] K. Fukunaga (1990), Statistical Pattern Recognition, Academic Press. [9] A. Gersho (1982), On the structure of vector quantizers, IEEE Trans. Inf. Th., 28(2), p.157-166. [10] A. Gersho, R.M. Gray (1992), Vector Quantization and signal compression, Kluwer. [11] A. I. Gonzalez, M. Graña, A. d'Anjou, F.X. Albizuri (1997), A near real-time evolutive strategy for adaptive Color Quantization of image sequences, Joint Conference Information Sciences, vol. 1, p. 69-72. [12] R.M. Gray (1984), Vector Quantization, IEEE ASSP, vol. 1, p.4-29. [13] T. Hofmann and J. M. Buhmann (1998) Competitive Learning Algorithms for Robust Vector Quantization IEEE Trans. Signal Processing 46(6): 1665-1675 [14] T. Kohonen (1984) (1988 2nd ed.), Self-Organization and associative memory, Springer Verlag. [15] T. Kohonen (1998), The self-organising map, Neurocomputing, vol 21, p. 1-6. [16] Y. Linde, A. Buzo, R.M. Gray (1980), An algorithm for vector quantizer design, IEEE Trans. Comm., 28, p.84-95. [17] T. Martinetz, S. Berkovich, K. Schulten (1993), Neural-Gas network for vector quantization and his application to time series prediction, IEEE trans. Neural Networks, vol. 4(4), p.558-569. [18] S. Zhong, J. Ghosh (2003) A Unified Framework for Model-based Clustering Journal of Machine Learning Research 4:1001-1037 [19] A. Blake, and A. Zisserman, Visual Reconstruction. Cambridge, Mass.: MIT Press, 1987. [20] Nielsen, M. (1997) Graduated nonconvexity by functional focusing, IEEE Trans. Patt. Anal. Mach. Int. 19(5):521 – 52 [21] A.I. Gonzalez, M. Graña (2005) Controversial empirical results on batch versus one pass online algorithms, Proc. WSOM2005, Sept. Paris, Fr. pp.405-411 [22] J.C. Fort, G. Pagès (1996) About the Kohonen algorithm: strong or weak Selforganization? Neural Networks 9(5) pp.773-785 [23] Gualtieri, J.A.; Chettri, S. ‘Support vector machines for classification of hyperspectral data’, Proc. Geosci. Rem. Sens. Symp., 2000, IGARSS 2000. pp.:813 - 815 vol.2
Analysis of Multi-domain Complex Simulation Studies

James R. Gattiker, Earl Lawrence, and David Higdon

Los Alamos National Laboratory
{gatt, earl, dhigdon}@lanl.gov

Abstract. Complex simulations are increasingly important in systems analysis and design. In some cases simulations can be exhaustively validated against experiment and taken to be implicitly accurate. However, in domains where only limited validation of the simulations can be performed, implications of simulation studies have historically been qualitative. Validation is notably difficult in cases where experiments are expensive or otherwise prohibitive, where experimental effects are difficult to measure, and where models are thought to have unaccounted systematic error. This paper describes an approach to integrate simulation experiments with empirical data that has been applied successfully in a number of domains. This methodology generates coherent estimates of confidence in model predictions, model parameters, and estimates, i.e. calibrations, for unobserved variables. Extensions are described to integrate the results of separate experiments into a single estimate for simulation parameters, which demonstrates a new approach to model-based data fusion.
1 Introduction
Computational simulation applications are increasingly used to explore a number of domains, including: climate, ocean, and weather modeling; atomic scale physics modeling; aerodynamic modeling; and cosmology applications. A significant challenge for using simulation studies is the quantitative analysis of simulation results, and the comparison and integration of the simulations with experimental data. At Los Alamos National Laboratory, a challenge is to certify the safety and reliability of nuclear weapons where only indirect physical experiments can be performed [1]. Simulations model physical experimental results. Uncertainties arise from a variety of sources that include: uncertainty in the specification of initial conditions, uncertainty in the value of important physical constants (e.g., melting temperatures, equations of state, stress-strain relationships, shock propagation, and transient states), inadequate mathematical models, and numerical computation effects. Experimental observations constrain uncertainties within the simulator, and are used to validate simulation components and responses [10]. The methodology described here addresses three main goals in simulation analysis. First is the quantification of uncertainty in predictions. Most simulation systems lack the ability to directly assess the uncertainty in their results,
although it is clear from failure to match reality that both bias and uncertainty exist. The second goal is the calibration of unknown parameters. Simulations often have parameters that are either non-physical or are unmeasurable in experiments, and must be determined, or calibrated. The third goal addressed, discussed here for the first time, is the linking and joint calibration of variables common to separate experiments. Additional issues constrain approaches to this problem. Experimental data in typical application areas is generally difficult to collect because of expense, difficulty in making physical measurements, or external constraints. Simulation studies are often computationally expensive, having usually been developed at the limits of feasible computation, and so there is limited access to alternative simulation parameter settings. Some exploration of alternatives is therefore possible, but putting the simulator directly into an iterated method can be prohibitive. This paper describes a methodology that has been implemented and demonstrated to be effective in addressing these issues in real problems [2]. The approach is to model simulation response with an accurate emulated response. Parameters of this emulator as well as simulation parameters are simultaneously determined using a Bayesian parameter formulation and associated Markov chain Monte Carlo sampling. The emulator itself is a stochastic process model, modeling both simulation response and systematic bias.

1.1 The Model Evaluation Problem
This section follows an explanatory "toy" problem that captures many of the issues in analyzing (simulation) models in conjunction with experiments [3]. The task is to analyze models of gravitational attraction, through the combination of an analytical model and experiment. To study the nature of falling objects, a test object is dropped from various heights and its descent time is measured. In addition to characterizing the measurements of limited experiments, we wish to extrapolate the behavior. In practice, the drop time is not a simple effect, due to atmospheric effects. For explanatory purposes, experimental data is generated according to

\frac{d^2 z}{dt^2} = -1 - 0.3 \frac{dz}{dt} + \epsilon,

which includes the square law of gravitational attraction plus a term for linear effects. If the simulation models only the gravitational attraction, \frac{d^2 z}{dt^2} = -1, the results do not explain the data well, and extrapolate even more poorly. This model would give a fitted response as shown in Fig. 1a. To analyze the simulation results, we need a model of the data that explicitly allows for systematic error in the simulation system. In this approach, we use a model η that models the simulation responses, and an additional model δ of the discrepancy between the simulation and the data, so that the model is comprehensive, i.e., Y(z) = \eta(z) + \delta(z) + \epsilon. The data can now be modeled accurately as a systematic discrepancy from the simulation. Incorporating uncertainty into the model response, the results are shown in Fig. 1b. The model's η,
Fig. 1. Explanatory diagrams of experiment drop times. a) Results compared to an inadequate ideal model; b) model results and (bounded) discrepancy adjustment; c) revised model.
or simulation-based prediction, remains distinctly low, while the discrepancy adjustment, with uncertainty incorporated, follows the data closely and responds better in extrapolation. Continuing the analysis, the discrepancy term suggests the postulation of an improved model, for example:

\frac{d^2 z}{dt^2} = -1 - \theta \frac{dz}{dt} + \epsilon,

where θ is an unknown to be calibrated. Determining this model, including uncertainty both in the model and in the calibration of the parameter, the results are shown in Fig. 1c. In this case, our best simulation result with the calibrated parameter closely follows the data, and the complete model closely bounds the prediction. The enhanced model gives greater confidence in extrapolation as compared to the incorrect model relying on an estimated discrepancy to fit the data. To summarize, the problem starts with the modeling of simulation response, and the modeling of experiments as the simulation response plus some systematic bias. This discrepancy describes simulation model insufficiency (or other systematic bias). Parameters of these models are determined so that uncertainty
in predictions is implicitly given. Unknown parameters are calibrated to best reconcile the experimental data with the simulations, reporting on plausible values. Discrepancy estimates may be further used to examine problem foundations.
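To make the toy problem concrete, here is a minimal sketch (not from the paper; the integrator, step size, and noise level are illustrative assumptions) that generates synthetic drop times from the stated "true" process and from the inadequate ideal model, exposing the systematic discrepancy that the δ term must absorb.

```python
import numpy as np

def drop_time(z0, drag=0.3, noise_sd=0.0, dt=1e-3, rng=None):
    """Euler-integrate a fall from height z0 until z <= 0; return elapsed time."""
    rng = rng if rng is not None else np.random.default_rng(0)
    z, v, t = float(z0), 0.0, 0.0
    while z > 0.0:
        a = -1.0 - drag * v + noise_sd * rng.standard_normal()
        v += a * dt
        z += v * dt
        t += dt
    return t

heights = np.linspace(1.0, 10.0, 10)
observed = [drop_time(h, drag=0.3, noise_sd=0.05) for h in heights]  # "experiment"
ideal = [drop_time(h, drag=0.0) for h in heights]                    # ideal simulator eta
print(np.round(np.array(observed) - np.array(ideal), 3))             # what delta must model
```

The discrepancy grows with drop height, which is exactly the structured residual that the δ(z) model is designed to capture.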
2 Model Formulation
This modeling approach was originally developed by Kennedy and O'Hagan [6], and has been discussed in application in [8, 9]. For space limitations, the model cannot be completely described here, but a complete formulation of the single-experiment-configuration methodology is available [10, 11]. The core of the modeling effort is a Gaussian stochastic process model (GPM). These models offer an expressive method for modeling a response over a generic space [12]. The GPM relies on a specified covariance structure, called the covariogram, that describes the correlation between data points as it scales with distance. In this case, nearness is measured as a scaled Euclidean distance in the simulation parameter space. The covariogram is then:

C(x_1, x_2) = \frac{1}{\lambda_z} \exp(-d(x_1, x_2, \beta)) + \frac{1}{\lambda_s} I,

where d(x_1, x_2, \beta) = \sum_i \beta_i (x_{1i} - x_{2i})^2. The λ parameters are precisions (inverse variances), with λ_z corresponding to the variability of the data, and λ_s corresponding to the variability of the residual. This correlation assumption constrains the response surface to a family of functions. Predictions are made by computing a joint covariogram of the known and predicted data points, and producing the multivariate normal conditional distribution at the unknown locations. The predictions are then distributions rather than point estimates: X_p \sim N(\mu_p, \Sigma_p). This distribution can be used to produce realizations of possible values, and also allows the extraction of confidence bounds on the estimates. η models the simulations, but a two-part model is used to also explicitly model the discrepancy δ between the observed data, y_obs, and the simulation response, y_sim, such that:

y_{sim} = \eta(x, t) + \epsilon,
y_{obs} = \eta(x, \theta) + \delta(x) + \epsilon.

t are simulation parameters whose corresponding values are not known for the observed data. These unknown θ are determined (i.e., calibrated) in the modeling, along with the β and λ parameters for the η and δ models. The η and δ model parameters and θ values are generated with Markov chain Monte Carlo sampling of a Bayesian posterior. The Bayesian approach gives the posterior density for the η model as

\pi(\beta, \lambda \mid y_{sim}(x)) \propto L(y(x) \mid \eta(x, \beta, \lambda)) \times \pi(\beta) \times \pi(\lambda).
In words: the posterior distribution of the parameters given the simulation data is proportional to the likelihood of the data given the parameters, times the prior probability of the parameters. The likelihood corresponds to a least-squares measure of the model predictions compared to the given data, though this is not computed explicitly. This formulation has been extended to produce a single likelihood of the η + δ model fitting the observed and simulated data simultaneously [11]. The posterior of the parameters can be sampled from using a Metropolis-Hastings MCMC sampler, resulting in samples from the joint posterior density of all parameters. It is possible to optimize directly on the likelihood function or the posterior for a point solution, but the resulting "optimal" solution has no associated confidence on the parameters. Computing the likelihood requires inversion of the k × k covariance matrix, where k = n(p + q) + mq, n is the number of experimental data, m is the number of simulated data, p is the dimensionality of the simulation response, and q is the dimensionality of the discrepancy response. This quickly becomes intractable if p grows large, so an important enhancement to the model is the use of linear basis dimension reduction in the output space, the effects of which can be compensated for in the modeling approach. Good results have been obtained with principal components reducing the p dimension, and kernel regression constraining and reducing q. This makes even problems with large data sizes, for example time series or images, computationally tractable [10].
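As an illustration of the machinery just described, the sketch below (an assumption-laden illustration, not the authors' code) builds the stated covariogram and the multivariate-normal conditional prediction X_p ~ N(mu_p, Sigma_p); the MCMC over β and λ and the basis dimension reduction are omitted.

```python
import numpy as np

def covariogram(X1, X2, beta, lam_z, lam_s):
    """C(x1,x2) = (1/lam_z) exp(-sum_i beta_i (x1i-x2i)^2), plus a (1/lam_s) I
    residual-precision nugget when computing a set's covariance with itself."""
    d = ((X1[:, None, :] - X2[None, :, :]) ** 2 * beta).sum(-1)
    C = np.exp(-d) / lam_z
    if X1 is X2:
        C = C + np.eye(len(X1)) / lam_s
    return C

def gp_predict(Xobs, y, Xnew, beta, lam_z, lam_s):
    """Condition the joint covariogram: returns the predictive mean and covariance."""
    Koo = covariogram(Xobs, Xobs, beta, lam_z, lam_s)
    Kno = covariogram(Xnew, Xobs, beta, lam_z, lam_s)
    Knn = covariogram(Xnew, Xnew, beta, lam_z, lam_s)
    mu = Kno @ np.linalg.solve(Koo, y)
    Sigma = Knn - Kno @ np.linalg.solve(Koo, Kno.T)
    return mu, Sigma
```

Realizations for confidence bounds are then draws from N(mu, Sigma), e.g. with np.random.default_rng().multivariate_normal(mu, Sigma).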
3 Example Application: Flyerplate Experiments
In order to study the properties of materials under shock, plates of the material are subjected to a high-velocity impact. The velocity of the impacted plate is measured over time, revealing several material-property-driven regimes, as detailed in Fig. 2. Results of flyerplate experiments are shown in Fig. 3, which shows both the simulations from a 128-experiment design over 7 variables and a trace of measured data. The unknown θ parameter ranges have been defined by subject matter experts, and scaled to [0,1] for the purposes of the analysis. θ in this problem
Fig. 2. Theoretical regions of flyerplate velocity measurements
are parameters from the Preston-Tonks-Wallace stress-strain model [5]. The parameters include: θ_0, the initial strain-hardening rate; κ, a material constant in the thermal activation energy; −log(γ), a material constant in the thermal activation energy; y_0, the maximum yield stress; y_∞, the minimum yield stress; s_0, the maximum saturation stress; and s_∞, the minimum saturation stress. The simulation data is modeled with the first five principal components. The discrepancy model is a kernel basis constraining relatively smooth variation over the parameter space, modeling an arbitrary general effect in the absence of a more specific model. Figure 4 shows the results of the full joint parameter calibration as contours of the two-dimensional PDF projections, as sampled by the MCMC procedure. There are clear trends in some variables, which have been calibrated more tightly than their a priori ranges, whereas some do not have clear preferred values. An
Fig. 3. Observed and simulated tantalum flyerplate velocity results, native data scale and standardized to mean 0 variance 1
Fig. 4. Calibration results of unknown model parameters explored with simulations
additional variable importance measure comes from the sampled spatial scaling parameters β, where lower spatial correlation corresponds to more active variables. This experiment verified that β_2, β_3, and β_4 are important, consistent with the θ calibration. Proper sensitivity analysis can be performed by analyzing model predictions, which can be produced cheaply from the emulator model. Figure 5 shows predictive results of the model in the scaled space.
Fig. 5. Observed and calibrated predictions of the velocimetry results for tantalum flyerplate experiments (scaled). On the upper left is observed data in large dots, with the η model. Lower left shows the δ model contribution. The upper right is the observed data with the Y = η + δ result. The lower right plot repeats all results on one scale. Solid lines are mean response, and dotted lines show the associated 10%-90% confidence region of each quantity.
These predictions cover several model realizations (drawn from the normal model distribution), predicted over many MCMC-drawn parameter sets. The closest emulator simulation response is not able to capture the observed data, in particular at the edges, where the simulations showed an initial higher-than-expected measurement and a late-time bump that is not observed in the data. The discrepancy-adjusted response reduces this error. Of particular interest is that the simulator response alone not only failed to capture the observed data, but its confidence region is also not satisfactory. The Y = η + δ model's confidence regions are more appropriate.
4 Joint Calibration of Models
In complex applications, separate-effects tests are used to explore different aspects of a problem through surrogate experiments. Flyerplate results may be scientifically interesting in their own domain, but the experiments are also intended to collect data on effects that are part of larger physical simulations. Some of these experiments will inform on the same parameters, and it is desired to perform a calibration that uses all of these results simultaneously and consistently.
If two models have separate likelihoods L_1(θ_1, θ_2 | y) and L_2(θ_1, θ_3 | y), they can be considered a joint model as L_J = L_1 × L_2. Using MCMC, the draws can be simplified: L_J is computed for the draws related to θ_1, while for the parameters of the independent models the likelihoods L_1 and L_2 are computed independently for draws of θ_2 and θ_3, saving computation. This is a method to quantify the effects of variables in common between experiments. In the shock speed modeling problem, it is desired to model measured shock speed quantities of several materials. A single experiment in a single material consists of a shock particle velocity u_p measured in response to a shock of speed u_s. Several experiments characterize a material response curve, which is modeled by simulations, as shown in Fig. 6. In the full application, there are many materials and several parameters, some of which are shared between models. We will limit the discussion to the characterization of hydrogen (H) and deuterium (D), which use the same parameters, referred to here as θ_1-θ_3. Figure 7 shows the calibration of parameters for each single model, as well as the joint model. The joint model shows a more compact and stable calibration, in a region that is expected by domain experts.
Fig. 6. Shock speed simulation models and measured data for hydrogen (left) and deuterium
Fig. 7. Joint calibration of model parameters. The left plot shows the calibrated theta vectors for a hydrogen model alone, the middle plot shows the same parameters in a deuterium model, and the right plot shows the joint calibration of the parameters given both datasets.
Analysis of Multi-domain Complex Simulation Studies
1161
These results show that this methodology can successfully combine data from different experiments, and even different simulation systems, to calibrate underlying parameters with greater fidelity than the single-model calibrations can achieve.
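A hedged sketch of the joint-calibration idea follows: a single Metropolis-Hastings chain over (θ_1, θ_2, θ_3) whose target is the product likelihood L_J = L_1 × L_2. The two log-likelihood functions are placeholders standing in for the GP-emulator likelihoods described in Sect. 2; a production sampler would exploit the saving noted above by recomputing only L_1 for θ_2-only moves and only L_2 for θ_3-only moves.

```python
import numpy as np

def mh_joint(log_L1, log_L2, theta0, steps=5000, prop_sd=0.05, rng=None):
    """theta = [theta1, theta2, theta3]; log posterior = log_L1(t1,t2) + log_L2(t1,t3)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    theta = np.asarray(theta0, float)
    logp = log_L1(theta[0], theta[1]) + log_L2(theta[0], theta[2])
    samples = []
    for _ in range(steps):
        cand = theta + prop_sd * rng.standard_normal(3)     # random-walk proposal
        logp_c = log_L1(cand[0], cand[1]) + log_L2(cand[0], cand[2])
        if np.log(rng.uniform()) < logp_c - logp:           # Metropolis accept/reject
            theta, logp = cand, logp_c
        samples.append(theta.copy())
    return np.array(samples)                                # joint posterior draws
```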
5 Discussion
The approach described provides a method to quantify the implications of experimental data combined with simulation experiments. It is a tool to be used with domain experts from both the modeling and simulation domain and the experimental data domain. Expert judgement is required in the generation of appropriate simulation studies, the construction of plausible discrepancy models, and the assessment of results. If domain knowledge is available to describe strong parameter priors, including θ parameter bounds and relationships, these may be incorporated, though by default weak priors that do not inappropriately constrain modeling and calibration can be used. As is usual in complex models, attention to diagnostics is important. Because this modeling approach incorporates a number of trade-offs in the modeling space, it is possible for the model to enter an inappropriate domain, where it fits the data well but the parameters are not plausible (e.g., counter to physics). Also, some expertise in MCMC is needed to identify the initial transient and to determine appropriate step sizes over the parameter ranges to ensure adequate mixing of the chain. Without response smoothness, it is difficult to envision how to model and calibrate in this (or any) framework. Thus, a key issue in successful application is ensuring simulation response smoothness through the parameter space under analysis. If this is not an inherent quality of the data, variable transformations and feature construction studies are necessary. In summary, this modeling approach:
– provides a method for integrating simulation experiments with empirical data, modeling systematic error in the simulation;
– calibrates unknown simulation parameters;
– provides well-founded uncertainty estimates in parameters and predictions; and
– allows separate experiment results to be fused into one parameter calibration.
When given two distinct datasets with a relationship in their underlying variables, it is generally not clear how to fuse the information in the datasets into a single answer. Information must be of the same type before it can be quantitatively reconciled, and this is usually solved by transforming the data directly into the same domain. The application methodology described here shows how different datasets may be linked by a generating model, in this case a simulation that can produce results in the various model domains. Through this approach, inverse problems from two distinct experimental domains can be combined, and a composite model realized.
References
1. Los Alamos Science special issue on Science-Based Prediction for Complex Systems, no. 29, 2005.
2. J. Gattiker, "Using the Gaussian Process Model for Simulation Analysis Code", Los Alamos Technical Report LA-UR-05-5215, 2005.
3. Christie, Glimm, Grove, Higdon, Sharp, Schultz, "Error Analysis in Simulation of Complex Phenomena", Los Alamos Science special issue on Science-Based Prediction for Complex Systems, no. 29, 2005, pp. 6-25.
4. S. Chib, E. Greenberg, "Understanding the Metropolis-Hastings Algorithm", The American Statistician, 49(4), p. 327, Nov. 1995.
5. M. Fugate, B. Williams, D. Higdon, K. Hanson, J. Gattiker, S. Chen, C. Unal, "Hierarchical Bayesian Analysis and the Preston-Tonks-Wallace Model", Los Alamos Technical Report LA-UR-05-3935, 2005.
6. M. Kennedy, A. O'Hagan, "Bayesian Calibration of Computer Models (with discussion)", Journal of the Royal Statistical Society B, 68:426-464.
7. D. Jones, M. Schonlau, W. Welch, "Efficient Global Optimization of Expensive Black-Box Functions", Journal of Global Optimization, 13, pp. 455-492, 1998.
8. D. Higdon, M. Kennedy, J. Cavendish, J. Cafeo, R. Ryne, "Combining Field Observations and Simulations for Calibration and Prediction", SIAM Journal of Scientific Computing, 26:448-466.
9. C. Nakhleh, D. Higdon, C. Allen, V. Kumar, "Bayesian Reconstruction of Particle Beam Phase Space from Low Dimensional Data", Los Alamos Technical Report LA-UR-05-5897.
10. D. Higdon, J. Gattiker, B. Williams, M. Rightley, "Computer Model Calibration using High Dimensional Output", Los Alamos Technical Report LA-UR-05-6410, submitted to the Journal of the American Statistical Association.
11. B. Williams, D. Higdon, J. Gattiker, "Uncertainty Quantification for Combining Experimental Data and Computer Simulations", Los Alamos Technical Report LA-UR-05-7812.
12. D.R. Jones, M. Schonlau, W. Welch, "Efficient Global Optimization of Expensive Black-Box Functions", Journal of Global Optimization, 13, pp. 455-492, 1998.
A Fast Method for Detecting Moving Vehicles Using Plane Constraint of Geometric Invariance*

Dong-Joong Kang1, Jong-Eun Ha2, and Tae-Jung Lho1

1 Dept. of Mechatronics Eng., Tongmyong University, 535, Yongdang-dong, Nam-gu, Busan 608-711, Korea
{djkang, tjlho}@tit.ac.kr
2 Dept. of Automotive Eng., Seoul National University of Technology, 138, Gongrung-gil, Nowon-gu, Seoul 139-743, Korea
[email protected]
Abstract. This paper presents a new method of detecting on-road highway vehicles for an active safety vehicle system. We combine a projective invariant technique with motion information to detect overtaking road vehicles. The vehicles are modeled as a set of planes, and the invariant technique extracts each plane using the fact that the geometric invariant value defined by five points on a plane is preserved under a projective transform. Harris corners are used as salient image points to provide motion information via normalized cross-correlation centered at these points. A probabilistic criterion requiring no heuristic factor is defined to test the similarity of invariant values between sequential frames. Because the method is very fast, real-time processing is possible for vehicle detection. Experimental results using images of real road scenes are presented. Keywords: Geometric invariant, plane constraint, motion, vehicle detection, active safety vehicle.
* This work was supported by Tongmyong Univ. of Information Tech. Research Fund of 2005.

1 Introduction

There are growing social and technical interests in developing vision-based intelligent vehicle systems for improving traffic safety and efficiency. Intelligent on-road vehicles, guided by computer vision systems, are a main issue in the development of experimental and commercial vehicles in numerous places in the world [1-7]. Reliable vehicle detection in images acquired by a moving vehicle is an important problem for applications such as active safety vehicles (ASV) equipped with driver assistance systems to avoid collisions and dangerous accidents. Several factors, including changing environmental conditions, affect on-road vehicle detection, and the appearance changes of foregoing vehicles with scale, location, orientation, and pose transition make the problem very challenging. Foregoing vehicles come into view at different speeds and may vary in shape, size, and color. Vehicle appearance depends on the relative pose
between observer and foregoing vehicles, and occlusion by nearby objects affects detection performance. For real implementations of intelligent road vehicles, real-time processing is another important issue. We consider the problem of rear-view detection of foregoing and overtaking vehicles from gray-scale images. Several previous studies assume two main steps to detect road vehicles [1]. The first step of any vehicle detection system is hypothesizing the locations in images where vehicles are present. Then, verification is applied to test the hypotheses. Both steps are equally important and challenging. Well-known approaches to generate the locations of vehicles in images include using motion information, symmetry, shadows, and vertical/horizontal edges [4-7]. The purpose of this paper is to provide a method for generating hypothetical candidates for on-road vehicle detection. Once the hypothetical regions including vehicles are extracted, several methods can be applied to verify the initial detection. Detecting moving objects from images acquired by a static camera can usually be performed by simple image-difference-based methods. However, when the camera undergoes an arbitrary motion through a scene, the task is much more difficult, since the scene is no longer static in the image. Simple image differencing techniques no longer apply. The road vehicle detection problem belongs to the second category because the observer camera is mounted on a moving vehicle. For general segmentation, optical flow from all image points or correspondence information from prominent image features can be used. In this paper, we present a method based on the projective invariant and motion information. Based on the sparsely obtained motion field or correspondence data, the method selects initial segmentation clusters by using a projective invariant method that can be described by a plane constraint. Moving vehicle segmentation is based on the fact that the geometric invariant value of a point set defined on a plane of the vehicle is preserved after motion of the plane [8-10]. The Harris corner [11] is a good image feature to provide motion information with normalized correlation. A probabilistic criterion to test the similarity of invariant values between frames is introduced, without the need for magic threshold factors. The proposed method is more exact in initial segmentation than simple methods clustering similar motion vectors, because a side or rear part of a vehicle can be separately extracted under the strong plane constraint. The method is very fast, and the processing time of each module is evaluated. Among the vehicles surrounding the host vehicle, close-by front and rear vehicles and overtaking side vehicles are the most dangerous for collision and threaten the driver [1]. Methods detecting vehicles in these regions do better to employ motion information, because there are large intensity changes and detailed image features, such as edges and corners, in the close view. Vehicles in the far-distance region are relatively less dangerous, and their appearance is more stable since the full view of a vehicle is available.
2 Segmentation of Planes on a Moving Vehicle

2.1 Projective Invariants

Projective invariants are quantities which do not change under projective transformations. A detailed treatment of the uses of invariants is given in Mundy and
Zisserman [8]. There are two convenient invariants that can be defined for groups of five points. Four points (no three collinear) form a projective basis for the plane, and the invariants correspond to the two degrees of freedom of the projective position of the fifth point with respect to the first four: there exist positions for which the invariants do not change their values in some directions. Fig. 1 shows the five-point sets on a plane under an arbitrary projective transformation. The invariant value defined by five points on the plane is not changed under a projective transformation.
Fig. 1. Five point invariance on a plane
The two invariants may conveniently be written as ratios of determinants of matrices of the form M_{ijk}, which denotes the area of the triangle formed by three image points. The two invariants are then given by [8-9]:

I_1 = \frac{M_{124} M_{135}}{M_{134} M_{125}}    (1)

I_2 = \frac{M_{241} M_{235}}{M_{234} M_{215}}    (2)
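Equations (1)-(2) translate directly into code. The sketch below is an illustration (not the authors' implementation): M_{ijk} is evaluated as the determinant of the three points written as homogeneous columns, i.e. twice the signed triangle area, and the common factor of two cancels in the ratios.

```python
import numpy as np

def M(p, i, j, k):
    """Determinant |x_i x_j x_k| with points as homogeneous columns (x, y, 1)."""
    return np.linalg.det(np.array([[p[i][0], p[j][0], p[k][0]],
                                   [p[i][1], p[j][1], p[k][1]],
                                   [1.0, 1.0, 1.0]]))

def invariants(points):
    """points: five (x, y) pairs; returns (I1, I2) of eqs. (1)-(2)."""
    q = {n + 1: pt for n, pt in enumerate(points)}   # 1-based indices as in the text
    I1 = (M(q, 1, 2, 4) * M(q, 1, 3, 5)) / (M(q, 1, 3, 4) * M(q, 1, 2, 5))
    I2 = (M(q, 2, 4, 1) * M(q, 2, 3, 5)) / (M(q, 2, 3, 4) * M(q, 2, 1, 5))
    return I1, I2
```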
where M_{ijk} = |x_i, x_j, x_k| and x_i is the position (x_i, y_i) of an image point. These two quantities may be seen to be preserved under a projective transformation: if x' is substituted for x,

\frac{M'_{124} M'_{135}}{M'_{134} M'_{125}} = \frac{|\lambda_1 P x_1, \lambda_2 P x_2, \lambda_4 P x_4| \, |\lambda_1 P x_1, \lambda_3 P x_3, \lambda_5 P x_5|}{|\lambda_1 P x_1, \lambda_3 P x_3, \lambda_4 P x_4| \, |\lambda_1 P x_1, \lambda_2 P x_2, \lambda_5 P x_5|}    (3)

which gives

\frac{M'_{124} M'_{135}}{M'_{134} M'_{125}} = \frac{\lambda_1^2 \lambda_2 \lambda_3 \lambda_4 \lambda_5 |P|^2 \, M_{124} M_{135}}{\lambda_1^2 \lambda_2 \lambda_3 \lambda_4 \lambda_5 |P|^2 \, M_{134} M_{125}}    (4)

where P is the projectivity matrix and \lambda_i is a scaling factor.
2.2 Extract Point Groups from Motion Data

Point features corresponding to high-curvature points are extracted from the image before motion [11-13]. A salient image feature should be consistently extracted across different views of an object, and there should be enough information in the neighborhood of the feature points so that corresponding points can be automatically matched. A popular method for corner detection is the Harris detector [11], which uses convolution operations to obtain the high-curvature matrices built from the partial derivatives of the image data. In the notation of the Harris corner detector, g in eq. (6) denotes the gray-scale image, w is the Gaussian smoothing operator, and k is an experimental parameter. g_x and g_y indicate the x- and y-directional derivatives of the gray image, respectively. Corners are defined as local maxima of the corner response function R:
R(x, y) = \det[C] - k \cdot \mathrm{trace}^2[C]    (5)

where C is

C = w \cdot \begin{bmatrix} g_x^2 & g_x g_y \\ g_x g_y & g_y^2 \end{bmatrix}.    (6)
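For illustration only, the response of eqs. (5)-(6) can be computed as below; the smoothing scale and k = 0.04 are common choices in the literature, not values taken from this paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(img, sigma=1.5, k=0.04):
    gy, gx = np.gradient(img.astype(float))        # y- and x-directional derivatives
    # C = w * [[gx^2, gx*gy], [gx*gy, gy^2]], with w the Gaussian smoothing operator
    a = gaussian_filter(gx * gx, sigma)
    b = gaussian_filter(gx * gy, sigma)
    c = gaussian_filter(gy * gy, sigma)
    return (a * c - b * b) - k * (a + c) ** 2      # R = det[C] - k * trace^2[C]
```

Corners are then the local maxima of this response map above a detection threshold.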
Given the high-curvature points, we can use a correlation window of small size (2r + 1) × (2c + 1) centered at these points. A rectangular search area of size (2d_x + 1) × (2d_y + 1) is defined around the points in the next image, and the intensity-normalized cross-correlation (NCC) [14] on a given window is computed between a point in the first image and pixels within the search area in the second image. This operation provides the matched motion vectors.
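A sketch of this NCC matching step is given below. The window half-sizes r = c = 3 and search radius d = 5 match the experimental settings reported in Sect. 3; the 0.8 acceptance score and the border handling are assumptions, and corners are assumed to lie at least r + d pixels from the image border.

```python
import numpy as np

def ncc(w1, w2):
    w1, w2 = w1 - w1.mean(), w2 - w2.mean()
    denom = np.sqrt((w1 * w1).sum() * (w2 * w2).sum())
    return (w1 * w2).sum() / denom if denom > 0 else -1.0

def match_corner(img1, img2, y, x, r=3, d=5, min_score=0.8):
    """Return the best-matching displacement (dy, dx) of corner (y, x), or None."""
    win = img1[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    best, best_off = min_score, None
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            cand = img2[y + dy - r:y + dy + r + 1,
                        x + dx - r:x + dx + r + 1].astype(float)
            score = ncc(win, cand)
            if score > best:
                best, best_off = score, (dy, dx)
    return best_off            # the motion vector at this corner, if any
```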
Fig. 2. Selection of four points defining invariant values around a center point
Selection of five points from n motion data is needed to define invariant values. There are N = {}_{n}C_5 independent ways of choosing five points from n. Therefore, it is impractical to test all possible combinations of five points in an image.
Instead, groups of five points are selected as the four nearest neighbors lying outside a small circular neighborhood of the fifth point and inside a larger circle, as shown in Fig. 2. This yields only small groups of points to be tested, so a reduction of processing time is possible. In the circular band region, four points are selected from random combinations of the points in the band. Among the combinations, candidates that contain three collinear points, or points too close to each other, are rejected (a sketch of this selection is given below).
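This band selection might be coded as follows (illustrative only; r1 = 3 and r2 = 15 are the values used in the experiments of Sect. 3, and the collinearity tolerance is an assumption).

```python
import numpy as np

def band_neighbors(points, center, r1=3.0, r2=15.0):
    """Corners whose distance from the center point lies inside the circular band."""
    pts = np.asarray(points, float)
    dist = np.linalg.norm(pts - np.asarray(center, float), axis=1)
    return pts[(dist > r1) & (dist < r2)]

def collinear(p, q, r, eps=1.0):
    """Near-zero triangle area means the three points are (nearly) collinear."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) < eps

def random_group(points, center, rng, tries=20):
    """Draw a four-point combination from the band, rejecting collinear triples."""
    band = band_neighbors(points, center)
    if len(band) < 4:
        return None
    for _ in range(tries):
        quad = band[rng.choice(len(band), 4, replace=False)]
        trios = [(quad[a], quad[b], quad[c])
                 for a, b, c in [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]]
        if not any(collinear(*t) for t in trios):
            return [tuple(center)] + [tuple(p) for p in quad]
    return None
```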
2.3 A Probabilistic Criterion to Decide Point Groups on a Plane

Five-point sets are randomly selected in the circular band to define projective invariant values. A set before motion defines a model invariant value, and the corresponding set after motion defines the corresponding invariant value. If this pair lies on the same plane, or on an object moving under weak perspective projection (assuming a far distance from the observer), the difference between the two invariant values will be small. This constraint makes plane segmentation possible. Pairs of point sets giving similar invariant values before and after motion are considered as side planes of independently moving objects. We introduce a threshold to test the similarity of two invariant values:

|I'_i - I_i| < \mathrm{Thres}_i    (7)
For selecting thresholds, a probabilistic criterion is introduced via the measurement uncertainty associated with the estimated position of corner features [9-10]. The invariant is a function of five points:

I = I(x_1, x_2, x_3, x_4, x_5).    (8)

Let x_i be the true position and \tilde{x}_i be the noisy observation of (x_i, y_i); then we have

\tilde{x}_i = x_i + \xi_i    (9a)
\tilde{y}_i = y_i + \eta_i    (9b)

where the noise terms \xi_i and \eta_i are independently distributed with mean 0 and variance \sigma_i^2. From these noisy measurements, we define the noisy invariant

\tilde{I} = I(\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \tilde{x}_4, \tilde{x}_5).    (10)

To determine the expected value and variance of \tilde{I}, we expand \tilde{I} as a Taylor series at (x_1, x_2, x_3, x_4, x_5):

\tilde{I} \approx I + \sum_{i=1}^{5} \left[ (\tilde{x}_i - x_i) \frac{\partial \tilde{I}}{\partial \tilde{x}_i} + (\tilde{y}_i - y_i) \frac{\partial \tilde{I}}{\partial \tilde{y}_i} \right] = I + \sum_{i=1}^{5} \left[ \xi_i \frac{\partial \tilde{I}}{\partial \tilde{x}_i} + \eta_i \frac{\partial \tilde{I}}{\partial \tilde{y}_i} \right]    (11)
Then, the variance becomes

E[(\tilde{I} - I)^2] = \sigma_0^2 \sum_{i=1}^{5} \left[ \left( \frac{\partial \tilde{I}}{\partial \tilde{x}_i} \right)^2 + \left( \frac{\partial \tilde{I}}{\partial \tilde{y}_i} \right)^2 \right].    (12)

Hence, for a given invariant I, we can determine a threshold:

\Delta I = 3 \cdot \sqrt{E[(\tilde{I} - I)^2]}.    (13)

The partial derivative \partial I_1 / \partial x_1, for example, is given by

\frac{\partial I_1}{\partial x_1} = I_1 \cdot \left[ \frac{y_2 - y_4}{M_{124}} + \frac{y_3 - y_5}{M_{135}} - \frac{y_3 - y_4}{M_{134}} - \frac{y_2 - y_5}{M_{125}} \right].    (14)
Because there are two different invariant values, we have to define two threshold values, \Delta I_1 and \Delta I_2. If a point group's invariant differences in the circular band region are smaller than both threshold values, the local region defined by the point set is selected as a plane candidate on a moving vehicle.
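The resulting test can be sketched as follows. Here the partial derivatives in eq. (12) are estimated by central differences rather than the closed form of eq. (14), eq. (13) is read as three standard deviations, and sigma0 = 0.2 follows the experimental setting; the inv argument can be either invariant, e.g. a wrapper around the invariants sketch given in Sect. 2.1.

```python
import numpy as np

def invariant_threshold(inv, pts, sigma0=0.2, h=1e-4):
    """Eqs. (12)-(13): inv maps a (5, 2) point array to one invariant value."""
    pts = np.asarray(pts, float)
    var = 0.0
    for i in range(5):
        for j in range(2):                       # x and y coordinates of point i
            dp, dm = pts.copy(), pts.copy()
            dp[i, j] += h
            dm[i, j] -= h
            var += ((inv(dp) - inv(dm)) / (2 * h)) ** 2
    return 3.0 * np.sqrt(sigma0 ** 2 * var)      # Delta I of eq. (13)

def same_plane(inv, pts_before, pts_after):
    """The similarity test of eq. (7) with the automatic threshold."""
    return abs(inv(pts_after) - inv(pts_before)) < invariant_threshold(inv, pts_before)
```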
3 Experiments

The experiments show that overtaking close-by vehicles are well detected. Fig. 3 presents a detection example for a real road scene. Corners on the rear and side parts of a vehicle are extracted by the Harris corner algorithm, as shown in Fig. 3(a). We set r_1 = 3 and r_2 = 15 pixels to define the circular band of Fig. 2 for calculation of the geometric invariant values. The image size and processing region are 256×256 and 232×150 pixels, respectively. Fig. 3(b) shows the motion vectors from the normalized cross-correlation applied to the Harris corners of Fig. 3(a) between two sequential frames. The motion vectors in Fig. 3(b) are magnified 3× for better visualization. The correlation is performed with a small window of 7×7 pixels (r = c = 3) over search regions of 11×11 pixels (d_x = d_y = 5) in the next frame. Points giving a small difference of invariant value between the two frames are shown as small white boxes in Fig. 3(c), and Fig. 3(d) shows each MBR from the five-point sets. Nine regions on side planes are found after zero-motion MBRs, if any, are rejected. The position variance \sigma_0 in eq. (12) is set to 0.2. As the center points of the detected five-point sets appear on the side of the vehicle, the side of a vehicle is recognized as an approximately planar object whose invariant values are preserved during the vehicle's motion. The range of the automatic threshold value \Delta I_1 obtained during processing is about 0.06-2.0 in the case of Fig. 4(d). Fig. 4 shows that the rear and side parts of a vehicle are separated as two different planes for sequential image frames. The two planes are consistently detected during 8 frames
Fig. 3. Extraction of side planes on an overtaking vehicle. (a) Harris corners; (b) Motion vectors by NCC; (c) Center corners included in invariant planes; (d) Detection of vehicle planes.
of sequential images. The small white rectangles in Fig. 4 present the detection of rear and side planes of a moving vehicle. Regions whose five points have an average motion smaller than 3 pixels are rejected, to prevent noisy extraction of far-distance vehicles and background regions. Different Harris detector parameters, i.e., changes of the threshold value, give different levels of corner detection. Table 1 shows the resulting computing time for two different corner detection images. The processing image region is 232×150 (= w×h) pixels, and we used a Pentium-IV 2.0 GHz processor under Windows XP. We did not use any optimization procedures such as Intel SIMD or MMX technologies. The total elapsed time for the two cases is 39.1 and 40.9 msec, respectively.

Table 1. Computing time for each module of the proposed algorithm
Module                     Case A (corner #: 202)   Case B (corner #: 255)
SUSAN detector             31 msec                  31 msec
Motion calculation         5.1 msec                 6.3 msec
Invariant plane detection  3 msec                   3.6 msec
Total elapsed time         39.1 msec                40.9 msec
Fig. 4. Side plane detection of close-by overtaking vehicle on highway for 8 sequential frames during about 0.2 sec
4 Conclusions

We presented a geometric-invariant- and motion-based method for vision-based vehicle detection. The method uses the projective invariant value computed from the motion information
obtained at corner points of sequential frames. The proposed method models the 3-D moving vehicle as a set of planar surfaces that come from the rear and side surface parts of the vehicle. The method is more exact in plane segmentation than clustering methods using similarity of motion vectors, because side or rear parts of a vehicle can be independently extracted under the strong plane constraint of projective invariance. The five-point invariant values can confirm the detection of real planes on a targeted vehicle. For consecutive images, a corner extraction algorithm detects prominent corner features in the first image. Correspondence information for the detected corner points is then found by the normalized correlation method in the next sequential image. Based on the sparsely obtained motion data, the segmentation algorithm selects points on a plane that maintain consistent projective invariants between frames. These point sets can form initial clusters to segment the plane, and other post-procedures may be applied to merge neighboring points having similar motion. Each processing module is very fast and adequate for real-time processing of fast-moving highway vehicles, even though many processing units, including the corner detector, NCC matching, the random-sampling-based five-point selector, and the probabilistic invariant plane extraction, are involved. The experiments on real road scenes show that a plane-by-plane segmentation of fast-moving and overtaking vehicles is possible with the proposed method.
References
1. Sun, Z., Bebis, G., Miller, R.: On-road vehicle detection using optical sensors: A review, IEEE Intelligent Transportation Systems Conference, Washington D.C., (2004) 585-590
2. Polk, A., Jain, R., Masaki, I.: Vision-based Vehicle Guidance, Springer-Verlag, (1992)
3. Kang, D.J., Jeong, M.H.: Road lane segmentation using dynamic programming for active safety vehicles, Pattern Recognition Letters, Vol. 24 (2003) 3177-3185
4. Smith, S.M.: ASSET-2: Real-Time Motion Segmentation and Object Tracking, Int. Conf. on Computer Vision, (1995) 237-244
5. Kuehnle, A.: Symmetry-based recognition for vehicle rears, Pattern Recognition Letters, Vol. 12 (1991) 249-258
6. Handmann, U., Kalinke, T., Tzomakas, C., Werner, M., Seelen, W.: An image processing system for driver assistance, Image and Vision Computing, Vol. 18 (2000) 367-376
7. Betke, M., Haritaglu, E., Davis, L.: Multiple vehicle detection and tracking in hard real time, IEEE Intelligent Vehicles Symposium, (1996) 351-356
8. Mundy, J.L., Zisserman, A.: Geometric Invariance in Computer Vision, MIT Press, (1992)
9. Roh, K.S., Kweon, I.S.: 2-D Object Recognition Using Invariant Contour Descriptor and Projective Refinement, Vol. 31, (1988) 441-455
10. Sinclair, D., Blake, A.: Quantitative Planar Region Detection, Int. Journal of Computer Vision, Vol. 18, (1996) 77-91
11. Harris, C., Stephens, M.: A combined corner and edge detector, The Fourth Alvey Vision Conference, (1988) 147-151
12. Press, W., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes in C, Cambridge University Press, (1992)
13. Smith, S.M., Brady, J.M.: SUSAN - a new approach to low level image processing, Int. Journal of Computer Vision, Vol. 23(1) (1997) 45-78
14. Xu, G., Zhang, Z.: Epipolar Geometry in Stereo, Motion and Object Recognition, Kluwer Academic Publishers, (1996) 223-237
Robust Fault Matched Optical Flow Detection Using 2D Histogram

Jaechoon Chon1 and Hyongsuk Kim2

1 Center for Spatial Information Science at the University of Tokyo, Cw-503 4-6-1 Komaba, Meguro-ku, Tokyo
[email protected]
2 Chonbuk National University, Korea
[email protected]
Abstract. This paper proposes an algorithm for robust outlier detection without fitting camera models. The algorithm is applicable even when the outlier rate is over 85%. If the outlier rate of optical flows is over 45%, discarding outliers with conventional algorithms in real-time applications is very difficult. The proposed algorithm overcomes these difficulties with three steps: 1) construct a two-dimensional histogram whose two axes are the lengths and directions of the optical flows; 2) sort the number of optical flows in each bin of the two-dimensional histogram in descending order, and remove bins having fewer optical flows than a given threshold; 3) increase the resolution of the two-dimensional histogram if the number of optical flows grouped in a specific bin is over 20%, and decrease the resolution if the number of optical flows is less than 10%. This process is repeated until the number of optical flows falls into the range of 10%-20%. The proposed algorithm works well on different kinds of images having many outliers. Experimental results are reported.
1 Introduction

Using an image sequence to generate graphical information, such as image mosaics, stereoscopic imagery, 3D data, and moving object tracking, requires the use of optical flows. Current studies on optical flow detection algorithms can be categorized into three approaches: gradient-based [1], frequency-based [2], and feature-based [3]. In the gradient-based approach, the optical flow of each pixel is computed using spatial and temporal gradients under the assumption of constant intensity in local areas [4]. In the frequency-based approach, the velocity of each pixel is computed by detecting the trajectory slope with velocity-tuned band-pass filters, such as Gabor filters [2]. The optical flows obtained from plain (textureless) areas of an image are often incorrect and are not suitable for practical or real-time applications. Fortunately, to produce image mosaicking, epipolar geometry, and camera motion estimations, determining the optical flow at every point is unnecessary; in fact, ascertaining a small number of correct optical flows is more useful, and the feature-based approach is effective for making such determinations. Generally, feature points are extracted by algorithms that detect
corner points. Current corner-point detection algorithms include Plessey [5], Kitchen/Rosenfeld [6], Curvature Scale Space [7], and Smallest Univalue Segment Assimilating Nucleus (SUSAN) [8]. The SUSAN algorithm used in this paper is an efficient feature-point detection algorithm, utilizing a strong Gaussian operation on the intensity difference in the spatial domain. The feature-based approach requires feature point matching. In the common approach to matching feature points as a pair, the first step is to take a small region of pixels surrounding the feature point to be matched between two frames. Then, the feature point is compared with a similar window around each of the potential matching feature points in the other image. Each comparison yields a score related to the measure of similarity between the feature points, with the highest matching score representing the best match. The measure of similarity can be evaluated in several ways, such as Normalized Cross-Correlation (NCC), Sum of Squared Differences (SSD), and Sum of Absolute Differences (SAD). The NCC used in this paper is the standard statistical method to determine similarity. After feature point matching or optical flow detection, robust regression algorithms are needed to discard incorrect optical flows; such incorrect optical flows result from many sources, such as image blur and changes caused by camera vibration, stereo motion, or zoom motions. Let correct and incorrect optical flows be called inliers and outliers, respectively. Representative algorithms for robustly discarding outliers are E-estimators [9], Least Median of Squares (LMedS) [9],[10], Random Sample Consensus (RANSAC) [11], and Tensor Voting [12]. The RANSAC algorithm is generally used to discard outliers and to select the eight best-detected optical flows for robustly fitting epipolar geometry [13]. In contrast to RANSAC, Tensor Voting can discard outliers without fitting camera models. The input optical flows are first transformed into a sparse 8D point set. Dense 8D tensor kernels are then used to vote for the most salient hyperplane that captures all inliers inherent in the input. With these filtered optical flows, the normalized eight-point algorithm can be used to estimate the fundamental matrix accurately. However, while the probability of choosing inliers that fit camera-motion models well is much higher with Tensor Voting and RANSAC than with the E-estimator and LMedS, Tensor Voting and RANSAC are far from real-time application. In contrast, the E-estimator and LMedS can be applied in real-time computation. If the outlier rate among the detected optical flows is over 45%, discarding the outliers with LMedS is very difficult. For the E-estimator, it is impossible to handle optical flows that include over 35% outliers. To overcome these limitations, we propose a robust outlier detection algorithm that does not fit camera models, works in real-time applications, and handles cases in which the outlier rate is over 85%. When detecting optical flows between two input images, inliers in local areas have similarities, such as similar directions and lengths of optical flows. In contrast, outliers have no such similarities. Even when outliers do share such similarities, the number of outliers sharing them is very small.
Using this difference as the basic concept, we propose an algorithm that groups similar optical flows in a two-dimensional histogram whose two axes are the lengths and directions of the optical flows, and removes the bins with low numbers of optical flows. The proposed algorithm for detecting inliers in the two-dimensional histogram comprises the following three steps: 1) construct a two-dimensional histogram having two axes for the lengths and directions of optical flows; 2) sort the number of optical flows in each bin
of the two-dimensional histogram in descending order, and remove the bins with fewer optical flows than the threshold; 3) increase the resolution of the two-dimensional histogram if the number of optical flows grouped in a specific bin is over 20%, and decrease the resolution if the number of optical flows is less than 10%. This process is repeated until the number of optical flows falls into the range of 10%-20%. After completing the proposed three steps, we can directly apply the chosen inliers to the computation of epipolar geometry by RANSAC (saving computation costs), to image mosaicking, to the tracking of moving objects under a moving camera, and so on.
2 Outlier Detection Using Two-Dimensional Histogram

Optical flows, such as the vectors shown in Fig. 1(a), may be detected via feature point extraction and matching, without any supporting image motion information. Each detected optical flow is described by a direction, θ, and a length, l, as shown in Fig. 1(b). Sorting the optical flows into a two-dimensional histogram with two axes for their directions and lengths gives the result depicted in Fig. 1(c).
Fig. 1. Optical flows and their two-dimensional histogram. (a) optical flows, (b) the magnitude and directional elements of an optical flow, (c) two-dimensional histogram
The proposed algorithm for detecting inliers in the created two-dimensional histogram is composed of the three steps shown in the flow chart in Fig. 2. Step 1 is to construct a two-dimensional histogram with two axes for the lengths and directions of the optical flows. Step 2 is to sort the number of optical flows grouped in each bin of the two-dimensional histogram in descending order and to remove the bins having fewer optical flows than the threshold. Step 3 is to increase the resolution of the two-dimensional histogram if the number of optical flows in a specific bin is over 20%, and to decrease the resolution if the number of optical flows is less than 10%. This process is repeated until the number of optical flows falls into the range of 10%-20%. When the percentage of optical flows that are inliers is over 40%, we can obtain good results even if the upper limit of the range is set to a higher value. If the percentage is lower than 30%, then good results are not guaranteed. Therefore, to successfully apply the proposed algorithm to any kind of input images, the upper limit is
set as 20%. If the lower limit of the range is set to too low a value and optical flows are detected on images taken from a camera with optical-axis motion, then most optical flows will be discarded as outliers, because the number of locally grouped optical flows is too low. Consequently, to detect a stable number of optical flows from such images, the lower limit is set as 10%. If the total number of optical flows grouped in all bins is over 20% of all initial optical flows and, simultaneously, the number grouped in the specific bin with the highest count is over 35% of the optical flows grouped in all bins, then the input images were taken by a camera with purely translational motion (excluding zoom motions). Of course, this phenomenon also occurs when the distance between the objects and a panning and tilting camera is large. In these cases, only the optical flows grouped in the specific bin are chosen as inliers.
Fig. 2. Flow chart of the proposed algorithm
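A compact sketch of the three steps follows. For simplicity it keeps only the densest bin (step 2 more generally keeps every bin above a threshold), and the initial bin count, iteration limit, and normalization are assumptions.

```python
import numpy as np

def histogram_inliers(flows, bins=8, lo=0.10, hi=0.20, max_iter=8):
    """flows: (N, 2) array of (dx, dy) optical flow vectors; returns an inlier mask."""
    theta = np.arctan2(flows[:, 1], flows[:, 0])          # direction axis
    length = np.hypot(flows[:, 0], flows[:, 1])           # length axis
    n = len(flows)
    for _ in range(max_iter):
        ti = np.minimum((theta + np.pi) / (2 * np.pi) * bins, bins - 1).astype(int)
        li = np.minimum(length / (length.max() + 1e-9) * bins, bins - 1).astype(int)
        bin_id = ti * bins + li
        counts = np.bincount(bin_id, minlength=bins * bins)
        top = counts.argmax()                             # densest bin
        frac = counts[top] / n
        if frac > hi:
            bins *= 2                                     # over 20%: refine resolution
        elif frac < lo:
            bins = max(bins // 2, 2)                      # under 10%: coarsen resolution
        else:
            break
    return bin_id == top                                  # flows grouped as inliers
```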
3 Experimental Results

LMedS is a well-known algorithm for detecting outliers by fitting given models in real-time computation. However, for cases in which the outlier rate is over 45%, discarding outliers with LMedS is very difficult. For images taken with a camera having
Fig. 10. Optical flow detection on the images taken while the camera rotates and zooms in; (a) reference image frame, (b) subsequent image frame, (c) optical flows obtained with correlation-based matching, (d) selected optical flows using the proposed method, (e) optical flows detected by affine transformation with the locally grouped optical flows.
zoom motion and optical-axis motion, a camera with large translational motion, images capturing many similar textures or moving objects, and so on, the outlier rate can increase to over 60%. To demonstrate that the proposed method can robustly detect outliers, this paper performed experiments using a moving camera, as
shown in Fig. 10. Using a SUSAN operator, we extracted over one hundred feature points from each input image. The correlation mask size for matching the extracted feature points was 11 pixels. When the correlation score between feature points extracted from the two images is over 0.8, the pair is set as an initial optical flow. The two images shown in Figs. 10(a) and (b) were taken with a camera undergoing zoom-in motion and optical-axis motion. As can be seen, the detected optical flows are very complex and irregular. The results of applying the proposed method to the detected optical flows of Fig. 10(c) are shown in Fig. 10(d). The figure clearly shows that locally similar optical flows are grouped. The optical flows shown in Fig. 10(e) were detected by affine transformation with the locally grouped optical flows.

Table 1. Robustness comparison between the proposed and the LMedS algorithms
Outlier/inlier  Method    Inlier1  Outlier1  Inlier1/(Outlier1+Inlier1)  Estimation
0.6             LMedS     31       2         0.92                        Good
0.6             Proposed  52       2         0.96                        Good
0.8             LMedS     18       3         0.82                        Good
0.8             Proposed  51       1         0.97                        Good
1.0             LMedS     14       8         0.64                        Middle
1.0             Proposed  55       2         0.96                        Good
1.2             LMedS     13       10        0.56                        Bad
1.2             Proposed  51       1         0.971                       Good
1.4             LMedS     ×        ×         ×                           ×
1.4             Proposed  53       3         0.93                        Good
2.0             LMedS     ×        ×         ×                           ×
2.0             Proposed  54       5         0.90                        Good
2.8             LMedS     ×        ×         ×                           ×
2.8             Proposed  50       20        0.714                       Good
3.6             LMedS     ×        ×         ×                           ×
3.6             Proposed  45       29        0.606                       Middle
4.0             LMedS     ×        ×         ×                           ×
4.0             Proposed  38       41        0.48                        Bad
Table 1 shows a comparison between the proposed method and the LMedS estimator applied to mixtures of the initially detected optical flows, as shown in Fig. 10(c), and randomly generated optical flows. Even when generating the same number of random optical flows, the results change slightly according to the positions of the generated flows. To compare the proposed method and the LMedS estimator while avoiding this influence, we used the average result of one hundred experiments. Table 1 shows that the proposed method can robustly detect inliers even when there are four times as many outliers as inliers. In contrast, the LMedS estimator could not detect inliers reliably even when the outlier-to-inlier ratio was as low as about 1.0.
5 Conclusions

We have proposed an algorithm that can robustly discard outliers using a two-dimensional histogram with variable resolution, even when the outlier rate in image matching is over 85%. The two-dimensional histogram is used to group similar optical flows. By changing the resolution of the two-dimensional histogram, the proposed method limits the number of grouped optical flows, as a percentage of all optical flows, to within the range of 10%-20%. Our experiments demonstrated that the proposed method can robustly detect inliers among optical flows, even with a high number of outliers, and that it can be applied to moving objects tracked by a moving camera. We expect that the computation cost of RANSAC for epipolar geometry will be noticeably lower when using inliers detected by the proposed method. In addition, the proposed method can be applied to a moving robot equipped with a CCD camera for real-time detection of moving objects.
Acknowledgement

This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment) (IITA-2005-C1090-0502-0023).
References
1. J.K. Kearney, W.B. Thompson, D.L. Boley, "Optical flow estimation: An error analysis of gradient-based methods with local optimization", IEEE Tr. on PAMI, Vol. 9, No. 2, pp. 229-244, 1987.
2. E.H. Adelson, J.R. Bergen, "Spatiotemporal energy models for the perception of motion", Journal of the Optical Society of America, Vol. 2, No. 2, pp. 284-299, 1985.
3. S.M. Smith, J.M. Brady, "Real-Time Motion Segmentation and Shape Tracking", IEEE Tr. on PAMI, Vol. 17, No. 8, 1995.
4. B.K.P. Horn, B.G. Schunck, "Determining Optical Flow", Artificial Intelligence, 1981, pp. 185-203.
5. G. Harris, "Determination of Ego-Motion From Matched Points", Proc. Alvey Vision Conf., pp. 189-192, Cambridge, UK, 1987.
6. L. Kitchen, A. Rosenfeld, "Gray Level Corner Detection", Pattern Recognition Letters, pp. 95-102, 1982.
7. F. Mokhatarian, R. Suomela, "Robust Image Corner Detection Through Curvature Scale Space", IEEE Tr. on PAMI, Vol. 12, 1998.
8. S.M. Smith, J.M. Brady, "SUSAN - a new approach to low level image processing", IJCV, Vol. 23, No. 1, pp. 45-78, 1997.
9. R.M. Haralick, H. Joo, C. Lee, X. Zhuang, V. Vaidya, M. Kim, "Pose estimation from corresponding point data", IEEE Tr. on SMC, Vol. 19, No. 6, pp. 1426-1446, Nov. 1989.
10. P.J. Rousseeuw, "Least median of squares regression", Journal of the American Statistical Association, Vol. 79, pp. 871-880, 1984.
11. M.A. Fischler and R.C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography", Comm. of the ACM, Vol. 24, pp. 381-395, 1981.
12. C.K. Tang, G. Medioni, and M.S. Lee, "N-Dimensional Tensor Voting and Application to Epipolar Geometry Estimation", IEEE Tr. on PAMI, Vol. 23, No. 8, 2001.
13. R. Hartley, "In defense of the eight-point algorithm", IEEE Tr. on PAMI, Vol. 19, No. 6, pp. 580-593, 1997.
Iris Recognition: Localization, Segmentation and Feature Extraction Based on Gabor Transform
Mohammadreza Noruzi1, Mansour Vafadoost1, and M. Shahram Moin2
1 Biomedical Engineering Faculty, Amirkabir University of Technology, Tehran, Iran. {noruzi, vmansur}@cic.aut.ac.ir
2 Multimedia Dept., IT Research Faculty, Iran Telecom. Research Center, Tehran, Iran. [email protected]
Abstract. Iris recognition is one of the best methods in the biometric field. It includes two main processes: "iris localization and segmentation" and "feature extraction and coding". We have introduced a new method based on the Gabor transform for localization and segmentation of the iris in an eye image, and have used it to implement an iris recognition system. By applying the Gabor transform to an eye image, some constant templates are extracted related to the borders of the pupil and iris. These features are robust and fairly easy to use. There is no restriction and no tuning parameter in the algorithm, and the algorithm is extremely robust to eyelid and eyelash occlusions. To evaluate the segmentation method, we have also developed a gradient based method. The results of the experiments show that our proposed algorithm works better than the gradient based algorithm. The results of our recognition system are also noticeable: the low FRR and FAR values justify the results of the segmentation method. We have also applied different Gabor Wavelet filters for feature extraction. The observations show that the threshold used to discriminate feature vectors is highly dependent on the orientation, scale, and parameters of the corresponding Gabor Wavelet Transform.
1 Introduction
Biometric features are unique and dependent on personal genetics. The iris is the most attractive feature in the human authentication field and is also the most accurate after DNA. Authentication by iris can even distinguish between twins, since the texture of the iris is affected by the environment in which the person has grown. The error rate of authentication by iris is about one per million. This low error rate makes it very suitable for highly secure access control applications [1, 2]. Iris recognition includes two steps: in the first step the iris area is extracted, and then the feature vector is constructed from the texture of the segmented iris. The judgment is done by comparing the feature vectors and extracting the similarity between them. The main work in this area was done by Daugman, who developed a system and introduced a method for each of the above mentioned steps. He used the gradient of the image for iris segmentation and Gabor Wavelets for feature extraction, and introduced a constant threshold for discriminating the feature vectors [2].
In this paper we introduce a new method for iris segmentation with two characteristics: first, it can find the position of the iris automatically, and second, it uses the Gabor transform as a more robust feature for finding the borders of the iris. There is no restriction on the position of the eye and no tuning parameter in the algorithm. An iris recognition system has also been implemented. Gabor wavelet filters have been used for feature extraction. The similarities between feature vectors have been analyzed, and the results show that by changing the parameters of the Gabor filter, the discriminating threshold must be changed and tuned. In other words, it is possible to obtain similar results using different filters with different thresholds. The remainder of this paper is organized as follows. In section 2 we present a brief introduction to Gabor transforms. The proposed algorithm is explained in section 3. It is divided into segmentation and feature extraction processes. The segmentation algorithm has 3 steps: finding the position of the pupil, extraction of the interior border of the iris, and extraction of the exterior border of the iris. There is a novel idea in each step. Section 4 is dedicated to feature extraction and comparison. To evaluate the proposed method, we have developed a gradient based segmentation method. The results of the experiments are presented in section 5.
2 Gabor Transform
The Gabor transform is a time-frequency transform. In two dimensions it is defined as follows [2]:

\[ G(x, y) = e^{-\pi \left[ (x - x_0)^2/\alpha^2 + (y - y_0)^2/\beta^2 \right]} \cdot e^{-2\pi i \left[ u_0 (x - x_0) + v_0 (y - y_0) \right]} \tag{1} \]

where G(x, y) depicts a filter with center (x_0, y_0), i = √−1 is the imaginary constant, ω_0 = √(u_0² + v_0²) is the modulation frequency, θ_0 = arctan(v_0/u_0) is the angle of modulation, and α, β are the effective lengths of the filter along the x and y axes, respectively.
It can be shown that the Gabor transform is a Short Time Fourier Transform with a Gaussian window and a modulation in the frequency domain, so some of its characteristics are similar to those of its parent. By applying the Gabor transform to an image, the frequency contents of the windowed image in direction θ_0 are extracted. There is an inverse relation between time resolution and frequency resolution: wider windows in the time domain extract more accurate frequency contents, but the positions of these contents become more ambiguous [3]. The suitable values of α, β depend on the resolution of the input image and its texture. For a given filter, if the input image has higher resolution, then thinner borders have greater effects on the filter output. This transform can not only extract the borders but also indicate the expansion angles of these borders. This interesting property has made the Gabor filter a powerful tool for feature extraction [2, 3]. This filter has real and imaginary parts with even and odd symmetry, respectively. The even part has a nonzero mean value in general. We have found that removing this mean value improves the results. This process changes the real part of the Gabor filter to
a bandpass filter. As a result, the background illumination of the image has a negligible effect on the filter output. This filter is different from the Stretched-Gabor filter presented by Heitger et al. [4]. By applying a Gabor filter in different orientations and resolutions, the Gabor Wavelet transform is formed. This wavelet is generated from a fixed Gabor elementary function by dilation, translation and rotation [5].
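To make Eq. (1) and the zero-mean adjustment concrete, here is a minimal Python sketch of such a filter; the window size, α, β, and frequency values, as well as the function name, are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def gabor_kernel(size, alpha, beta, u0, v0):
    """Sketch of the 2D Gabor filter of Eq. (1), centered on the window,
    with the mean removed from the real (even) part so that it acts as
    a bandpass filter, as the text suggests."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    envelope = np.exp(-np.pi * (x**2 / alpha**2 + y**2 / beta**2))
    carrier = np.exp(-2j * np.pi * (u0 * x + v0 * y))
    g = envelope * carrier
    g = g.real - g.real.mean() + 1j * g.imag  # zero-mean even part
    return g

# Example: a horizontally modulated filter of the kind used later for
# pupil detection (parameter values are hypothetical).
kernel = gabor_kernel(size=31, alpha=8.0, beta=8.0, u0=0.05, v0=0.0)
```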
3 Iris Localization and Segmentation
In this section we propose a new method for localization and segmentation of the iris. The proposed method resembles the gradient method to some extent, but it uses Gabor coefficients instead of the gradient vector. Therefore the well-known gradient method is explained first, followed by the proposed method. We have also implemented a gradient based method and compared its results with those of the proposed method. The segmentation of the iris is based on two assumptions: the iris border is circular, and the iris and pupil are concentric. The methods developed by other researchers mostly make a pre-assumption about the position of the pupil in the input image: they mostly assume that the position of the center of the pupil is approximately known. For example, they suppose that the eye is located in the center of the image frame, so that the pupil center is close to the input image center. If this assumption is not satisfied, an extra pre-processing step must be applied to the input image to estimate the center of the pupil. After approximating the center of the pupil, some points around this center are chosen as candidates for the exact pupil center location, and a feature is computed for every candidate point. The winner is the candidate with the highest feature value. This feature is computed as follows: the border between the pupil and iris is circular, and the highest contrast between the two areas occurs at their border. To find this border, circles with different radii are drawn around each candidate point, and the values of the image gradient on each circle's perimeter are summed. The circle that fits exactly over the pupil border reaches the maximum value, as sketched in the code below. The same method is repeated to find the exterior border of the iris. This technique has been used in many studies [6]-[10]. Although the basics of our method are similar to those of the gradient method, it does not use the gradient or any derivative operator; it is completely based on the Gabor transform. Our approach has 3 steps: estimation of the pupil center, extraction of the interior border of the iris, and extraction of the exterior border of the iris. As mentioned before, we have implemented another method based on the gradient to evaluate the results of the Gabor based method. Our gradient based method is slightly different from Daugman's method. In Daugman's method the radial gradient is used, whereas our gradient based method is based on the horizontal gradient, i.e., it searches for vertical edges. Our gradient based algorithm is much faster than Daugman's method, because it calculates the gradient of the image once, in contrast to Daugman's method, which calculates the radial gradient for each circle separately.
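The circle-fitting search just described can be written compactly; in the following Python sketch the candidate grid, the angular sampling step, and the function names are illustrative assumptions, and the per-pixel score image would be, e.g., the absolute horizontal gradient.

```python
import numpy as np

def circle_score(score_img, cx, cy, r, n_samples=180):
    """Sum a per-pixel score (e.g., |horizontal gradient|) along the
    perimeter of a circle of radius r centered at (cx, cy)."""
    t = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    xs = np.clip((cx + r * np.cos(t)).astype(int), 0, score_img.shape[1] - 1)
    ys = np.clip((cy + r * np.sin(t)).astype(int), 0, score_img.shape[0] - 1)
    return score_img[ys, xs].sum()

def best_circle(score_img, candidates, radii):
    """Among candidate centers and radii, pick the circle whose
    perimeter accumulates the highest score (the fitted border)."""
    best = max((circle_score(score_img, cx, cy, r), cx, cy, r)
               for (cx, cy) in candidates for r in radii)
    return best[1], best[2], best[3]
```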
3.1 Estimation of the Pupil Center
The position of the pupil is estimated with good approximation by using the phase of the Gabor coefficients of the filtered image. There is no essential restriction on the input image and no need for pre-filtering. By applying a suitable Gabor filter to an image in the horizontal direction, the pupil produces a constant template consisting of 2 half circles (Fig. 1). The results have shown that the effective window length must be greater than half the size of the pupil and shorter than the size of the pupil.
Fig. 1. The image of eye and the phase of its Gabor transform
A simple constant template has been used in our method to estimate the position of the pupil. This template is derived directly from one of the images in the database, and its size is equal to the size of the iris in the source file. This template is shown in Fig. 2.
Fig. 2. Template used for estimating the pupil position
The estimation is done by applying the Normalized Cross Correlation [11] between the template and the phase of the Gabor coefficients of the transformed image. It is noticeable that the algorithm has low sensitivity to the size of the template; the size of the template need only be near the size of the pupil. The practical results have shown that a 50 percent change in the size of the template does not affect the results. In other words, the algorithm has low sensitivity to changes in the size of the pupil, so it is possible to find the position of pupils by using a constant template and a constant Gabor filter. We have used a constant Gabor filter and a constant template for all of the images in the database. To evaluate the proposed method for estimation of the pupil position, we have also implemented an alternative method. The pupil appears as a dark circle in the image. Therefore we define a new template like a dark circle and compute the cross correlation between the new template and the raw image. This method can also estimate the position of the
pupil, but the experimental results have shown that this new method is not as good as the Gabor based method: the mean error of pupil center estimation in the gradient based method is twice the mean error of the Gabor based method. After the estimation of the pupil center, the interior and exterior borders of the iris must be extracted. The next two sections are dedicated to this problem.

3.2 Extraction of Interior Border of Iris and Its Center
Suppose that the coordinates of the center of the pupil are estimated. Obviously, the exact location of the center of the circle fitted on the interior border of the iris is close to the estimated center. Thus, some points around the estimated center are chosen as candidates for the real location of the center. Then, for each candidate point, a series of circles with different radii are drawn in the transformed image. In the next step, the phase values of the filtered image are computed on the circumference of each circle and are summed over 360 degrees. As shown in Fig. 1, the left and right half circles of the interior border of the iris have opposite phase values; therefore the sign of these values must be considered in the summation. Finally, the accumulated value is divided by the radius of the related circle. This normalized feature is the criterion for selecting the best fitted circle: the circle with the highest radius and highest feature value is chosen (see the sketch below).

3.3 Extraction of Exterior Border of Iris and Its Center
The exterior boundary of the iris is determined in a way similar to that used for the interior boundary. The upper and lower areas of the iris are sometimes covered by eyelids and eyelashes. To solve this problem, the accumulated phase value feature is not computed over 360 degrees. As shown in Fig. 3, the eye image is divided into four 90-degree areas and the features are computed on the left and right quarters. As the interior and exterior borders of the iris are not concentric [2], some points around the exact center of the interior border are chosen as candidates for the exact center of the exterior border. Then, as in the previous section, circles with variable radii are drawn in the transformed image and the feature values are computed for every circle. Since the radius of the exterior border is greater than the radius of the interior border, and to prevent possible mistakes, the radius of each circle must be greater than the radius of the previously computed interior circle. The rest of the algorithm is like the one used for interior border extraction.
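A minimal sketch of this signed-phase criterion, under the same sampling assumptions as the circle-score sketch above; negating the left half so the two opposite-phase halves add constructively is one reading of how the sign is "considered in the summation".

```python
import numpy as np

def phase_circle_feature(phase_img, cx, cy, r, n_samples=360):
    """Accumulate signed Gabor-phase values around a circle and divide
    by the radius: the normalized feature used to select the best
    fitted border circle."""
    t = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    xs = np.clip((cx + r * np.cos(t)).astype(int), 0, phase_img.shape[1] - 1)
    ys = np.clip((cy + r * np.sin(t)).astype(int), 0, phase_img.shape[0] - 1)
    half_sign = np.where(np.cos(t) >= 0, 1.0, -1.0)  # right half +, left half -
    return abs((half_sign * phase_img[ys, xs]).sum()) / r
```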
Fig. 3. Because of eyelids and eyelashes, the exterior border computations are done in left and right quarters
4 Feature Vector Extraction and Comparison
As mentioned before, the upper and lower parts of the iris are usually occluded by eyelashes and eyelids. Therefore, the feature vector is extracted from the lateral part of the iris image [2]. We divide the iris image into 4 equal quarters as shown in Fig. 3. The left and right quarters are then mapped to a rectangular, so-called dimensionless, image (Fig. 4). The size of the mapped images is 256x64 and is identical for all of the images in the database. The details of the mapping method can be found in [2].
Fig. 4. An example of mapping the left and right parts of an iris image to a rectangular image
The reason for this mapping is that most of the fast, conventional techniques for feature extraction are implemented in a Cartesian coordinate system. The feature extraction is done by applying the Gabor transform in different resolutions and orientations. It has been shown that most of the information is in the phase of the Gabor transform [5]. This characteristic has been used in iris recognition [2]. After applying the Gabor filters to the rectangular mapped image, the amplitude of the output is put aside and just the signs of the real part and imaginary part of the output are used as discriminating features. The final feature vector is a series of 1 and 0 bits. This reduces the complexity of the feature vector and accelerates the comparison procedure. Although some information is eliminated, the remaining information is enough for constructing a reliable system. The simplest way to compare the feature vectors is the Hamming distance, which is computed by applying the Exclusive-OR operator on each pair of feature vectors [2]. If the distance between two feature vectors is lower than a threshold value, then they belong to one person.
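A minimal sketch of this binary coding and XOR comparison follows; the bit layout and the default threshold are illustrative assumptions (threshold values around 0.32-0.36 are reported in Section 5).

```python
import numpy as np

def iris_code(gabor_responses):
    """Quantize complex Gabor responses to the signs of their real and
    imaginary parts, discarding the amplitude, to get a bit vector."""
    return np.concatenate([gabor_responses.real.ravel() > 0,
                           gabor_responses.imag.ravel() > 0])

def hamming_distance(code_a, code_b):
    """Fraction of disagreeing bits (XOR) between two feature vectors."""
    return np.count_nonzero(code_a ^ code_b) / code_a.size

def same_person(code_a, code_b, threshold=0.32):
    return hamming_distance(code_a, code_b) < threshold
```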
5 Experimental Results
We have tested the system using different Gabor Wavelet filters and obtained the following results: (1) Ma [12] reported that most of the information in an iris image is in the rotational direction and implemented a system based on this characteristic. Our results showed that this characteristic is very powerful, but one direction is not enough for a system based on the Gabor transform. (2) The threshold value used for discriminating the feature vectors is dependent on the orientation, scale, and parameters of the Gabor transform. It was observed that it is possible to implement different systems with different thresholds but with the same identification rate.
(3) The parts of the iris that are near the interior border have much more information than the other parts. It was observed that using 75 percent of the iris image reduces the error rate by 50 percent and improves the results. This characteristic is also used in the system developed by Ma [12]. The final image is shown in Fig. 5.
Fig. 5. An example of truncated mapped image of iris
We implemented the system in identification mode. In this case the feature vector of each image is compared with the other feature vectors. The CASIA (Chinese Academy of Sciences Institute of Automation) Iris Database has been used in our experiments. It contains 756 eye images from 108 persons. The resolution of the images is 320x280. This database has also been used by other researchers, and it has been reported explicitly that some authentication errors result from mis-localization of irises [13, 14] (the results of those articles are qualitative and do not report percentages to compare with the results of the proposed algorithm). The database includes 108 persons and 7 pictures for each person. These 7 pictures were taken in two sessions. We therefore suppose that a person has enrolled in one session and is identified in another session; in other words, each image in one session of a person is compared with all of the images in the other sessions. A MATLAB program running on a 2400 MHz PC has been written to simulate our proposed method. The total run time for each segmentation process is 7 seconds. The segmentation method has been applied to each image in the database. Some of the results are shown in Fig. 6. The presence of eyelids and eyelashes in an eye image is the most important problem for iris localization and segmentation; the images in Fig. 6 are samples that clearly contain this difficulty. For comparison, a gradient based method has also been implemented. The cross correlation between a dark circle and the iris image is used to estimate the pupil position in the gradient based method. The elapsed time for the gradient based method has been 5 seconds per image in MATLAB. It is noticeable that the performance of our implemented gradient based algorithm seems to be better than Daugman's method on the CASIA database [14].
Fig. 6. Extracted borders for some samples in the database
The results have been analyzed by experts. The acceptance and rejection criterion is the accuracy of fit between the borders extracted by the program and by the experts. Averaging the results shows that the experts have judged using the following rule: if the difference between the positions of the two borders is below 7 percent of the corresponding circle, then the segmentation is assumed successful. The results of the segmentations are shown in Table 1.

Table 1. Results of the two implemented segmentation methods as judged by experts
Method                  Elapsed Time for Each Image (s)   Interior Border Extraction Error (%)   Exterior Border Extraction Error (%)
Gradient Based Method   5                                 3                                       3.5
Gabor Based Method      7                                 1.5                                     2.5
The results of the identification system justify the results of the proposed segmentation method. Many different Gabor filters have been used in the identification process. For example, in an implementation with the threshold value equal to 0.32, the FAR (False Acceptance Rate) and FRR (False Rejection Rate) were both zero, and in another implementation using a different Gabor Wavelet filter with threshold value equal to 0.36, the same results were obtained again. In other words, it was observed that the shape of the statistical distribution of Hamming distances is almost constant, but it is shifted slightly as a result of changing the parameters of the Gabor Wavelet transform. The results show that the desirable length for a feature vector in a practical system varies between 400 and 800 bytes.
6 Conclusions
A novel method based on the Gabor transform was introduced for iris segmentation and localization. Two important characteristics of the method are: (1) automatic localization and segmentation of the iris with good accuracy and (2) noticeable processing speed. A gradient based method has also been implemented to evaluate the performance of the proposed method. The implemented gradient based method is slightly different from Daugman's method due to the use of the horizontal gradient instead of the radial gradient. Since the horizontal gradient is computed once, our gradient based method is faster than Daugman's method. It is noticeable that the performance of the gradient based algorithm is better than Daugman's method on the CASIA database.
An important question remains concerning the method for pupil center estimation: although the implemented algorithm works correctly for the pictures in our database, how can we be assured that it works well for a new sample? In other words, how well do a constant template and a constant filter work for every eye image? Of course, we cannot guarantee that. The solution for this problem is multi-resolution pattern matching: expanding the current method by using multiple filter and template resolutions, for example 3 values for each of them. As shown in Fig. 1, it is obvious that the unique pattern of the pupil will be revealed in at least one of the transformed images. In the case of using patterns and filters of different resolutions, the final result can be the pair with the maximum correlation. Except for speed, the performance of the Gabor based segmentation algorithm is better than the gradient based method. Although the Gabor based algorithm is 40 percent slower than the gradient based one, it can still be considered fast enough for practical applications. The proposed segmentation algorithm has no tuning parameter and no threshold. The algorithm is not sensitive to eyelashes. The Gabor transform used is bandpass, so the background illumination has the lowest effect on the results. These remarkable features make the algorithm reliable and practical for iris recognition applications.
Acknowledgment
The authors would like to thank the Chinese Academy of Sciences Institute of Automation for preparing the CASIA database.
References
1. Wildes, R.P.: Iris recognition: An emerging biometric technology. Proceedings of the IEEE, Vol. 85, No. 9. (1997) 1348-1363.
2. Daugman, J.G.: High confidence visual recognition of persons by a test of statistical independence. IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 15, No. 11. (1993) 1148-1161.
3. Chen, D.: Joint Time-Frequency Analysis. Prentice Hall (1996).
4. Heitger, F., Rosenthaler, L., von der Heydt, R., Peterhans, E., Kubler, O.: Simulation of neural contour mechanisms: from simple to end-stopped cells. Vision Research, Vol. 32. (1992) 963-981.
5. MacLennan, B.: Gabor representation of spatiotemporal visual images. Tech. Report CS-91-144, Comp. Science Dept., Univ. Tennessee (1994).
6. Lee, T.: Image representation using 2D Gabor wavelets. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 18. (1996) 959-971.
7. Ma, L., Wang, Y., Tan, T.: Iris Recognition Based on Multichannel Gabor Filtering. Proceedings of ACCV, Vol. I. Australia (2002) 279-283.
8. Ma, L., Wang, Y., Tan, T.: Iris Recognition Using Circular Symmetric Filters. IEEE International Conference on Pattern Recognition, Vol. II. Canada (2002) 414-417.
9. Tisse, C., Martin, L.: Person Identification Technique Using Human Iris Recognition. Proc. of Vision Interface. (2002) 294-299.
10. Sanchez-Reillo, R., Sanchez-Avila, C.: Iris Recognition with Low Template Size. Proc. of Audio and Video Based Biometric Person Authentication. (2001) 324-329.
11. Haralick, R.M., Shapiro, L.G.: Computer and Robot Vision, Vol. II. Addison-Wesley (1992) 316-317.
12. Ma, L.: Local Intensity Variation Analysis for Iris Recognition. Pattern Recognition, Vol. 37, No. 6. (2004) 1287-1298.
13. Ma, L., Tieniu, T.: Efficient Iris Recognition by Characterizing Key Local Variations. IEEE Trans. on Image Processing, Vol. 13, No. 6. (2004) 739-750.
14. Ajdari-Rad, A., Safabakhsh, R.: Fast Iris and Pupil Localization and Eyelid Removal Using Gradient Vector Pairs and Certainty Factors. Proc. of Conf. Machine Vision and Image Processing. (2004) 82-91.
Optimal Edge Detection Using Perfect Sharpening of Ramp Edges
Eun Mi Kim1, Cherl Soo Pahk2, and Jong Gu Lee3
1 Dept. of Computer Science, Howon Univ., Korea. [email protected]
2 Dept. of Visual Optics and Optometry, Daebul Univ., Korea. [email protected]
3 Dept. of Computer Science, Chonbuk Univ., Korea. [email protected]
Abstract. The strictly monotonic intensity profiles of ramp edges are mapped to a simple step function of intensity. Such a perfect sharpening of ramp edges can be used to define a differential operator nonlocally, in terms of which we can construct an edge detection algorithm to determine the edges of images effectively and adaptively to varying ramp widths.
1 Introduction
Since image sensing devices have physical limits in resolution, and these limits are not uniform in general, the edge features of the resultant images are manifested with widely varying widths of intensity ramp. Therefore, in order to detect and locate such edge features precisely in real images, we have developed an algorithm by introducing a nonlocal differentiation of intensity profiles called the adaptive directional derivative (ADD), which is evaluated independently of the varying ramp widths [1]. Contrary to the usual conventional edge detectors, which employ a local differential operator such as the gradient together with different length scales in the inspection of intensity changes [2][3] or additional matched filters [4][5][6], the ramp edges are identified as strictly monotonic intervals of intensity profiles, and these nonlocal features are described by the values of the ADDs together with the widths of the intervals. As in the Gaussian model of the edge template, where the profile of edge signal intensity is described by the integral of a Gaussian added to a fixed background, we have two independent parameters for the edge detector. One is the amplitude of the edge, corresponding to the value of the ADD, and the other is the variance of the edge, indicating the edge acuity and hence related to the ramp width. Therefore we can detect ramp edges by evaluating the ADD regardless of the ramp width or length scale. In this paper, we first review the edge detector employing the ADD and then modify the ADD by introducing an extrinsic map, referred to as the perfect sharpening map, which transfers the strictly monotonic intensity profiles of ramps to optimal simple step functions. The step functions are determined to
minimize their mean square errors with respect to the real intensity profiles, and the values of the directional derivative of these optimal step functions are assigned to the original intensity profiles to define the modified adaptive directional derivative (MADD). By using the MADD instead of the ADD, the exact location of edge pixels is determined naturally as the positions of the optimal simple step mapped to by the perfect sharpening map, without needing an extra procedure to find the position such as the local center of directional derivative (LCDD) [1], which is the position averaged over the pixels within the ramp weighted by their directional derivatives. Our edge detection algorithm employing the MADD in the x and y directions is completed with an elaborate edge trimming procedure, added to avoid the double manifestations resulting from identifying edges in both of the above two directions. The performance of the algorithm under magnification of images is illustrated by comparing the results to those of the Canny edge detector.
2 Adaptive Directional Derivative for Ramp Edges
In actual images, edges are blurred into ramp-like intensity profiles due to optical conditions, sampling rate, and other image acquisition imperfections. However, the criterion given by a usual local operator such as the gradient or directional derivative may omit or repeatedly identify a ramp edge with gradually changing gray level. In order to detect ramp edges properly, we employ the ADD of the 1-D intensity profile. A pixel on the image is represented as a 2-D position vector p. The 1-D profile crossing the pixel p in the θ-direction is then described by a gray-level valued function f(p + q u_θ) defined on a part of the linear sequence {p + q u_θ ; q ∈ Z} in terms of the θ-direction vector u_θ, where

\[ u_{\pm x} \equiv \pm(1, 0), \quad u_{\pm y} \equiv \pm(0, 1), \quad u_{\pm +} \equiv \pm(1, 1), \quad u_{\pm -} \equiv \pm(1, -1). \tag{1} \]

Then the directional derivative of intensity (gray level) in the θ-direction is defined as

\[ v_\theta(p) \equiv D_\theta f(p) = f(p + u_\theta) - f(p), \tag{2} \]

and extended nonlocally to define the ADD in the corresponding direction as

\[ \Delta_\theta(p) \equiv \left(1 - \delta_{s_\theta(p),\, s_\theta(p - u_\theta)}\right) \sum_{k=0}^{\infty} \delta_{k+1,\, N_\theta(p,k)}\, v_\theta(p + k u_\theta), \tag{3} \]

where

\[ N_\theta(p, k) \equiv \sum_{n=0}^{k} \left| s_\theta(p + n u_\theta) \right| \, \delta_{s_\theta(p),\, s_\theta(p + n u_\theta)}, \tag{4} \]

\[ s_\theta(p) \equiv \begin{cases} +1, & v_\theta(p) > 0 \\ 0, & v_\theta(p) = 0 \\ -1, & v_\theta(p) < 0 \end{cases} \tag{5} \]
and δ(·, ·) is the Kronecker delta. According to this definition, Δ_θ(p) has a nonvanishing value only when p is the starting pixel of a strictly monotonic interval of the profile of f in the θ-direction. That is, if f starts to increase or decrease strictly at p and ends at p + w u_θ, going from f(p) to f(p + w u_θ) along the θ-direction, then

\[ \Delta_\theta(p) = \sum_{q=0}^{w-1} v_\theta(p + q u_\theta) = f(p + w u_\theta) - f(p) \equiv A. \tag{6} \]
According to the definitions, v_θ(p + q u_θ) = −v_{−θ}(p + (q + 1) u_θ), and in the strictly monotonic interval [p, p + w u_θ], Δ_θ(p) = −Δ_{−θ}(p + w u_θ) is equal to the overall intensity change throughout the interval, together with Δ_θ(p + q u_θ) = 0 for 0 < q < w. Hence the ADD Δ_θ specifies the strictly monotonic change of intensity and the corresponding interval [p, p + w u_θ] along some fixed θ-direction associated with a ramp edge. Thus we have a criterion for ramp edges in terms of Δ_θ with a threshold value T:

(CRE) The absolute value of an adaptive directional derivative of gray level at p is larger than T, i.e., |Δ_θ(p)| ≥ T for some θ-direction.

With respect to a ramp edge where two regions on the image are bounded by the change of intensity A, we introduce the φ-direction, which makes an angle of magnitude φ with the direction normal to the edge. We also introduce the corresponding direction vector u_φ and ramp width w_φ, in terms of which the ramp profile in the φ-direction beginning at p is described by a strictly monotonic gray level distribution over the interval [p, p + w_φ u_φ]. u_0 is normal and u_{±π/2} is tangential to the edge. If the edge is convex in the positive u_0 direction, Δ_φ(p) = A for all −π/2 ≤ φ ≤ π/2. If the edge is concave in the positive u_0 direction, Δ_φ(p) = A only for a restricted range of angles near φ = 0, and Δ_φ(p) = 0 definitely at φ = ±π/2. Therefore, in order to detect the intensity changes associated with all ramp edges by estimating the ADD Δ_θ, we should employ at least two of them, Δ_θ1 and Δ_θ2, with different fixed directions θ1 and θ2 making a right angle. Employing the criterion (CRE), the local maximal-length interval [p, p + w u_θ] of strictly monotonic gray level distribution along the θ-direction satisfying |f(p + w u_θ) − f(p)| ≥ T is identified as the width of a ramp edge, and one of the w + 1 pixels in the interval is determined as the edge pixel. The detection of edges of all directions can be performed using the ADDs Δ_θ1(p) and Δ_θ2(p) of two orthogonal directions, and we can use the two fixed directions θ1 = x and θ2 = y in practice. Here, we note that the criterion (CRE) can detect the edge regardless of the spatial blurring or scaling that affects the edge width w. However, we additionally need a natural procedure to determine the precise location of a single edge pixel within the ramp interval [p, p + w u_θ], such as the LCDD.
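The criterion (CRE) follows directly from Eqs. (2)-(6): scan a 1-D profile, accumulate each strictly monotonic run, and report the run's overall change at its starting pixel. The following Python sketch of this bookkeeping is illustrative; the function name and test profile are not from the paper.

```python
import numpy as np

def adaptive_directional_derivative(profile):
    """Return add[q] equal to the overall intensity change A of the
    strictly monotonic run starting at pixel q, and 0 elsewhere (Eq. (6))."""
    f = np.asarray(profile, dtype=float)
    add = np.zeros_like(f)
    q = 0
    while q < len(f) - 1:
        sign = np.sign(f[q + 1] - f[q])
        if sign == 0:
            q += 1
            continue
        end = q + 1  # extend while the profile stays strictly monotonic
        while end < len(f) - 1 and np.sign(f[end + 1] - f[end]) == sign:
            end += 1
        add[q] = f[end] - f[q]
        q = end
    return add

# Criterion (CRE): pixels where |ADD| >= T start a ramp edge.
T = 30.0
profile = [10, 10, 12, 30, 70, 90, 92, 91, 60, 20, 20]
edges = np.nonzero(np.abs(adaptive_directional_derivative(profile)) >= T)[0]
```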
3 Modified Adaptive Directional Derivative and Perfect Sharpening of Ramp Edges
With respect to a ramp edge, which is identified as a strictly monotonic interval in the 1-D intensity profile in some direction on the image, we adopt an ideal step edge as the result of its perfect sharpening. That is, each (ramp) edge is associated with a unique step edge located at a certain pixel within the ramp with the same amplitude (i.e., the overall intensity change throughout the whole ramp). If we are provided with a relevant rule to locate a unique step edge associated with the ramp edge, we can assign the directional derivative of the intensity of the associated ideal step edge to the ramp edge, and define it as the MADD in the corresponding direction on the image where the 1-D intensity profile is identified as this strictly monotonic function. The strictly monotonic 1-D intensity profile f(q) representing a ramp edge over the interval [0, w] of q with f(0) = B and f(w) = A + B, for example, is associated with an ideal step edge whose intensity profile is described by

\[ f_s(q) = \begin{cases} B, & 0 \le q \le q_E \\ A + B, & q_E < q \le w \end{cases} \tag{7} \]

for the relevant q_E within the interval [0, w] (Fig. 1). In the pixel space manifestation, the directional derivative of f_s is determined as

\[ \frac{\partial f_s}{\partial q}(q) \equiv f_s(q + 1) - f_s(q) = \begin{cases} A, & q = [q_E] \\ 0, & q \ne [q_E] \end{cases} \tag{8} \]

since q should be integer-valued. If the map S : f → f_s, which assigns a unique step function f_s of amplitude A to each strictly monotonic function f with the same amplitude and performs the perfect sharpening, is well defined, we can define the MADD of f as

\[ \tilde{\Delta} f(q) \equiv \frac{\partial f_s}{\partial q}(q) \tag{9} \]

for every q ∈ [0, w]. In order to define the map S completely, we still have to determine the relevant location q_E of the step. The relevance can be granted
Fig. 1. Perfect sharpening of the ramp intensity profile
by minimizing the mean square error associated with the map S, MSE(S). Provided the intensity function is independent and identically distributed, the mean square error reduces to

\[ MSE(S) = \int_0^w P(f_s; q)\, [f_s(q) - f(q)]^2\, dq = \int_0^w [f_s(q) - f(q)]^2\, dq = \int_0^{q_E} [B - f(q)]^2\, dq + \int_{q_E}^{w} [A + B - f(q)]^2\, dq, \tag{10} \]

together with a uniform density P(f_s; q) = 1. The value of q_E minimizing MSE(S) satisfies

\[ \frac{d}{dq_E} MSE(S) = [B - f(q_E)]^2 - [A + B - f(q_E)]^2 = 0, \tag{11} \]

which dictates the step locating condition (SLC) at q = q_E:

(SLC)   f(q_E) − B = A/2.
With respect to a strictly monotonic function, such a position q = q_E is uniquely determined (Fig. 1). When f itself is a step function or constant, the minimum of MSE(S) is trivially accomplished by f_s = f without requiring the condition SLC, and the perfect sharpening map S is well defined. In the pixel space, the 1-D intensity profile f(p + q u_θ) with q ∈ Z in the θ-direction of an image can be divided into a sequence of strictly monotonic or constant intensity intervals. With respect to each of these intervals, the profile can be mapped to the step (or constant) function by the map S, which can be naturally extended over the whole intensity profile to obtain f_s(p + q u_θ), and the MADD of f(p + q u_θ) can be defined as

\[ \tilde{\Delta}_\theta(p + q u_\theta) = \frac{\partial f_s}{\partial q}(p + q u_\theta) \equiv f_s(p + (q + 1) u_\theta) - f_s(p + q u_\theta) \tag{12} \]

over the whole intensity profile in the θ-direction. In the pixel space manifestation, the condition (SLC) is replaced by

(ISLC)   f(p + [q_E] u_θ) − B ≤ A/2 ≤ f(p + ([q_E] + 1) u_θ) − B

for a strictly increasing interval and

(DSLC)   f(p + [q_E] u_θ) − B ≥ A/2 > f(p + ([q_E] + 1) u_θ) − B

for a strictly decreasing interval. With respect to a strictly monotonic interval [p, p + w u_θ] that is non-extensible (i.e., not contained in a larger such interval)
in the θ-direction with f(p + w u_θ) − f(p) = A, the MADD Δ̃_θ(p + q u_θ) (q = 0, 1, 2, ..., w − 1) is valued as Δ̃_θ(p + q u_θ) = A for q = [q_E] and Δ̃_θ(p + q u_θ) = 0 otherwise. We thus find that the MADD can detect the significant change A (of intensity) and its relevant location q = [q_E] simultaneously. On substituting the MADD Δ̃_θ for the ADD Δ_θ, the detection and location of edge pixels is accomplished simply by implementing the criterion on Δ̃_θ:

If the MADD of intensity in a θ-direction satisfies |Δ̃_θ(p)| ≥ T at a pixel p, this pixel is an edge pixel.

With respect to the above strictly monotonic interval [p, p + w u_θ], if |A| ≥ T, then |Δ̃_θ(p + [q_E] u_θ)| ≥ T and p + [q_E] u_θ is an edge pixel according to this criterion. Since Δ̃_θ(p + [q_E] u_θ) = Δ_θ(p), the MADDs Δ̃_θ1 and Δ̃_θ2 in two orthogonal fixed directions can pick out all of the edges of an image as the ADDs do. We can employ the two MADDs Δ̃_x and Δ̃_y in order to detect all the edges of images. However, some parts of edges can be doubly identified, both by Δ̃_x and by Δ̃_y; in particular, when the edges are severely curved, the detected edge pixels would form hairy edge lines with a few cilia. Therefore, we implement some exclusive conditions, applied case by case, that render only one of the two MADDs Δ̃_x and Δ̃_y effective, to obtain clear edge lines. The relevant condition is that Δ̃_x is used to identify the edge only when |v_x| is larger than |v_y|, and Δ̃_y is adopted otherwise. Hence we can describe our edge detection strategy as the following procedures:

P1. Scan the image along the x direction to find the strictly monotonic and non-extensible intervals such as [p, p + w_x u_x] where the overall change of intensity |f(p + w_x u_x) − f(p)|, i.e., |Δ̃_x(p + [q_E] u_x)| for [q_E] ∈ [0, w_x − 1], satisfies the criterion |Δ̃_x(p + [q_E] u_x)| ≥ T.

P2. Locate an edge pixel at p + [q_E] u_x within the interval where the condition (ISLC) or (DSLC) with θ = x is satisfied, only when |v_x(p + [q_E] u_x)| ≥ |v_y(p + [q_E] u_x)|.

P3. Repeat the procedures with respect to the y direction to find the strictly monotonic and non-extensible intervals such as [p, p + w_y u_y] satisfying |Δ̃_y(p + [q_E] u_y)| ≥ T for [q_E] ∈ [0, w_y − 1].

P4. Locate an edge pixel at p + [q_E] u_y within the interval where the condition (ISLC) or (DSLC) with θ = y is satisfied, only when |v_y(p + [q_E] u_y)| ≥ |v_x(p + [q_E] u_y)|.

We note that these procedures can detect an edge regardless of the spatial blurring or scaling that affects the edge width parameters w_x and w_y, and extract the exact edge simply by deciding the pixel at q = [q_E] where half of the overall intensity change throughout the strictly monotonic and non-extensible interval occurs, either at q = [q_E] or between [q_E] and [q_E + 1]. Actually
magnifying the image by the scale κ makes no difference in the crucial conditions |Δ̃_x(κp + [κq_E] u_x)| ≥ T, |v_x(κp + [κq_E] u_x)| ≥ |v_y(κp + [κq_E] u_x)|, etc., of these procedures, except for changing to κp, κw_x, κw_y and [κq_E]. Here, we note that v_θ(κp + κq u_θ) = (1/κ) v_θ(p + q u_θ), while Δ̃_θ(κp + κq u_θ) = Δ̃_θ(p + q u_θ) is left invariant. Therefore, the above procedure has invariant performance under scaling, contrary to the usual edge detectors employing local derivatives such as v_θ.
4 Algorithm and Applications
The entire edge detection procedure is performed by estimating Δ̃_x and Δ̃_y along every vertical and horizontal line of an image. With respect to the strictly increasing intervals in the x-direction, we apply the SIEDPS (strictly increasing edge detection by perfect sharpening) procedure to detect and locate an edge pixel. The procedure is suspended at the end pixel p_F of the line and is specified in the following pseudo code:

Procedure SIEDPS
1. While v_x(p) > 0 and p ≠ p_F, with respect to the value v_x(p) read from the successive scan pixel by pixel along the x-direction, do
 (a) Put Δ̃ = 0, FE = 0 and PE = p initially.
 (b) Add v_x(p) to Δ̃ to obtain its new value.
 (c) With respect to every Δ̃, add v_x(PE) to FE to obtain the new value of FE and substitute the next pixel for PE to obtain the new value of v_x(PE), until it satisfies Δ̃/2 < FE + v_x(PE).
2. When v_x(p) ≤ 0 or p = p_F (i.e., at the end of the strictly increasing interval or line), write PE as an edge pixel if Δ̃ ≥ T and |v_x(PE)| ≥ |v_y(PE)|.

With respect to the strictly increasing intervals in the y-direction, we apply the above SIEDPS procedure with the x-direction replaced by the y-direction. With respect to strictly decreasing intervals, we substitute the SDEDPS (strictly decreasing edge detection by perfect sharpening) procedure, which is obtained simply by replacing v_θ(p) with −v_θ(p) in the SIEDPS procedure. In order to construct the algorithm for detecting edge pixels by scanning a whole single line in the θ = x or θ = y direction, we combine the SIEDPS and SDEDPS procedures as the sub-procedures for the pixels with v_θ(p) > 0 and v_θ(p) < 0 respectively, together with the bypassing procedure, which produces no output for the pixels with v_θ(p) = 0. As the output of this algorithm for edge detection by perfect sharpening of ramps (EDPSR), we obtain the edge pixels identified with the pixels corresponding to half of the intensity change over the edge ramps from the SIEDPS or SDEDPS procedures. Then, a complete edge detection algorithm for an image is constructed by the integration of the EDPSR algorithm over all the lines of the x and y directions of the image.
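A minimal Python sketch of the SIEDPS bookkeeping on a single row is given below; the function signature, the way the vertical derivative is supplied, and the threshold are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def siedps_row(f_row, vy_row, T):
    """Sketch of SIEDPS over one image row: accumulate each strictly
    increasing run, advance PE to the pixel where the cumulative change
    reaches half the run total (the perfect-sharpening step of (ISLC)),
    and keep PE if the run amplitude reaches T and |vx| >= |vy| there."""
    vx = np.diff(np.asarray(f_row, dtype=float))
    edges = []
    q = 0
    while q < len(vx):
        if vx[q] <= 0:
            q += 1
            continue
        total, fe, pe = 0.0, 0.0, q
        while q < len(vx) and vx[q] > 0:
            total += vx[q]                     # running Delta-tilde
            while total / 2.0 >= fe + vx[pe]:  # move PE toward the half-way pixel
                fe += vx[pe]
                pe += 1
            q += 1
        if total >= T and abs(vx[pe]) >= abs(vy_row[pe]):
            edges.append(pe)
    return edges
```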
Fig. 2. Trimming the edge of disc
We first apply our new algorithm to a simple but illustrative image of a circular disc (Fig. 2a). In Fig. 2b, we show the result of edge detection by our former algorithm introduced in section 2, where some hairs and doubling of the edge line are observed, implying double manifestation of the edge identified twice by the scans in the x and y directions. The hairs are trimmed and the doubled lines disappear in Fig. 2c, which is the result of applying our new algorithm. We can thus see that the exclusive selection, made simply by comparing the absolute values |v_x| and |v_y| of the directional derivatives within the identifications by the values of the MADDs Δ̃_x and Δ̃_y, works as intended.
Fig. 3. Comparison of the results of edge detection
We next apply the algorithm to a practical image, the picture of Lena (Fig. 3a). The result of our algorithm (Fig. 3c) with threshold T = 30 out of 256 gray levels is compared to the result of Canny's detector (Fig. 3b) using a Sobel mask with the equivalent lower threshold value of 120. We can see equally (or even more) precise edges manifested successfully by the rather simple algorithm of identifying strictly monotonic intervals in the image. In Fig. 3b, thick edge lines appear, since every pixel over the lower threshold in the same ramp edge can be manifested, and a further exhaustive thinning procedure must be applied to obtain precise edge locations, while in Fig. 3c every edge line has single-pixel width representing the precise edge location. When the image is magnified κ times in length scale, the local derivatives in the pixel space, such as the Sobel operator, have their values multiplied by 1/κ
while our MADD Δ̃_θ is invariant. Therefore, the Canny edge detector may lose some edges after magnification unless the threshold is readjusted, whereas our edge detector remains equally effective under scaling. In Fig. 4a and 4b, two parts of the original Lena image (Fig. 3a) are magnified 4 times. The Canny edge detector yields the results in Fig. 4c and 4d, and our edge detector yields Fig. 4e and 4f. We can observe that some edge lines manifested in Fig. 4e and 4f are lost in Fig. 4c and 4d at this magnification. That is, the edges are still well identified by our edge detector after some magnification, although some of them are lost by edge detectors employing local differential operators.
Fig. 4. Comparison of the results of edge detection by image magnification
5 Discussions
In order to verify the presence of edges and locate the precise edge pixels, we only have to estimate the MADDs Δ̃_x(p) and Δ̃_y(p) independently. Therefore, this algorithm has relative simplicity of calculation compared to the usual ones employing the magnitude of the gradient, which involve floating point operations for the calculation of the square root and need an extra thinning algorithm. Although the operators Δ̃_x(p) and Δ̃_y(p) are badly nonlocal, they can be estimated by a single one-way scan of the local derivatives v_x(p) and v_y(p), as described in the procedure SIEDPS and its companions, which incurs no additional calculational expense. We also note that the additional calculations comparing
|v_x| and |v_y| in order to avoid the double manifestations of edges, due to the double identifications by Δ̃_x and Δ̃_y, cost very little, since the involved calculations are simple, linear, and very restricted. The false positive effect of the above double manifestation occurs when one of the strictly monotonic intervals associated with the MADDs Δ̃_x and Δ̃_y does not completely cross over the edge ramp between the two regions divided by it, which mostly implies that the corresponding directional derivative has a smaller absolute value than the other. Therefore, such a false positive effect on edge detection can be suppressed efficiently by demanding that the corresponding directional derivative have a larger absolute value than the other, although this has not been proved exactly. Because of nonuniformity in illumination, a gradual change of intensity over a large area of a single surface may be recognized as a false edge when applying the MADD of the intensity profile as the characteristic attribute of the edge signal. In order to prevent this particular false positive effect, we need an unsharp masking or flat-fielding procedure before applying the edge detector [7][8], or we can restrict the width of the strictly monotonic intervals to be detected.
Acknowledgements
This work was supported by grants from Howon University and Daebul University.
References
1. E. M. Kim and C. S. Pahk, "Strict Monotone of an Edge Intensity Profile and a New Edge Detection Algorithm," LNCS 2690: Intelligent Data Engineering and Automated Learning, pp. 975-982 (2003).
2. D. Marr and E. Hildreth, "Theory of edge detection," Proc. R. Soc. London B207, 187-217 (1980).
3. J. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Mach. Intell. PAMI-8, 679-698 (1986).
4. R. A. Boie and I. Cox, "Two dimensional optimum edge recognition using matched and Wiener filters for machine vision," in Proceedings of the IEEE First International Conference on Computer Vision (IEEE, New York, 1987), pp. 450-456.
5. R. J. Qian and T. S. Huang, "Optimal Edge Detection in Two-Dimensional Images," IEEE Trans. Image Processing, vol. 5, no. 7, pp. 1215-1220 (1996).
6. Z. Wang, K. R. Rao and J. Ben-Arie, "Optimal Ramp Edge Detection Using Expansion Matching," IEEE Trans. Pattern Anal. Machine Intell., vol. 18, no. 11, pp. 1586-1592 (1996).
7. J. R. Parker, "Algorithms for Image Processing and Computer Vision," Wiley (1997), and references therein.
8. M. Seul, L. O'Gorman and M. J. Sammon, "Practical Algorithms for Image Analysis," Cambridge U. Press, Cambridge (2000), and references therein.
Eye Tracking Using Neural Network and Mean-Shift
Eun Yi Kim1 and Sin Kuk Kang2
1 Department of Internet and Multimedia Engineering, Konkuk Univ., Korea. [email protected]
2 Department of Computer Engineering, Seoul National Univ., Korea. [email protected]
Abstract. In this paper, an eye tracking method is presented that uses a neural network (NN) and the mean-shift algorithm to accurately detect and track a user's eyes against a cluttered background. In the proposed method, to deal with rigid head motion, the facial region is first obtained using a skin-color model and connected-component analysis. Thereafter the eye regions are localized using an NN-based texture classifier that discriminates the facial region into eye and non-eye classes, which enables our method to accurately detect users' eyes even if they wear glasses. Once the eye regions are localized, they are continuously and correctly tracked by the mean-shift algorithm. To assess the validity of the proposed method, it was applied to an interface system using eye movement and tested with a group of 25 users playing 'aligns games.' The results show that the system processes more than 30 frames/sec on a PC for 320×240 input images and supplies user-friendly and convenient access to a computer in real-time operation.
1 Introduction
Gesture-based interfaces have attracted considerable interest during the last decades; they use human gestures to convey information, such as input data, or to control devices and applications such as computers, games, and PDAs. Users create gestures by a static hand or body pose or a physical motion, including eye blinks or head movements. Among them, human eye movements such as gaze direction and eye blinks have a high communicative value. Due to this, interfaces using eye movements have gained much attention, and many systems have been developed so far [1-4]. These can be classified into two major techniques: device-based and video-based. Device-based techniques use glasses, a head band, or a cap with infrared/ultrasound emitters, which measure the user's motion using changes in ultrasound waves or infrared reflection. In contrast to device-based techniques, which need additional devices, video-based techniques detect the user's face and facial features by processing images or videos obtained via a camera. Compared with device-based techniques, these are non-intrusive, comfortable to users, and inexpensive communication devices. For practical use of these video-based systems, automatic detection and tracking of faces and eyes in real-life situations must first be supported. However, in most of the commercial systems, the initial eye or face position is given manually, or some
conditions are used: the user initially clicks on the features to be tracked via a mouse in [5]; some systems require the user to blink the eyes for a few seconds and then find the eye regions by differencing successive frames [6]. Moreover, they use strong assumptions to make the problems more tractable. Some common assumptions are that the images contain a frontal facial view and that the face has no facial hair or glasses. To overcome the abovementioned problems, this paper proposes a new eye tracking method using a neural network (NN) and mean-shift that can automatically locate and track the features accurately against a cluttered background with no constraints. In our method, the facial region is first obtained using a skin-color model and connected-component analysis. Then, the eye regions are localized by an NN-based texture classifier that discriminates each pixel in the extracted facial regions into the eye class and the non-eye class using texture properties. This enables us to accurately detect the user's eye region, even if the user wears glasses, against a cluttered background. Once the eye regions are detected in the first frame, they are continuously tracked by mean-shift. To evaluate the proposed method, it is applied to an interface that conveys the user's commands via his or her eye movements. The interface system was tested with 25 people, and the results show that our method is robust to time-varying illumination and less sensitive to the specular reflection of eyeglasses. The results also show that it can be efficiently and effectively used as an interface providing a user-friendly and convenient communication device. The remainder of this paper is organized as follows: Section 2 describes face extraction using a skin-color model, and Section 3 provides a detailed description of eye detection. The tracking process for the extracted eye regions is described in the next section. Thereafter, the experimental results are presented in Section 5, and Section 6 gives a final summary.
2 Skin-Color Based Face Extractor
Detecting pixels with a skin color offers a reliable method for detecting the face region. In the RGB space obtained by most video cameras, the RGB representation includes both color and brightness; as such, RGB is not necessarily the best color representation for detecting pixels with a skin color [7-10]. The chromatic colors provide a skin-color representation by normalizing the RGB value by its intensity: the brightness can be removed by dividing the three components of a color pixel by the intensity. This space is known as chromatic color space. Fig. 1 shows the color distribution of human faces obtained from sixty test images in chromatic color space. Since the color distribution is clustered within a small area of the chromatic color space, it can be approximated by a 2D Gaussian distribution. Therefore, the generalized skin-color model can be represented by the 2D Gaussian distribution G(m, Σ²) as follows:
\[ m = (\bar{r}, \bar{g}), \quad \bar{r} = \frac{1}{N}\sum_{i=1}^{N} r_i, \quad \bar{g} = \frac{1}{N}\sum_{i=1}^{N} g_i, \quad \Sigma = \begin{bmatrix} \sigma_{rr} & \sigma_{rg} \\ \sigma_{gr} & \sigma_{gg} \end{bmatrix} \tag{1} \]

where \(\bar{r}\) and \(\bar{g}\) represent the Gaussian means of the r and g color distributions, respectively, and Σ² represents the covariance matrix of the distribution.
Fig. 1. The color distribution of human faces in chromatic color space
Once the skin-color model is created, the most straightforward way to locate the face is to match the skin-color model against the input image to identify facial regions. Each pixel in the original image is converted into chromatic color space and then compared with the distribution of the skin-color model. By thresholding the matched results, we obtain a binary image (see Fig. 2(b)). Connected component labeling is then performed on the binary image to remove noise and small regions, and the facial region is obtained by selecting the largest component with skin color, as shown in Fig. 2(c).
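A compact Python sketch of this extraction pipeline follows; the probability threshold, the small epsilon guard, and the use of SciPy's labeling are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def extract_face(rgb, mean, cov_inv, thresh=0.5):
    """Skin-color face extraction: convert to chromatic (r, g) space,
    score each pixel under the 2D Gaussian skin model of Eq. (1),
    threshold, and keep the largest connected skin component."""
    rgb = rgb.astype(float)
    intensity = rgb.sum(axis=2) + 1e-6          # divide out brightness
    r = rgb[..., 0] / intensity
    g = rgb[..., 1] / intensity
    d = np.stack([r - mean[0], g - mean[1]], axis=-1)
    m = np.einsum('...i,ij,...j->...', d, cov_inv, d)
    skin = np.exp(-0.5 * m) > thresh             # unnormalized Gaussian score
    labels, n = ndimage.label(skin)
    if n == 0:
        return skin
    sizes = ndimage.sum(skin, labels, index=range(1, n + 1))
    return labels == (1 + int(np.argmax(sizes)))
```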
Fig. 2. Face detection results. (a) original color images, (b) skin-color detection results, (c) extracted facial regions
3 NN-Based Eye Detection
Our goal is to detect the eye in the facial region and track it through the whole sequence. Generally, the eye region has the following properties: 1) it has high brightness contrast between the white eye sclera and the dark iris and pupil, along with the texture
of the eyelid; 2) it is located in the upper part of the facial region. These properties help reduce the complexity of the problem and facilitate discriminating the eye regions from the whole face. Here, we use a neural network as a texture classifier to automatically discriminate the pixels of the facial regions into eye regions and non-eye ones in various environments. The network scans all the pixels in the upper facial region so as to classify them as eye or non-eye. The network receives the gray-scale value of a pixel and its neighboring pixels within a small window to classify the pixel as eye or non-eye. The output of the network then represents the class of the central pixel in the input window. A diagram of our eye detection scheme is shown in Fig. 3.
Fig. 3. A diagram of eye detection scheme
3.1 Texture Classification
We assume that the eye has a different texture from the facial region and is detectable. The simplest way to characterize the variability in a texture pattern is by noting the gray-level values of the raw pixels. This set of gray values becomes the feature set on which the classification is based. An important advantage of this approach is the speed with which images can be processed, as the features do not need to be calculated. However, the main disadvantage is that the size of the feature vector is large. Accordingly, we use a configuration of autoregressive features (only the shaded pixels in the input window in Fig. 3), instead of all the pixels in the input window. This reduces the size of the feature vector from M² to (4M−3), thereby resulting in improved generalization performance and classification speed. After training, the neural network outputs a real value between 0 and 1 for eye and non-eye, respectively. If a pixel has a value larger than a given threshold, it is considered an eye; otherwise it is labeled as non-eye. Accordingly, the result of the classification is a binary image. Fig. 4 shows the classification result using the neural network. Fig. 4(a) is an input frame, Fig. 4(b) is the extracted facial regions, and the detected eye pixels are shown in Fig. 4(c). In Fig. 4(c), the pixels classified as eyes are marked in black. We can see that all of the eyes are labeled correctly, but there are also some regions misclassified as eye.
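One plausible reading of the shaded (4M−3)-pixel configuration is the central row, central column, and both diagonals of the M×M window, counting the shared center once; the exact pixel layout is defined by Fig. 3, so the sketch below is an assumption.

```python
import numpy as np

def autoregressive_features(window):
    """Extract 4M-3 gray values from an MxM window (M odd): central
    row, central column, and both diagonals, center counted once."""
    M = window.shape[0]
    c = M // 2
    row = window[c, :]                                # M pixels
    col = np.delete(window[:, c], c)                  # M-1 pixels
    diag = np.delete(np.diag(window), c)              # M-1 pixels
    anti = np.delete(np.diag(np.fliplr(window)), c)   # M-1 pixels
    return np.concatenate([row, col, diag, anti])     # 4M-3 features

feat = autoregressive_features(np.arange(121.0).reshape(11, 11))
assert feat.size == 4 * 11 - 3
```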
Fig. 4. An example of eye detection. (a) an original image, (b) the extracted facial regions, (c) the classified image by the neural network, (d) the detected eye region after post-processing
3.2 Post-processing

Although we use the bootstrap method when training the eye detector, the detection result from the MLP still includes many false alarms, and we still encounter difficulties in filtering out high-frequency, high-contrast non-eye regions. In this paper, we apply connected-component analysis after the texture classification. The generated connected components (CCs) are filtered by their attributes, such as size, area, and location, using two-stage filtering:

Stage 1: Heuristics using features of the CCs such as area, fill factor, and horizontal and vertical extents. The width of an eye region must be larger than a predefined minimum width (Min_width), the area of the component should be larger than Min_area and smaller than Max_area, and the fill factor should be larger than Min_fillfactor.

Stage 2: Heuristics using the geometric alignment of eye components. We check the rotation angles of two nearby eye candidates; they have to be smaller than a predefined rotation angle.

Using these two heuristics, the classified images are filtered; the resulting image is shown in Fig. 4(d), where the extracted eye region is filled blue for better viewing.
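A minimal sketch of the two-stage connected-component filtering; all threshold values are placeholders, since the paper does not report its settings:

import cv2
import numpy as np

# Placeholder thresholds; the paper's actual settings are not reported.
MIN_WIDTH, MIN_AREA, MAX_AREA, MIN_FILL, MAX_ANGLE = 6, 30, 400, 0.4, 20.0

def filter_eye_candidates(binary):
    """Stage 1: per-component heuristics; Stage 2: pairwise alignment."""
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    keep = []
    for i in range(1, n):                        # label 0 is background
        x, y, w, h, area = stats[i]
        fill = area / float(w * h)
        if w >= MIN_WIDTH and MIN_AREA <= area <= MAX_AREA and fill >= MIN_FILL:
            keep.append(i)
    pairs = []
    for a in keep:
        for b in keep:
            if a < b:
                dx, dy = centroids[b] - centroids[a]
                angle = abs(np.degrees(np.arctan2(dy, dx)))
                if min(angle, 180.0 - angle) <= MAX_ANGLE:
                    pairs.append((a, b))
    return pairs   # candidate left/right eye component pairs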
4 Mean-Shift Based Eye Tracker

To track the detected eyes, a mean shift algorithm is used, which finds the object by seeking the mode of the object score distribution. In the present work, the color distribution of the detected pupil is modeled as the Gaussian

$$P_m(g_s) = (2\pi\sigma)^{-1/2} \exp\{-(g_s - \mu)^2 \sigma^{-1}\},$$

where $\mu$ and $\sigma$ are set to 40 and 4, respectively. This distribution is used as the object score at site s, representing the probability of belonging to an eye.
A mean shift algorithm is a nonparametric technique that climbs the gradient of a probability distribution to find the nearest dominant mode. The algorithm iteratively shifts the center of the search window to the weighted mean until the difference between the means of successive iterations is less than a threshold. The depth of the color means the object score, i.e. the probability of belonging to an object. The solid-line rectangle is the search window for the current iteration, while the dotted line represents the shifted search window. The search window is moved in the direction of the object by shifting the center of the window to the weighted mean. The weighted mean, i.e. the search window center at iteration n+1, $m_{n+1}$, is computed using the following equation:

$$m_{n+1} = \frac{\sum_{s \in W} P_m(g_s) \cdot s}{\sum_{s \in W} P_m(g_s)} \qquad (2)$$
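A minimal sketch of this iteration over a precomputed score map, e.g. score = np.exp(-((gray - 40.0) ** 2) / 4.0) from the pupil model above; the default window size, stopping threshold, and iteration cap are assumptions:

import numpy as np

def mean_shift(score, center, win=(15, 15), eps=0.5, max_iter=20):
    """Repeatedly move the window center to the weighted mean (2) of the
    object scores inside it, until the shift falls below eps."""
    cy, cx = center
    hh, hw = win[0] // 2, win[1] // 2
    for _ in range(max_iter):
        y0, x0 = max(0, int(cy) - hh), max(0, int(cx) - hw)
        patch = score[y0:y0 + win[0], x0:x0 + win[1]]
        total = patch.sum()
        if total <= 0:
            break
        ys, xs = np.mgrid[y0:y0 + patch.shape[0], x0:x0 + patch.shape[1]]
        ny, nx = (patch * ys).sum() / total, (patch * xs).sum() / total
        done = np.hypot(ny - cy, nx - cx) < eps
        cy, cx = ny, nx
        if done:
            break
    return cy, cx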
The search window size for a mean shift algorithm is generally determined according to the object size, which is efficient when tracking an object with only a small motion. However, in many cases, objects have a large motion and low frame rate, which means the objects end up outside the search window. Therefore, a search window that is smaller than the object motion will fail to track the object. Accordingly, in this paper, the size of the search window of the mean shift algorithm is adaptively determined in direct proportion to the motion of the object as follows:
$$W_{width}^{(t)} = \max\{\alpha(|m_x^{(t-1)} - m_x^{(t-2)}| - B_{width}), 0\} + \beta B_{width}$$
$$W_{height}^{(t)} = \max\{\alpha(|m_x^{(t-1)} - m_x^{(t-2)}| - B_{height}), 0\} + \beta B_{height} \qquad (3)$$
(t > 2), where $\alpha$ and $\beta$ are constants and t is the frame index. This adaptation of the window size allows accurate tracking of highly active objects. Fig. 5 shows the results of the eye tracking, where the eyes are filled white for better viewing. As can be seen in Fig. 5, the eye regions are tracked accurately. Moreover, the proposed method can determine the gaze direction.
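A minimal sketch of the window-size update (3). The extracted equation prints m_x in both lines, so whether the height update uses the vertical motion is ambiguous; we read it as vertical here, and the alpha and beta defaults are assumed:

def adapt_window(m_prev, m_prev2, base, alpha=1.0, beta=1.2):
    """Window size for frame t from the last mode displacement, as in (3);
    base = (B_height, B_width) is the nominal (object-sized) window."""
    motion_y = abs(m_prev[0] - m_prev2[0])   # vertical motion (assumed)
    motion_x = abs(m_prev[1] - m_prev2[1])   # horizontal motion
    b_h, b_w = base
    height = max(alpha * (motion_y - b_h), 0.0) + beta * b_h
    width = max(alpha * (motion_x - b_w), 0.0) + beta * b_w
    return int(round(height)), int(round(width))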
Fig. 5. An eye tracking result
5 Experimental Results

To show the effectiveness of the proposed method, it was applied to an interface that conveys user commands via eye movements. Fig. 6 shows the interface using our method, which consists of a PC camera and a computer. The PC camera, which is connected to the computer through the USB port,
supplies 30 color images of size 320x240 per second. The computer is a Pentium IV 1.7 GHz machine running the Windows XP operating system; it translates the user's eye movements into mouse movements by processing the images received from the PC camera. The processing of each video sequence is performed by the proposed eye tracking method.
Fig. 6. The interface system using our eye tracking method
To assess the effectiveness of the proposed method, the interface system was tested with 25 users under various environments. The tracking results are shown in Fig. 7, where the extracted eyes have been filled white for better viewing. The features are tracked throughout the 100 frames and are not lost once. To quantitatively evaluate the performance of the interface system, it was tested using an 'alien game' [11]. Each user was given an introduction to how the system worked and then allowed to practice moving the cursor for five minutes; the five-minute practice period was perceived as sufficient training time by the users. In the computer game, "aliens" appear at random locations on the screen one at a time, as shown in Fig. 8. To catch an alien, the user must point the cursor at its box. Each user was asked to catch ten aliens three times with the standard mouse and three times with the mouse driven by eye movements; the type of mouse tested first was chosen randomly. For each test, the user's time to play the game was recorded. Table 1 presents the average time taken to catch one alien when playing with the regular mouse and with the eye-movement mouse. The former is about 3 times faster than the latter. However, the latter can process more than 30 frames/sec for 320x240 input images on a notebook without any additional hardware, which is sufficient for real-time applications. Moreover, since the implemented system needs no hardware beyond a general PC and an inexpensive PC camera, it is very efficient for realizing applications that use real-time interactive information between users and computer systems. Consequently, the experiment showed that the method has the potential to serve as an interface for handicapped people and as a generalized user interface in many applications.
Fig. 7. Tracking results in the cluttered environments
Fig. 8. Aliens game
Table 1. Timing comparison between regular mouse and the mouse using eye movements
Method          Measure    Time/sec
Standard Mouse  Mean       0.44 s
                Deviation  0.07 s
Eye Mouse       Mean       1.67 s
                Deviation  0.21 s
6 Conclusions

In this paper, we proposed an eye tracking method using a neural network and a mean-shift procedure, and implemented a mouse system that receives the user's eye movements as an input signal to control a computer. The proposed eye tracking method consists of three modules: a face extractor, an eye detector, and an eye tracker. To deal with the rigid motion of a user, the face is first extracted using a skin-color model and connected-component analysis. Thereafter, the user's eyes are localized by the NN-based texture classifier and then tracked continuously by a mean-shift procedure. The proposed method was tested with 25 people, and the results show that it has the following advantages: 1) it is robust to time-varying illumination and less sensitive to specular reflection; 2) it works well on low-resolution input images. However, the proposed method has some limitations. Although it is fast enough for user interfaces and other applications, it is slower than the standard mouse. Therefore, we are currently investigating how to speed up our method.
References
1. Jacob, R.J.K.: Human-Computer Interaction: Input Devices. ACM Computing Surveys, Vol. 28, No. 1 (1996)
2. Sharma, R., Pavlovic, V.I., Huang, T.S.: Toward Multimodal Human-Computer Interface. Proceedings of the IEEE, Vol. 86, No. 5 (1998) 853-869
3. Kaufman, A.E., Bandopadhay, A., Shaviv, B.D.: An Eye Tracking Computer User Interface. Proc. IEEE 1993 Symposium on Research Frontiers in Virtual Reality (1993)
4. Scassellati, B.: Eye Finding via Face Detection for a Foveated, Active Vision System. Proc. American Association for Artificial Intelligence (1998)
5. Kurata, T., Okuma, T., Kourogi, M., Sakaue, K.: The Hand Mouse: GMM Hand-Color Classification and Mean Shift Tracking. Proc. Second Int. Workshop on Recognition, Analysis and Tracking of Faces and Gestures in Real-Time Systems (RATFG-RTS 2001), in conjunction with ICCV 2001, Vancouver, Canada (2001) 119-124
6. Takami, N., Irie, K., Kang, C., Ishimatsu, T., Ochiai, T.: Computer Interface to Use Head Movement for Handicapped People. Proc. IEEE TENCON '96, Digital Signal Processing Applications, Vol. 1 (1996) 468-472
7. Sako, H., Whitehouse, M., Smith, A., Sutherland, A.: Real-Time Facial-Feature Tracking Based on Matching Techniques and Its Applications. Proc. 12th IAPR Int. Conf. on Pattern Recognition, Vol. 2 (1994) 320-324
8. Kim, S.-H., Kim, H.-G., Tchah, K.-H.: Object Oriented Face Detection Using Range and Color Information. Electronics Letters, Vol. 34, No. 10 (1998) 979-980
9. Schiele, B., Waibel, A.: Gaze Tracking Based on Face-Color. School of Computer Science, Carnegie Mellon University (1995)
10. Yang, J., Waibel, A.: A Real-Time Face Tracker. Proc. 3rd IEEE Workshop on Applications of Computer Vision (WACV '96) (1996) 142-147
11. Betke, M., Gips, J., Fleming, P.: The Camera Mouse: Visual Tracking of Body Features to Provide Computer Access for People with Severe Disabilities. IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 10, No. 1 (2002) 1-10
The Optimal Feature Extraction Procedure for Statistical Pattern Recognition

Marek Kurzynski and Edward Puchala

Wroclaw University of Technology, Faculty of Electronics, Chair of Systems and Computer Networks, Wyb. Wyspianskiego 27, 50-370 Wroclaw, Poland
[email protected], [email protected]
Abstract. The paper deals with the extraction of features for object recognition. Bayes’ probability of correct classification was adopted as the extraction criterion. The problem with full probabilistic information is discussed in detail. A simple calculation example is given and solved. One of the paper’s chapters is devoted to a case when the available information is contained in the so-called learning sequence (the case of recognition with learning).
1 Introduction
One of the fundamental problems in statistical pattern recognition is representing patterns through a reduced number of dimensions. In most practical cases the dimension of the pattern feature space is rather large, because it is too difficult or impossible to directly evaluate the usefulness of a particular feature at the design stage. Therefore it is reasonable to initially include all the potentially useful features and to reduce this set later. There are two main methods of dimensionality reduction ([1], [2], [6]): feature selection, in which we select the best possible subset of input features, and feature extraction, which consists in finding a transformation to a lower-dimensional space. We shall concentrate here on feature extraction. There are many effective methods of feature reduction. One can consider linear and nonlinear feature extraction methods, particularly ones which ([4], [5]):
1. minimize the conditional risk or probability of incorrect object classification,
2. maximize or minimize a previously adopted objective function,
3. maximize criteria for the information values of the individual features (or sets of features) describing the objects.
In each of the above cases, extraction means a feature space transformation leading to a reduction in the dimensionality of the space. One should note that feature extraction may result in a deterioration of object classification quality, so it should be performed with a view to increasing the operating speed of the computer system and reducing data processing time while maintaining the best possible quality of the decision-aiding system. In this paper a novel approach to the statistical problem of feature reduction is proposed. A linear transformation of the n-dimensional space of features
was adopted as the basis for the extraction algorithm. The transformation describes an object in a new m-dimensional space of reduced dimensionality (m < n). In order to define the linear transformation one should determine the values of the transformation matrix components, which means that a properly defined optimisation problem must be solved. The probability of correct classification of objects in the recognition process was adopted as the criterion.
2 Preliminaries and the Problem Statement
Let us consider the pattern recognition problem with a probabilistic model. This means that the n-dimensional feature vector describing the recognized pattern, $x = (x_1, x_2, ..., x_n)^T \in \mathcal{X} \subseteq R^n$, and its class number $j \in \mathcal{M} = \{1, 2, ..., M\}$ are observed values of a pair of random variables $(\mathbf{X}, \mathbf{J})$, respectively. Their probability distribution is given by the a priori probabilities of the classes

$$p_j = P(J = j), \quad j \in \mathcal{M} \qquad (1)$$

and the class-conditional probability density functions (CPDFs) of $\mathbf{X}$

$$f_j(x) = f(x/j), \quad x \in \mathcal{X}, \; j \in \mathcal{M}. \qquad (2)$$

In order to reduce the dimensionality of the feature space, let us consider the linear transformation

$$y = A x, \qquad (3)$$

which maps the n-dimensional input feature space $\mathcal{X}$ into the m-dimensional derivative feature space $\mathcal{Y} \subseteq R^m$, or, under the assumption that $m < n$, reduces the dimensionality of the space of object descriptors. Obviously, $y$ is a vector of observed values of the m-dimensional random variable $\mathbf{Y}$, whose probability distribution, given by CPDFs, depends on the mapping matrix $A$, viz.

$$g(y/j; A) = g_j(y; A), \quad y \in \mathcal{Y}, \; j \in \mathcal{M}. \qquad (4)$$

Let us now introduce a criterion function $Q(A)$ which evaluates the discriminative ability of the features $y$, i.e. $Q$ states a measure of the feature extraction mapping (3). As the criterion $Q$, any measure can be used which evaluates either the relevance of features, based on their capacity to discriminate between classes, or the quality of the recognition algorithm used later to build the final classifier. In the numerical example that follows, the Bayes probability of correct classification will be used, namely

$$Q(A) = Pc(A) = \int_{\mathcal{Y}} \max_{j \in \mathcal{M}} \{ p_j \, g_j(y; A) \} \, dy. \qquad (5)$$

Without any loss of generality, let us consider a higher value of $Q$ to indicate a better feature vector $y$. Then the feature extraction problem can be formulated as follows: for given priors (1), CPDFs (2), and reduced dimension $m$, find the matrix $A^*$ for which

$$Q(A^*) = \max_A Q(A). \qquad (6)$$
3 Optimization Procedure
In order to solve (6), we must first explicitly determine the CPDFs (4). Let us introduce the vector $\bar{y} = (y, x_1, x_2, ..., x_{n-m})^T$ and the linear transformation

$$\bar{y} = \bar{A} x, \qquad (7)$$

where

$$\bar{A} = \left[ \begin{array}{c} A \\ \hline I \;|\; 0 \end{array} \right] \qquad (8)$$

is a square $n \times n$ matrix. For a given $y$, equation (7) has a unique solution given by the Cramer formulas

$$x_k(y) = |\bar{A}_k(y)| \cdot |\bar{A}|^{-1}, \qquad (9)$$

where $\bar{A}_k(y)$ denotes the matrix with the k-th column replaced by the vector $\bar{y}$. Hence, putting (9) into (2) and (4), we get the CPDFs of $\bar{y}$ ([3]):

$$\bar{g}_j(\bar{y}; A) = J^{-1} \cdot f_j(x_1(\bar{y}), x_2(\bar{y}), \cdots, x_n(\bar{y})), \qquad (10)$$

where $J$ is the Jacobian of the mapping (7). Integrating (10) over the variables $x_1, ..., x_{n-m}$, we simply get

$$g_j(y; A) = \int_{\mathcal{X}_1} \int_{\mathcal{X}_2} \cdots \int_{\mathcal{X}_{n-m}} \bar{g}_j(\bar{y}; A) \, dx_1 \, dx_2 \, ... \, dx_{n-m}. \qquad (11)$$
Formula (11) allows one to determine the class-conditional density functions for the feature vector $y$ describing the object in the new m-dimensional space. Substituting (11) into (5), one gets a criterion defining the probability of correct classification of objects in the space $\mathcal{Y}$:

$$Q(A) = Pc(A) = \int_{\mathcal{Y}} \max_{j \in \mathcal{M}} \left\{ p_j \int_{\mathcal{X}_1} \cdots \int_{\mathcal{X}_{n-m}} J^{-1} f_j(x_1(\bar{y}), \cdots, x_n(\bar{y})) \, dx_1 \, ... \, dx_{n-m} \right\} dy$$
$$= \int_{\mathcal{Y}} \max_{j \in \mathcal{M}} \left\{ p_j \int_{\mathcal{X}_1} \cdots \int_{\mathcal{X}_{n-m}} J^{-1} f_j(|\bar{A}_1(y)| \cdot |\bar{A}|^{-1}, \cdots, |\bar{A}_n(y)| \cdot |\bar{A}|^{-1}) \, dx_1 \, ... \, dx_{n-m} \right\} dy. \qquad (12)$$
Thus, solving the feature extraction problem (6) requires determining the matrix $A^*$ for which the Bayes probability of correct classification (12) is maximal.
Consequently, complex multiple integration and matrix inversion operations must be performed on the multidimensional matrices in order to obtain the optimal A. Although an analytical solution is possible (for low values of n and m), it is complicated and time-consuming. Therefore it is proposed to use numerical procedures. For this kind of global optimisation problem, classic numerical algorithms are very ineffective: in a search for a global extremum they have to be restarted from many different starting points, so the time needed to obtain an optimal solution is very long. Thus it is natural to use the parallel processing methodology offered by genetic algorithms ([7]). A major difficulty here is the proper encoding of the problem. Section 4 presents a simple optimisation example associated with the extraction of features, solved using the analytical method.
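The paper defers the genetic algorithm to future work, so the following is only a rough sketch of the idea: a real-coded GA over the entries of A, with a caller-supplied fitness function that estimates Q(A), e.g. by Monte Carlo integration of (12). The population size, operators, and parameter values are all assumptions:

import numpy as np

rng = np.random.default_rng(0)

def ga_optimize(fitness, n, m, pop=40, gens=100, sigma=0.3):
    """Real-coded GA over the m x n matrix A; fitness estimates Q(A)."""
    P = rng.normal(size=(pop, m * n))
    for _ in range(gens):
        scores = np.array([fitness(p.reshape(m, n)) for p in P])
        elite = P[np.argsort(-scores)][:pop // 2]     # keep the best half
        pa = elite[rng.integers(len(elite), size=pop)]
        pb = elite[rng.integers(len(elite), size=pop)]
        mask = rng.random(pa.shape) < 0.5             # uniform crossover
        P = np.where(mask, pa, pb)
        P += rng.normal(scale=sigma, size=P.shape)    # Gaussian mutation
    best = max(P, key=lambda p: fitness(p.reshape(m, n)))
    return best.reshape(m, n)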
4 Numerical Example
Let us consider a two-class pattern recognition task with equal priors and the problem of reducing the feature space dimension from n = 2 to m = 1. The input feature vector is uniformly distributed and its CPDFs are as follows:

$$f_1(x) = \begin{cases} 0.5 & \text{for } 0 \le x_1 \le 2 \text{ and } 0 \le x_2 \le x_1, \\ 0 & \text{otherwise,} \end{cases} \qquad (13)$$

$$f_2(x) = \begin{cases} 0.5 & \text{for } 0 \le x_1 \le 2 \text{ and } x_1 \le x_2 \le 2, \\ 0 & \text{otherwise.} \end{cases} \qquad (14)$$

Now the feature extraction mapping (3) has the form

$$y = [a, 1] \cdot [x_1, x_2]^T = a \cdot x_1 + x_2 \qquad (15)$$

and the problem is to find the value $a^*$ which maximizes criterion (12). Since the Jacobian of (7) is equal to 1, from (9) and (10) we get, for j = 1, 2,

$$\bar{g}_j(\bar{y}, a) = f_j(x_1, y - a \cdot x_1). \qquad (16)$$

The results of integrating (16) over $x_1$, i.e. the CPDFs (11) for $a \ge 1$, $-1 \le a \le 1$, and $a \le -1$, are presented in Fig. 1 a), b), and c), respectively. Finally, from (5) we easily get the Bayes probability of misclassification $Pe(a) = 1 - Pc(a)$:

$$Pe(a) = \begin{cases} \frac{a+1}{4a} & \text{for } |a| \ge 1, \\ \frac{a+1}{4} & \text{for } |a| \le 1. \end{cases} \qquad (17)$$

The graph of the Bayes probability of misclassification as a function of the parameter $a$ of the feature extraction mapping is depicted in Fig. 2. The best result, $Pe(a^*) = 0$ (or equivalently $Pc(a^*) = 1$), is obtained for $a^* = -1$: for $y = x_2 - x_1$ the feature is non-positive on the support of class 1 and non-negative on that of class 2, so the classes are perfectly separated.
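As a quick numerical check (ours, not part of the paper), the example is simple enough to verify (17) by Monte Carlo: sample the two triangular densities, project with (15), and integrate max_j p_j g_j(y; a) using histogram estimates of the densities. The expected values are Pc(-1) = 1, Pc(0) = 0.75, Pc(1) = 0.5, and Pc(2) = 0.625:

import numpy as np

rng = np.random.default_rng(1)

def bayes_pc(a, n=200_000, bins=400):
    """Monte Carlo estimate of Pc(a), the integral of max_j p_j g_j(y; a)."""
    u = rng.uniform(0, 2, size=n)
    v = rng.uniform(0, 2, size=n)
    lo, hi = np.minimum(u, v), np.maximum(u, v)
    y1 = a * hi + lo            # class 1: (x1, x2) = (hi, lo), x2 <= x1
    y2 = a * lo + hi            # class 2: (x1, x2) = (lo, hi), x2 >= x1
    edges = np.linspace(min(y1.min(), y2.min()),
                        max(y1.max(), y2.max()), bins + 1)
    h1, _ = np.histogram(y1, edges)
    h2, _ = np.histogram(y2, edges)
    return float(np.maximum(0.5 * h1, 0.5 * h2).sum()) / n

for a in (-1.0, 0.0, 1.0, 2.0):
    print(a, round(bayes_pc(a), 3))    # approx. 1.0, 0.75, 0.5, 0.625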
Fig. 1. Illustration of the example: the CPDFs for a) a >= 1, b) -1 <= a <= 1, c) a <= -1
Fig. 2. Probability of misclassification Pe(a)
5 The Case of Recognition with Learning
It follows from the above considerations that an analytical and numerical solution of the optimisation problem is possible, but for this one must know the class-conditional density functions and the a priori probabilities of the classes. In practice, such information is rarely available. All we know about the classification problem is usually contained in the so-called learning sequence:

$$S_L(x) = \{(x^{(1)}, j^{(1)}), (x^{(2)}, j^{(2)}), ..., (x^{(L)}, j^{(L)})\}. \qquad (18)$$

Formula (18) describes objects in the space $\mathcal{X}$. For the transformation to the space $\mathcal{Y}$ one should use the relation

$$y^{(k)} = A \cdot x^{(k)}, \quad k = 1, 2, ..., L, \qquad (19)$$

and then the learning sequence assumes the form

$$S_L(y) = \{(y^{(1)}, j^{(1)}), (y^{(2)}, j^{(2)}), ..., (y^{(L)}, j^{(L)})\}. \qquad (20)$$

The elements of the sequence $S_L(y)$ allow one to determine (in a standard way) the estimators of the a priori class probabilities $p_{jL}$ and of the class-conditional density functions $f_{jL}(x)$. Then the optimisation criterion assumes the form:
$$Q_L(A) = Pc_L(A) = \int_{\mathcal{Y}} \max_{j \in \mathcal{M}} \left\{ p_{jL} \int_{\mathcal{X}_1} \int_{\mathcal{X}_2} \cdots \int_{\mathcal{X}_{n-m}} J^{-1} f_{jL}(x_1(\bar{y}), x_2(\bar{y}), \cdots, x_n(\bar{y})) \, dx_1 \, dx_2 \, ... \, dx_{n-m} \right\} dy. \qquad (21)$$
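The paper leaves the estimators open ("in a standard way"). A minimal sketch for the m = 1 case, using empirical class frequencies for p_jL and a kernel (Parzen) density estimate, both of which are our choices, with the outer integral of (21) approximated on a uniform 1D grid:

import numpy as np
from scipy.stats import gaussian_kde

def criterion_from_learning_sequence(X, j, A, grid):
    """Estimate Q_L(A) of (21) for m = 1 from the learning sequence (18):
    empirical priors p_jL plus Parzen (KDE) density estimates in Y."""
    y = X @ A.ravel()                       # transformation (19) for m = 1
    classes = np.unique(j)
    priors = np.array([np.mean(j == c) for c in classes])
    kdes = [gaussian_kde(y[j == c]) for c in classes]
    post = np.stack([p * k(grid) for p, k in zip(priors, kdes)])
    return post.max(axis=0).sum() * (grid[1] - grid[0])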
6 Conclusions
The feature extraction problem is fundamental for recognition and classification tasks, since it leads to a reduction in the dimensionality of the feature space in which an object is described. This is highly important in situations where the computer system must aid on-line decision making. In most cases an optimisation task must be formulated for the extraction problem. Because of the high computational complexity involved, numerical methods are used for this purpose; the authors propose to apply genetic algorithms in the considered case. Simulation studies justifying this approach, and their results, will be the subject of subsequent publications.
References
1. Devroye, L., Györfi, L., Lugosi, G.: A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York (1996)
2. Duda, R., Hart, P., Stork, D.: Pattern Classification. Wiley-Interscience, New York (2001)
3. Golub, G., Van Loan, C.: Matrix Computations. Johns Hopkins University Press (1996)
4. Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L.: Feature Extraction, Foundations and Applications. Springer-Verlag (2004)
5. Park, H., Park, C., Pardalos, P.: Comparative Study of Linear and Nonlinear Feature Extraction Methods. Technical Report, Minneapolis (2004)
6. Fukunaga, K.: Introduction to Statistical Pattern Recognition. Academic Press (1990)
7. Goldberg, D.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, New York (1989)
A New Approach for Human Identification Using Gait Recognition

Murat Ekinci

Computer Vision Lab, Department of Computer Engineering, Karadeniz Technical University, Trabzon, Turkey
[email protected]
Abstract. Recognition of a person from gait is a biometric of increasing interest. This paper presents a new approach to silhouette representation for extracting gait patterns for human recognition. The silhouette shape of a moving object is first represented by four 1D signals, the basic image features called the distance vectors; the distance vectors are the differences between the bounding box and the silhouette. Second, an eigenspace transformation based on Principal Component Analysis is applied to the time-varying distance vectors, and statistical-distance-based supervised pattern classification is then performed in the lower-dimensional eigenspace for recognition. A fusion task is finally executed to produce the final decision. Experimental results on three databases show that the proposed method is an effective and efficient gait representation for human identification, and that the proposed approach achieves highly competitive performance with respect to published gait recognition approaches.
1 Introduction
Human identification from gait has been a recent focus in computer vision. Gait is a behavioral biometric source that can be acquired at a distance. Gait recognition aims to discriminate individuals by the way they walk, and has the advantages of being non-invasive, hard to conceal, readily captured without a walker's attention, and less likely to be obscured than other biometric features [1][2][3][6]. Gait recognition methods can be broadly divided into two groups: model-based and silhouette-based. Model-based methods [2][15] model the human body structure and extract image features to map them into structural components of the models or to derive motion trajectories of body parts. Silhouette-based methods [6][7][9][1] characterize body movement by the statistics of the patterns produced by walking; these patterns capture both the static and dynamic properties of body shape. In this paper an effective representation of the silhouette for gait recognition is developed and a statistical analysis is performed. Similar observations have been made in [7][9][1], but the idea presented here implicitly captures both structural (appearance) and transitional (dynamic) characteristics of gait. The silhouette-based method presented here produces the distance vectors, four 1D signals extracted from projections of the silhouette: top-, bottom-,
left-, and right-projections. Based on the four distance vectors, a PCA-based gait recognition algorithm is first performed. A statistical-distance-based similarity measure is then computed between the training and testing data. A fusion task comprising two strategies is executed to produce a consolidated decision. Experimental results on three different databases demonstrate that the proposed algorithm has an encouraging recognition performance.
2 Silhouette Representation
To extract the spatial silhouettes of walking figures, a background modeling algorithm and a simple correspondence procedure are first used to segment and track the moving silhouette of a walking figure; more details are given in [5]. Once a silhouette is generated, a bounding box is placed around it. Silhouettes across a motion sequence (a gait cycle) are automatically aligned by scaling and cropping based on the bounding box; details of the gait cycle estimation used are given in [14]. The silhouette representation is based on projections of the silhouette, generated from a sequence of binary silhouette images bs(t) = bs(x, y, t), indexed spatially by pixel location (x, y) and temporally by time t. There are four different image features, called the distance vectors: the top-, bottom-, left-, and right-distance vectors. The distance vectors are the differences between the bounding box and the outer contour of the silhouette. An example silhouette and the distance vectors corresponding to the four projections are shown in the middle of Fig. 1. The distance vectors are represented separately by four 1D signals; the size of a 1D signal is equal to the height of the bounding box for the left- and right-distance vectors, and to its width for the top- and bottom-distance vectors. The values in the signals for the left and right projections are computed, in a given row, as the difference between the location of the bounding box and the left-most or right-most boundary pixel, respectively. The projections along a given column are computed analogously: from the top of the bounding box to the top-most silhouette pixel for the top projection, and from the bottom of the box to the bottom-most silhouette pixel for the bottom projection. From a new 2D image F^T(x, t), where each column (indexed by time t) is the top projection of the silhouette image bs(t), each value F^T(x, t) is a count of the number of row pixels between the top side of the bounding box and the outer contour in column x of bs(t), as shown in Fig. 1 top-left. The result is a 2D pattern, formed by stacking top projections together into a spatio-temporal pattern. A second pattern, F^B(x, t), representing the bottom projection, is constructed by stacking bottom projections, as shown in Fig. 1 bottom-left. The third pattern, F^L(y, t), is constructed by stacking the differences, as column pixels, from the left side of the box to the left-most boundary pixels of the silhouette produced by the left projections; the last pattern, F^R(y, t), is constructed by stacking the right projections. These two patterns are shown as the top-right and bottom-right 2D patterns of Fig. 1, respectively.
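A minimal sketch (ours) of computing the four distance vectors for one frame; stacking the per-frame outputs column-wise over a gait cycle yields the four spatio-temporal patterns described above:

import numpy as np

def distance_vectors(bs):
    """Four distance vectors of one binary silhouette frame: the gap
    between each side of the bounding box and the outermost silhouette
    pixel in every column (top/bottom) or row (left/right)."""
    ys, xs = np.nonzero(bs)                 # assumes a non-empty silhouette
    box = bs[ys.min():ys.max() + 1, xs.min():xs.max() + 1].astype(bool)
    h, w = box.shape
    col_any, row_any = box.any(axis=0), box.any(axis=1)
    top = np.where(col_any, box.argmax(axis=0), h)             # length w
    bottom = np.where(col_any, box[::-1].argmax(axis=0), h)    # length w
    left = np.where(row_any, box.argmax(axis=1), w)            # length h
    right = np.where(row_any, box[:, ::-1].argmax(axis=1), w)  # length h
    return top, bottom, left, right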
Fig. 1. Silhouette representation. (Middle) Silhouette and four projections, (Left) temporal plot of the distance vectors for top and bottom projections, (Right) temporal plot of the distance vectors for left and right projections.
The variation of each component of each distance vector can be regarded as the gait signature of the object. From the temporal distance vector plots it is clear that the distance vector is roughly periodic and gives the extent of movement of different parts of the subject. The brighter a pixel in the 2D patterns in Fig. 1, the larger the value of the distance vector at that position.
3 Training
The first processing steps on the four 1D signals produced from the distance vectors eliminate the influence of spatial scale and signal length by scaling the distance vector signals with respect to magnitude and size through the sizes of the bounding boxes. An eigenspace transformation based on Principal Component Analysis (PCA) is then applied to the time-varying distance vectors derived from a sequence of silhouette images, to reduce the dimensionality of the input feature space. The training process, similar to [1][4], is illustrated as follows. Given k classes for training, each class represents a sequence of the distance vector signals of one person. Multiple sequences of each subject can be added for
training, but a sequence comprising one gait cycle was considered in the experiments. Let $V^w_{i,j}$ be the jth distance vector signal in the ith class for projection w to the silhouette, and $N_i$ the number of such distance vector signals in the ith class. The total number of training samples is $N^w_t = N^w_1 + N^w_2 + ... + N^w_k$, and the whole training set can be represented by $[V^w_{1,1}, V^w_{1,2}, .., V^w_{1,N_1}, V^w_{2,1}, ..., V^w_{k,N_k}]$. The mean $m^w_v$ and the global covariance matrix $\Sigma^w$ of the training set for projection w can easily be obtained by

$$m^w_v = \frac{1}{N^w_t} \sum_{i=1}^{k} \sum_{j=1}^{N_i} V^w_{i,j} \qquad (1)$$

$$\Sigma^w = \frac{1}{N^w_t} \sum_{i=1}^{k} \sum_{j=1}^{N_i} (V^w_{i,j} - m^w_v)(V^w_{i,j} - m^w_v)^T \qquad (2)$$

Here each $V^w$ value is a 1D signal equal to $F^w(.)$, the distance vector for projection w (top, bottom, left, or right) as explained in Section 2. If the rank of the matrix $\Sigma^w$ is N, then the N nonzero eigenvalues of $\Sigma^w$, $\lambda_1, \lambda_2, .., \lambda_N$, and the associated eigenvectors $e_1, e_2, .., e_N$ can be computed based on the theory of singular value decomposition [4]. The first few eigenvectors correspond to large changes in the training patterns, while higher-order eigenvectors represent smaller changes [1]. As a result, for computational efficiency in practical applications, the small eigenvalues and their corresponding eigenvectors are ignored. A transform matrix $T^w = [e^w_1, e^w_2, .., e^w_s]$, which projects an original distance vector signal $V^w_{i,j}$ into a point $P^w_{i,j}$ in the eigenspace, is then constructed by taking only the $s < N$ largest eigenvalues and their associated eigenvectors for each projection to the silhouette. The value of s is usually much smaller than the original data dimension N. The projection average $A^w_i$ of each training sequence in the eigenspace is calculated by averaging the $P^w_{i,j}$ as follows:

$$P^w_{i,j} = [e^w_1 \; e^w_2 \; ... \; e^w_s]^T V^w_{i,j}, \qquad A^w_i = \frac{1}{N_i} \sum_{j=1}^{N_i} P^w_{i,j} \qquad (3)$$
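A minimal sketch of this training step for a single projection w; note that we also center the signals before projecting, a standard PCA detail that the extracted text does not make explicit:

import numpy as np

def train_projection(V, s):
    """PCA training for one projection: V is (Nt, L), one scaled
    distance-vector signal per row. Returns the mean (1), the transform
    T^w built from the s largest eigenvectors of the covariance (2),
    and the eigenspace points of (3)."""
    mean = V.mean(axis=0)                          # equation (1)
    D = V - mean
    C = D.T @ D / len(V)                           # equation (2)
    vals, vecs = np.linalg.eigh(C)                 # symmetric eigensolver
    T = vecs[:, np.argsort(vals)[::-1][:s]]        # s largest eigenvectors
    P = D @ T                                      # eigenspace points (3)
    return mean, T, P

The class centroids A^w_i of (3) are then the per-class averages of the rows of P.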
4 Pattern Classification
Gait pattern recognition (classification) can be solved by measuring similarities between the reference gait patterns and test samples in the parametric eigenspace. A simple statistical distance has been chosen to measure similarity, because the main interest here is to evaluate the genuine discriminatory ability of the features extracted by the proposed method. The accumulated distance between the associated centroids $A^w$ (obtained in the training process) and $B^w$ (obtained in the testing process) can easily be computed by

$$d_S(A, B) = \left( \frac{A_1 - B_1}{s_1} \right)^2 + ... + \left( \frac{A_p - B_p}{s_p} \right)^2 \qquad (4)$$
where $(s_1, ..., s_p)$ are equal to the corresponding sizes of $A_i$ and $B_i$. With this distance measure, the classification result for each projection is obtained by choosing the minimum of d. The classification process is carried out via the nearest neighbor (NN) classifier: a test sequence is classified against all training sequences by

$$c = \arg\min_i d_i(B, A_i) \qquad (5)$$
where B represents a test sequence, $A_i$ represents the ith training sequence, and d is the similarity measure described above. The similarity results produced from the four distance vectors are fused to increase recognition performance. In this fusion task, two different strategies were developed. In strategy 1, each projection is treated separately, and the distances of the projections are combined at the end with equal weights: if any two of the four projection similarities give maximum similarity for the same individual, the identification is declared positive. This fusion strategy rapidly increased recognition performance in the experiments. The experiments also showed that some projections give more robust results than others. For instance, while a human moves in the lateral view with respect to the image plane, the back side of the body can carry more individual gait characteristics, so the corresponding projection can give more reliable results; we call this the dominant feature. Strategy 2 was therefore developed to further increase recognition performance: if the projection selected as the dominant feature, or at least two of the other projections, give a positive result for an individual, the identification result of strategy 2 is declared positive. The dominant feature is assigned automatically by estimating the direction of motion of objects during tracking. In the next section, the dominant features determined experimentally for different viewpoints with respect to the image plane are given.
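A minimal sketch of the two fusion strategies, read as verification rules; here votes holds, for each projection, the identity with the minimum distance (5), and the claimed-identity formulation is our interpretation:

def strategy1(votes, claimed):
    """Positive if at least two of the four projections rank the
    claimed identity first."""
    return sum(v == claimed for v in votes.values()) >= 2

def strategy2(votes, claimed, dominant):
    """Positive if the dominant projection, or at least two of the
    remaining projections, rank the claimed identity first."""
    others = [v for k, v in votes.items() if k != dominant]
    return votes[dominant] == claimed or sum(v == claimed for v in others) >= 2

# Example: votes = {'top': 7, 'bottom': 7, 'left': 3, 'right': 7}
# strategy1(votes, 7) -> True; strategy2(votes, 7, dominant='right') -> True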
5 Experimental Results
The performance of the proposed method was evaluated on CMU's MoBo database [13], the NLPR gait database [1], and the USF database [6]. The Viterbi algorithm was used to identify the test sequence, since it is efficient and can operate in the logarithmic domain using only additions [12]. The three databases are of varying difficulty.

CMU Database. This database has 25 subjects (23 males, 2 females) walking on a treadmill. Each subject is recorded performing four different types of walking: slow walk, fast walk, inclined walk, and slow walk holding a ball. There are about 8 cycles in each sequence, and each sequence is recorded at 30 frames per second. It also contains six simultaneous motion sequences of the 25 subjects, as shown in Fig. 2. One cycle in each sequence was used for testing, the others for training. First, we did the following experiments on this database: 1) train on slow walk
and test on slow walk; 2) train on fast walk and test on fast walk; 3) train on walk carrying a ball and test on walk carrying a ball; 4) train on slow walk and test on fast walk; 5) train on slow walk and test on walk carrying a ball; 6) train on fast walk and test on slow walk; 7) train on fast walk and test on walk carrying a ball; 8) train on walk carrying a ball and test on slow walk; 9) train on walk carrying a ball and test on fast walk. The results obtained using the proposed method are summarized for all cases 1)-9) in Table 1. It can be seen that the right person is in the top two matches 100% of the time for the cases where the testing and training sets correspond to the same walk style. When the dominant-feature (projection) strategy developed for the fusion is used, the recognition performance increases, as seen in Table 1. For the case of training on fast walk and testing on slow walk, and vice versa, the dip in performance is caused by the fact that, for some individuals, as biometrics suggests, there is a considerable change in body dynamics and stride length as a person changes speed. Nevertheless, the right person is in the top three matches 100% of the time for those cases, and the dominant-projection strategy also increases the recognition performance for Ranks 1 and 2. For the case of training on walk carrying a ball and testing on slow and fast walks, and vice versa, encouraging results are also produced by the proposed method, and the dominant-feature property still increases the recognition performance, as given in Table 1.

Table 1. Classification performance on the CMU data set for viewpoint 1

                All projections: equal        Dominant: Right projection
Test/Train   Rank 1  Rank 2  Rank 3  Rank 4   Rank 1  Rank 2  Rank 3
Slow/Slow      72     100     100     100       84     100     100
Fast/Fast      76     100     100     100       92     100     100
Ball/Ball      84     100     100     100       84     100     100
Slow/Fast      36      92     100     100       52     100     100
Fast/Slow      20      60     100     100       32      88     100
Slow/Ball       8      17      33      58       42      96     100
Fast/Ball       4      13      33      67       17      50      88
Ball/Slow       8      17      38      67       33      88     100
Ball/Fast      13      29      58      92       29      63     100
Fig. 2. The six CMU database viewpoints
Table 2. Classification performance on the CMU data set for all views. Eight gait cycles were used: seven cycles for training, one cycle for testing.

                            All projections equal    Dominant projection
View  Dominant  Test/Train  Rank 1  Rank 2  Rank 3   Rank 1  Rank 2  Rank 3
4     Right     Slow/Slow     76     100     100       84     100     100
                Fast/Fast     84     100     100       96     100     100
                Slow/Fast     12      44      80       24      64     100
                Fast/Slow     20      64     100       32      76     100
5     Left      Slow/Slow     80     100     100       80     100     100
                Fast/Fast     88     100     100       88     100     100
                Slow/Fast     16      44      80       24      64     100
                Fast/Slow     24      56      96       32      68     100
3     Right     Slow/Slow     80     100     100       88     100     100
                Fast/Fast     72     100     100       76     100     100
                Slow/Fast     20      64     100       28      76     100
                Fast/Slow     24      56      92       28      68     100
6     Right     Slow/Slow     72     100     100       84     100     100
                Fast/Fast     76     100     100       80     100     100
                Slow/Fast     16      44      88       36      76     100
                Fast/Slow     16      40      72       24      56     100
For the other viewpoints, the experimental results are summarized for cases 1)-4) in Table 2. Considering all the experimental results for the different viewpoints, it can be seen that the right person is in the top two matches 100% of the time for cases 1)-2), and in the top four matches 100% of the time for cases 3)-4). It is also seen that using the dominant feature increases gait recognition performance. Some comparison results are given in Table 3. The reason for showing only cases 1)-6) is the page limitation of this paper; those points are our lowest results on the MoBo dataset for comparison with other works in the literature.

Table 3. Comparison of several algorithms on the MoBo dataset

Algorithm             train/test  Rank 1 (%)  Rank 2 (%)  Rank 3 (%)  Rank 5 (%)
The method presented  slow/slow       84         100         100         100
Kale et al. [7]       slow/slow       72          80          85          97
Collins et al. [11]   slow/slow       86         100         100         100
The method presented  fast/slow       52         100         100         100
Kale et al. [7]       fast/slow       56          62          75          82
Collins et al. [11]   fast/slow       76           Not given              92

NLPR Database. The NLPR database [1] includes 20 subjects and four sequences for each viewing angle per subject: two sequences for one direction of walking, and the other two for the reverse direction.
Table 4. Performance on the NLPR data set for three views

Walking Direction    View     Training  Test    Rank 1  Rank 2  Rank 3
One Way Walking      Lateral  Exp. 1    Exp. 1    65     100     100
                              Exp. 1    Exp. 2    55     100     100
                     Frontal  Exp. 1    Exp. 1    60     100     100
                              Exp. 1    Exp. 2    35     100     100
                     Oblique  Exp. 1    Exp. 1    40      90     100
                              Exp. 1    Exp. 2    30      60     100
Reverse Way Walking  Lateral  Exp. 1    Exp. 1    60     100     100
                              Exp. 1    Exp. 2    50     100     100
                     Frontal  Exp. 1    Exp. 1    60     100     100
                              Exp. 1    Exp. 2    40     100     100
                     Oblique  Exp. 1    Exp. 1    45     100     100
                              Exp. 1    Exp. 2    35      75     100
For instance, when the subject is walking laterally to the camera, the direction of walking is from right to left for two of the four sequences and from left to right for the remaining two. All these gait sequences were captured twice (two experiments), on two different days, in an outdoor environment. All subjects walk along a straight-line path at free cadence in three different views with respect to the image plane, as shown in Fig. 3, where the white line with an arrow represents the one-direction path and the other walking path is the reverse direction. We did the following experiments on this database: 1) train on one image sequence and test on the remainder, with all sequences produced from the first experiment; 2) train on the two sequences obtained from the first experiment and test on the two sequences obtained from the second experiment. This was repeated for each viewing angle and for each direction of walking. The results of the experiments, along with cumulative match scores for the three viewing angles, are summarized in Table 4. Considering the experimental results, the right person is in the top two matches 100% of the time for the lateral and frontal viewing angles, and in the top three matches 100% of the time for the oblique view.
Fig. 3. Some images in the NLPR database (lateral, oblique, and frontal views)
In the experiments on the NLPR database, the performance of the proposed algorithm was also compared with those of a few recent silhouette-based methods, described in [11], [8], [16], and [1], respectively; to some extent they reflect the latest and best work of these research groups in gait recognition. In [11], a method based on template matching of body silhouettes in key frames was established for human identification. The study in [8] described a moment-based representation of gait appearance for the purpose of person identification. A baseline algorithm for human identification using spatio-temporal correlation of silhouette images was proposed in [16]. The work in [1] computes the centroid of the silhouette's shape and unwraps the outer contour to obtain a 1D distance signal, then applies principal component analysis for person identification. These methods were implemented on the same silhouette data from the NLPR database (lateral view) by the study in [1], and the results below are taken from the tables in [1]. Table 5 lists the identification rates reported for these algorithms and for ours; the proposed algorithm successfully gives the right person in the top two matches 100% of the time on the NLPR database.

Table 5. Comparison of several algorithms on the NLPR database (lateral view)

Method                Rank 1 (%)  Rank 2 (%)  Rank 3 (%)  Rank 5 (%)  Rank 10 (%)
BenAbdelkader [10]        73           Not given               89          96
Collins [11]              71           Not given               79          88
Lee [8]                   88           Not given               99         100
Phillips [16]             79           Not given               91          99
Wang [1]                  75           Not given               98         100
The method presented      65          100         100         100         100

USF Database. Finally, the USF database [6] is considered. This database has variations in viewing direction, shoe type, and surface type. In the experiments, one cycle in each sequence was used for testing and the others (3-4 cycles) for training. The different probe sequences for the experiments, along with the cumulative match scores, are given in Table 6 for the algorithm presented in this paper and three other algorithms [16][1][7]. The same silhouette data from USF were used directly.

Table 6. Performance on the USF database for four algorithms

           The method presented   Baseline [16]    NLPR [1]      UMD [7]
Exp.      Rank 1  Rank 2  Rank 3  Rank 1  Rank 5  Rank 1  Rank 5  Rank 1  Rank 5
GAL [68]    35      80     100      79      96      70      92      91     100
GBR [44]    34      82     100      66      81      58      82      76      81
GBL [44]    25      55      91      56      76      51      70      65      76
CAL [68]    39      90     100      30      46      27      38      24      46
CAR [68]    30      66     100      29      61      34      64      25      61
CBL [41]    30      78     100      10      33      14      26      15      33
CBR [41]    29      66     100      24      55      21      45      29      39
GAR [68]    34      60      90       –       –       –       –       –       –
These data are noisy, e.g. missing body parts, small holes inside the objects, severe shadows around the feet, and parts missing or added around the border of the silhouettes due to background characteristics. In Table 6, G and C indicate grass and concrete surfaces, A and B indicate shoe types, and L and R indicate the left and right cameras, respectively. The number of subjects in each subset is given in square brackets. It is observed that the proposed method gives the right person in the top three matches 100% of the time when the training and testing sets correspond to the same camera.
6 Conclusion
The method presented gives results very close to those of existing works for Rank 1 on the databases tested; nevertheless, it achieves almost 100% accuracy for Ranks 2 and 3 on all the databases used. As a next study, nonlinear discriminant analysis will be developed to achieve higher accuracy than the current method for Rank 1.
References
1. Wang, L., Tan, T., Ning, H., Hu, W.: Silhouette Analysis-Based Gait Recognition for Human Identification. IEEE Trans. on PAMI, Vol. 25, No. 12 (2003)
2. BenAbdelkader, C., Cutler, R.G., Davis, L.S.: Gait Recognition Using Image Self-Similarity. EURASIP Journal of Applied Signal Processing (2004)
3. Veres, G.V., et al.: What Image Information Is Important in Silhouette-Based Gait Recognition? Proc. IEEE Conf. on Computer Vision and Pattern Recognition (2004)
4. Huang, P., Harris, C., Nixon, M.: Human Gait Recognition in Canonical Space Using Temporal Templates. IEE Proc. Vision, Image and Signal Processing (1999)
5. Ekinci, M., Gedikli, E.: Background Estimation Based People Detection and Tracking for Video Surveillance. ISCIS 2003, Springer LNCS 2869 (2003)
6. Sarkar, S., et al.: The HumanID Gait Challenge Problem: Data Sets, Performance, and Analysis. IEEE Trans. on PAMI, Vol. 27, No. 2 (2005)
7. Kale, A., et al.: Identification of Humans Using Gait. IEEE Trans. on Image Processing, Vol. 13, No. 9 (2004)
8. Lee, L., Grimson, W.: Gait Analysis for Recognition and Classification. Proc. IEEE Int. Conf. on Automatic Face and Gesture Recognition (2002) 155-162
9. Liu, Y., Collins, R.T., Tsin, T.: Gait Sequence Analysis Using Frieze Patterns. Proc. European Conf. on Computer Vision (2002)
10. BenAbdelkader, C., et al.: Stride and Cadence as a Biometric in Automatic Person Identification and Verification. Proc. Int. Conf. on Automatic Face and Gesture Recognition (2002)
11. Collins, R., Gross, R., Shi, J.: Silhouette-Based Human Identification from Body Shape and Gait. Proc. Int. Conf. on Automatic Face and Gesture Recognition (2002)
12. Phillips, J., et al.: The FERET Evaluation Methodology for Face Recognition Algorithms. IEEE Trans. on PAMI, Vol. 22, No. 10 (2000)
13. Gross, R., Shi, J.: The CMU Motion of Body (MoBo) Database. Tech. Rep. CMU-RI-TR-01-18, Robotics Institute, Carnegie Mellon University (2001)
14. Ekinci, M., Gedikli, E.: A Novel Approach on Silhouette Based Human Motion Analysis for Gait Recognition. ISVC 2005, LNCS 3804 (2005) 219-226
15. Bazin, A.I., Nixon, M.S.: Gait Verification Using Probabilistic Methods. IEEE Workshop on Applications of Computer Vision (2005)
16. Phillips, P., et al.: Baseline Results for Challenge Problem of Human ID Using Gait Analysis. Proc. Int. Conf. on Automatic Face and Gesture Recognition (2002)
ERRATUM
A Security Requirement Management Database Based on ISO/IEC 15408

Shoichi Morimoto, Daisuke Horie, and Jingde Cheng

Department of Information and Computer Sciences, Saitama University, Saitama, 338-8570, Japan
{morimo, horie, cheng}@aise.ics.saitama-u.ac.jp
M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 1–10, 2006. © Springer-Verlag Berlin Heidelberg 2006
DOI 10.1007/11751595_1

An error has been found in the above article. In the original version of this paper the affiliation was not correct. The correct affiliation of Shoichi Morimoto, Daisuke Horie, and Jingde Cheng is: Department of Information and Computer Sciences, Saitama University, 338-8570, Japan.

The original online version of this chapter can be found at http://dx.doi.org/10.1007/11751595_1
Author Index
Abbas, Cl´ audia Jacy Barenco V-819 Abraham, Ajith IV-40 Adamidis, Panagiotis V-108 Adolf, David I-711 Ahn, Byung Jun II-77 Ahn, Dong-In III-251 Ahn, Heejune IV-603 Ahn, Jaehoon V-522 Ahn, Jung-Chul IV-370 Ahn, ManKi III-48 Ahn, Sang-Ho III-279 Ahn, Seongjin II-400, II-410, II-487, V-829, II-1169 Ahn, Sung-Jin II-982 Ahn, Sungsoo V-269 Ahn, Sungwoo II-175 Ahn, Taewook IV-388 Ahn, Yonghak V-1001 Ahn, Youngjin II-661 Akbar, Ali Hammad II-186, II-847 Alam, Muhammad Mahbub II-651 Albert´ı, Margarita I-721 Algredo-Badillo, Ignacio III-456 Ali, Hassan IV-217 Ali, Saqib IV-217 Allayear, Shaikh Muhammad II-641 Almendra, Daniel V-819 Al-Mutawah, Khalid I-586 Alvarez, Susana III-1073 Amarnadh, Narayanasetty I-1 Ambler, Anthony P. V-531 An, Kyounghwan II-155 An, Sunshin II-730 Anagun, A. Sermet III-11, III-678 Anan, Yoshiyuki II-40 Andersen, Anders Magnus IV-98 Angelides, Marios C. IV-118 Arce-Santana, Edgar R. V-412 Armer, Andrey I-974 Arteconi, Leonardo I-694 Badea, Bogdan I-1166 Bae, Hae-Young I-914, IV-1126 Bae, Hyo-Jung I-151
Bae, Joonsoo II-379 Bae, Suk-Tae II-309 Bae, Yong-Geun IV-828 Bae, Youngchul III-244 Baek, Jun-Geol V-839 Baek, Myung-Sun V-752 Bagherpour, Morteza III-546 Bahn, Hyokyung I-1072 Bai, Zhang I-885 Baik, Heung Ki V-236 Baixauli, J. Samuel III-1073 Bala, Piotr V-394 Balas, Lale I-547 Balci, Birim I-373 Ban, Chaehoon II-175 Bang, Hyungbin II-319 Bang, Young-Cheol III-1090, III-1129 Bardhan, Debabrata I-10 Bartolotta, Antonino I-821 Bashir, Ali Kashif II-186 Basu, Kalyan I-566 Bawa, Rajesh Kumar I-1177 Bellaachia, Abdelghani V-346 Bentz, C´edric III-738 Berbegall, Vicente V-192 Berkov, Dmitri V-129 Bertoni, Guido III-1004 Biscarri, F´elix V-725 Biscarri, Jes´ us V-725 Blibech, Kaouthar III-395 Boada, Imma I-364 Bohli, Jens-Matthias III-355 Bolze, Raphael V-202 Bories, Benoˆıt I-744 Bravo, Maricela IV-169 Brennan, John K. V-743 Breveglieri, Luca III-1004 Brzezi´ nski, Jerzy IV-1166, V-98 Buiati, Fabio V-819 Burns, John I-612 Byun, Doyoung V-537 Byun, Sang-Seon II-1189 Byun, Sang-Yong III-84 Byun, Sung-Wook I-232
Byun, Tae-Young III-134 Byun, Young Hwan V-457 Byun, Yung-Cheol V-185 Byun, Yung-Hwan V-512, V-932
Caballero-Gil, Pino I-577, III-1035 Caballero, Ismael III-984 C´ aceres, Santos II-18 Cai, Guoyin IV-1090 Calderon, Alejandro IV-1136 Calero, Coral III-984 Camahort, Emilio I-510 Camara, Jos´e Sierra V-798 Campos-Delgado, Daniel U. V-412 Cao, Wenming V-375 Capacho, Liliana III-554 Cappelletti, David I-721 Carballeira, F´elix Garc´ıa V-108 Carlone, Pierpaolo I-794 Caro, Angelica III-984 Caron, Eddy V-202 Carretero, Jesus IV-1136 Castro, Mildrey Carbonell V-798 Cattani, Carlo I-785, I-828, I-857 Cha, Byung-Rae II-1090 Cha, Eui-Young I-1110 Cha, Guang-Ho I-344 Cha, Jae-Sang V-312 Cha, JeongHee V-432 Chae, Kijoon I-1072, IV-440 Chae, Oksam V-1001 Chae, Young Seok II-760 Challiol, Cecilia IV-148 Chan, Yung-Kuan V-384 Chan, Yuen-Yan I-383, III-309, III-365, III-507, IV-406 Chang, Chung-Hsien I-171 Chang, Hangbae IV-255, IV-707 Chang, Hoon IV-577 Chang, Hoon V-1010 Chang, Hsi-Cheng V-158 Chang, Kuo-Hwa III-944 Chang, Ok-Bae III-188, III-222, IV-893, IV-955, V-644 Chang, Soo Ho II-451 Chang, Sujeong II-77 Chang, Yu-Hern III-649 Chaudhry, Shafique Ahmad II-847 Chaudhuri, Chitrita II-1 Chen, Chiou-Nan IV-1107
Chen, Chun V-39 Chen, Gencai V-39 Chen, Huifen III-944 Chen, Kai-Hung V-384 Chen, Kaiyun IV-756 Chen, Ken I-307 Chen, Lei II-1149 Chen, Ling V-39 Chen, Tzu-Yi III-1081 Chen, Yen Hung III-631 Cheng, Jingde III-1 Cheng, Yu-Ming I-171, I-181 Cheon, Sahng Hoon III-718 Cheon, SeongKwon III-73 Cheun, Du Wan II-451 Cheung, Yen I-586 Chi, Sang Hoon IV-58 Chih, Wen-Hai III-668 Chlebiej, Michal V-394 Cho, Cheol-Hyung I-101 Cho, Daerae IV-787 Cho, Dongyoung IV-491 Cho, Eun Sook II-1003, IV-985 Cho, Haengrae V-214 Cho, Ik-hwan I-326 Cho, Jae-Hyun I-1110 Cho, Jong-Rae III-832, III-994 Cho, Juphil V-236 Cho, KumWon V-522 Cho, Kwang Moon IV-1003 Cho, Mi-Gyung I-904 Cho, Minju II-760 Cho, Nam-deok V-546 Cho, Sang-Hun II-288 Cho, Sok-Pal II-1082 Cho, Sung-eon V-600 Cho, Tae Ho IV-58 Cho, Yongyun IV-30 Cho, Yookun II-701, IV-499, IV-549 Cho, Youngsong I-111 Cho, You-Ze II-631 Choi, Bong-Joon V-912 Choi, Byung-Cheon III-785 Choi, Byungdo IV-808 Choi, Byung-Sun II-945 Choi, Chang IV-567 Choi, Changyeol II-562, IV-1156 Choi, Deokjai IV-128 Choi, Eun Young IV-316 Choi, Ho-Jin II-796
Author Index Choi, Hwangkyu II-562, IV-1156 Choi, Hyung-Il V-441 Choi, Hyun-Seon III-728 Choi, Jaeyoung IV-11, IV-30 Choi, Jonghyoun II-525, II-895 Choi, Jongmyung II-1033 Choi, Jong-Ryeol IV-893 Choi, Junho IV-567 Choi, Junkyun V-829 Choi, Kuiwon I-335 Choi, Kyung Cheol IV-659 Choi, Misook II-49, IV-966 Choi, Sang-soo V-618 Choi, Sang-Yule V-312, V-322, V-355 Choi, Seongman III-222, IV-955, V-644, V-675 Choi, Su-il II-77 Choi, Sung-Hee IV-937 Choi, Tae-Young I-307, I-317 Choi, Wan-Kyoo IV-828 Choi, Wonjoon IV-279 Choi, Yeon Sung I-993 Choi, Yong-Rak IV-432 Choi, Yun Jeong II-298 Chon, Jaechoon I-261, III-1172 Chon, Sungmi II-28 Choo, Hyunseung II-165, II-288, II-534, II-661, II-710, II-856, II-923, II-934, II-1121, III-1090, III-1129 Choo, MoonWon IV-787 Chun, Junchul I-410 Chung, Chin Hyun I-929, I-964 Chung, Chun-Jen III-862 Chung, Ha Joong IV-549 Chung, Hyoung-Seog V-491 Chung, Il-Yong IV-828 Chung, Jinwook II-487, II-982 Chung, Kyoil III-375, IV-584, V-251 Chung, Min Young II-77, II-288, II-534, II-856, II-934, II-1121 Chung, Mokdong IV-1042 Chung, Shu-Hsing III-610 Chung, TaeChoong II-390 Chung, Tae-sun I-1019 Chung, Tai-Myoung II-135, II-239, III-486, V-626, V-655 Chung, YoonJung III-54, IV-777 Chung, Younky III-198, III-234 Ciancio, Armando I-828 Ciancio, Vincenzo I-821
1229
Clifford, Rapha¨el III-1137 Cocho, Pedro III-964 Cokuslu, Deniz II-681 Coll, Narcis I-81 Cong, Jin I-921 Cools, Ronald V-780 Cordero, Rogelio Lim´ on IV-726 Costantini, Alessandro I-738 Crane, Martin I-612 Cruz, Laura II-18 Culley, Steve J. II-279 Cumplido, Ren´e III-456 Czekster, Ricardo M. I-202 Daefler, Simon I-566 Dagdeviren, Orhan II-681 D’Anjou, Alicia III-1143 Darlington, Mansur J. II-279 Das, Gautam K. II-750 Das, Sajal I-566 Das, Sandip I-10, II-750 David, Gabriel IV-78 De Crist´ ofolo, Valeria IV-148 de Deus, Flavio E. V-808 de Doncker, Elise V-789 de Oliveira, Robson V-819 de Frutos-Escrig, David IV-158 de Ipi˜ na, Diego L´ opez IV-108 de Sousa, Rafael V-819 Deineko, Vladimir III-793 Demirkol, Askin V-365 den Hertog, Dick III-812 Deo, Puspita I-622 Derevyankin, Valery I-974 Desprez, Frederic V-202 D´evai, Frank I-131 Diaz, Olivia Graciela Fragoso IV-50 Doallo, Ram´ on I-701 Dogdu, Erdogan IV-88 Drummond, L.A. V-192 Duan, Guolin V-450 Duan, Yucong IV-746 Dur´ an, Alfonso III-964 Ekinci, Murat III-1216 Eksioglu, Burak III-748 Ek¸sio˘ glu, Sandra Duni III-708 Eom, Jung-Ho II-239 Eom, Young Ik I-1028 Erciyes, Kayhan II-681
1230
Author Index
Esqu´ıvel, Manuel L. III-841 Eun, He-Jue V-990 Evangelisti, Stefano I-744 Fang, Zhijun II-964 F¨ arber, Gerrit III-638 Farsaci, Francesco I-821 Farzanyar, Zahra I-1100 Fathy, Mahmood V-118 Fei, Chai IV-179 Feixas, Miquel I-449 Feng, Dan I-1045 Feregrino-Uribe, Claudia III-456 Ferey, Nicolas I-222 Fernandez, Javier IV-1136 Fern´ andez, Marcel III-527 Fern´ andez, Marcos I-490 Fernandez, Reinaldo Togores I-30 Fern´ andez-Medina, Eduardo III-1013, III-1024, III-1044 Fey, Dietmar V-129 Filomia, Federico I-731 Fiore, Ugo III-537 Fleissner, Sebastian I-383, IV-406 Forn´e, Jordi IV-1098 Fort, Marta I-81 Frausto, Juan IV-169 Frick, Alexander I-847 Fu, Xiaolan IV-746 F´ uster-Sabater, Amparo I-577, III-1035 Gabillon, Alban III-395 Galindo, David III-318 Gallego, Guillermo III-822 Gao, Yunjun V-39 Garcia, Felix IV-1136 Garcia, Jose Daniel IV-1136 Garc´ıa, L.M. S´ anchez V-108 Garc´ıa-Sebastian, M. Teresa III-1143 Gattiker, James R. III-1153 Gatton, Thomas M. III-244, IV-947, V-665, V-675 Gaudiot, Jean-Luc IV-622 Gavrilova, Marina L. I-61, I-431 Ge, He III-327 Gerardo, Bobby D. III-144, IV-899, V-867 Gervasi, Osvaldo I-212, I-665 Gherbi, Rachid I-222 Ghosh, Preetam I-566
Ghosh, Samik I-566 Gil, Joon-Min II-1169 Go, Sung-Hyun V-867 Goh, John I-1090 Goh, Sunbok II-204 Goi, Bok-Min IV-424 Gonz´ alez, Ana I. III-1143 Gonzalez, C´esar Otero I-30 Gonz´ alez, J.J. V-772 Gonz´ alez, Juan G. II-18 Gonz´ alez, Luis I-633 Gonz´ alez, Patricia I-701 Gordillo, Silvia IV-148 G´ orriz, J.M. V-772 Gra˜ na, Manuel III-1143 Gros, Pierre Emmanuel I-222 Gu, Boncheol IV-499, IV-549 Gu, Huaxi V-149 Gu, Yuqing IV-746 Guillen, Mario II-18 Guo, Jiang III-974 Guo, Jianping IV-1090 Guo, Weiliang I-938 Guti´errez, Miguel III-964 Ha, Jong-Eun III-1163 Ha, Jongsung II-49 Ha, Sung Ho III-1110 Hahn, GeneBeck II-769 Hamid, Md.Abdul II-866 Han, Chang-Hyo III-832 Han, DoHyung IV-594 Han, Dong-Guk III-375 Han, Gun Heui V-331 Han, Hyuksoo IV-1081 Han, Jizhong I-1010 Han, Jong-Wook IV-360 Han, Joohyun IV-30 Han, JungHyun I-1028 Han, Jungkyu IV-549 Han, Ki-Joon II-259 Han, Kijun II-1159 Han, Kunhee V-584 Han, Long-zhe I-1019 Han, Sang Yong IV-40 Han, Seakjae V-682 Han, SeungJae II-359 Han, Sunyoung II-601 Han, Youngshin V-260 Harbusch, Klaus I-857
Hashemi, Sattar I-1100 Hawes, Cathy I-644 He, S. III-934 Helal, Wissam I-744 Heng, Swee-Huay III-416 Heo, Joon II-989, II-1066 Heo, Junyoung II-701, IV-499, IV-549 Hérisson, Joan I-222 Herrero, José R. V-762 Higdon, David III-1153 Hinarejos, M. Francisca IV-1098 Hoesch, Georg V-202 Hoffmann, Aswin L. III-812 Hong, Bonghee II-155, II-175 Hong, Choong Seon II-651, II-866 Hong, Dong-Suk II-259 Hong, Dowon IV-584 Hong, Gye Hang III-1110 Hong, Jiman II-701, IV-499, IV-558, IV-603 Hong, John-Hee III-832 Hong, Kwang-Seok I-354 Hong, Maria II-400 Hong, In-Hwa IV-245 Hong, Seokhie III-446 Hong, Soonjwa III-385 Hong, Suk-Kyo II-847 Hong, Sukwon I-1019 Hong, Sung-Je I-151 Hong, Sung-Pil III-785 Hong, WonGi IV-577 Hong, Youn-Sik II-249 Horie, Daisuke III-1 Hosaka, Ryosuke I-596 Hsieh, Shu-Ming V-422 Hsu, Chiun-Chieh V-158, V-422 Hsu, Li-Fu V-422 Hu, Qingwu IV-746 Hu, Yincui IV-1090 Huang, Changqin V-243 Huang, Chun-Ying III-610 Huang, Wei I-518 Huh, Euinam II-390, II-515, II-827, II-905, V-717 Huh, Woong II-224 Huh, Woonghee Tim III-822 Hur, Tai-Sung II-224 Hwang, An Kyu II-788 Hwang, Chong-Sun II-369, II-816 Hwang, Ha-Jin V-1018
Hwang, Hyun-Suk III-115, III-125, V-895 Hwang, InYong II-1140 Hwang, Jin-Bum IV-360 Hwang, Jun II-760 Hwang, Ken III-668 Hwang, Soyeon IV-344 Hwang, Suk-Hyung IV-767, IV-937 Hwang, Sungho III-134 Hwang, Sun-Myung IV-909 Hwang, Tae Jin V-236 Ikeguchi, Tohru I-596 Im, Chaeseok I-1000 Im, Eul Gyu III-54, IV-777 Im, SeokJin II-369 Im, Sungbin II-806 İnan, Asu I-547 Inceoglu, Mustafa Murat I-373 Iordache, Dan I-804 Isaac, Jesús Téllez V-798 Isaila, Florin D. V-108 Ishii, Naohiro II-40 Iwata, Kazunori II-40 Jang, Byung-Jun V-752 Jang, Hyo-Jong II-106 Jang, Injoo III-206 Jang, Jun Yeong I-964 Jang, Kil-Woong II-671 Jang, Moonsuk II-204 Jang, Sangdong IV-1116 Jang, Taeuk II-760 Jang, Yong-Il IV-1126 Je, Sung-Kwan I-1110 Jeon, Hoseong II-934 Jeon, Hyung Joon II-974, II-1009 Jeon, Hyung-Su III-188 Jeon, Jongwoo III-718 Jeon, Kwon-Su V-932 Jeon, Segil V-522 Jeon, Sung-Eok III-134 Jeong, Byeong-Soo II-505, II-796 Jeong, Chang-Sung I-232, II-462 Jeong, Chang-Won IV-853 Jeong, Chulho II-430 Jeong, Dong-Hoon II-996 Jeong, Dongseok I-326 Jeong, Gu-Beom IV-1032 Jeong, Hye-Jin V-675
Jeong, Hyo Sook V-609 Jeong, In-Jae III-698 Jeong, Jong-Geun II-1090 Jeong, Karpjoo V-522 Jeong, Kugsang IV-128 Jeong, KwangChul II-923 Jeong, Sa-Kyun IV-893 Jeong, Su-Hwan V-895 Jeong, Taikyeong T. I-993, V-531 Jhang, Seong Tae IV-631 Jhon, Chu Shik IV-631 Ji, JunFeng I-420 Ji, Yong Gu IV-697 Ji, Young Mu V-457 Jiang, Chaojun I-938 Jiang, Di I-50 Jiang, Gangyi I-307, I-317 Jiang, Yan I-921 Jianping, Li I-885 Jin, DongXue III-73 Jin, Hai IV-529 Jin, Honggee IV-687 Jin, Mingzhou III-708, III-748 Jo, Geun-Sik II-779 Jo, Jeong Woo II-480 Jodlbauer, Herbert V-88 Johnstone, John K. I-500 Joo, Su-Chong III-251, IV-798, IV-853, IV-899 Joye, Marc III-338 Juang, Wen-Shenq IV-396 Ju, Hyunho V-522 Ju, Minseong IV-271 Jun, Jin V-839 Jung, Cheol IV-687 Jung, Eun-Sun IV-416 Jung, Hyedong II-691 Jung, Hye-Jung IV-1052 Jung, Inbum II-562, IV-1156 Jung, Jae-Yoon II-379, V-942 Jung, JaeYoun III-64 Jung, Jiwon II-155 Jung, Kwang Hoon I-929 Jung, Kyeong-Hoon IV-448 Jung, Kyung-Hoon III-115 Jung, Myoung Hee II-77 Jung, SangJoon III-93, III-234, IV-1022 Jung, Seung-Hwan II-462 Jung, Se-Won II-837
Jung, Won-Do II-186 Jung, Won-Tae IV-1052 Jung, Youngsuk IV-1022 Jwa, JeongWoo IV-594 Kabara, Joseph V-808 Kangavari, Mohammadreza I-1100 Kang, Dazhou II-1179 Kang, Dong-Joong II-309, III-1163 Kang, Dong-Wook IV-448 Kang, Euisun II-400 Kang, Euiyoung IV-558 Kang, Eun-Kwan IV-947 Kang, Heau-jo V-690 Kang, Hong-Koo II-259 Kang, Hyungwoo III-385 Kang, Jeonil IV-380 Kang, Jinsuk I-993 Kang, Maing-Kyu III-898 Kang, Mikyung IV-558 Kang, Mingyun V-575 Kang, Namhi III-497 Kang, Oh-Hyung III-287, IV-1060 Kang, Sanggil I-1127 Kang, Sang-Won II-369 Kang, Sangwook II-730 Kang, Seo-Il IV-326 Kang, Seoungpil II-1066 Kang, Sin Kuk III-1200 Kang, Suk-Ho IV-787, V-942 Kang, Sukhoon II-1060, IV-271, IV-432 Kang, Wanmo III-777, III-822 Kang, Yunjeong V-665 Karsak, E. Ertugrul III-918 Kasprzak, Andrzej III-1100, III-1119 Katsionis, George I-251 Kaugars, Karlis V-789 Keil, J. Mark I-121 Képès, François I-222 Kettner, Lutz I-60 Key, Jaehong I-335 Khader, Dalia III-298 Khonsari, Ahmad V-118 Khoo, Khoongming III-416 Kim, Backhyun IV-68, IV-1146 Kim, Bonghan V-851 Kim, Bong-Je V-895 Kim, Byeongchang III-21 Kim, Byung Chul II-788
Kim, Byunggi II-319, II-330, II-740, II-1033 Kim, Byung-Guk II-996 Kim, Byung-Ryong III-476 Kim, Byung-Soon II-671 Kim, Chang J. V-932 Kim, Changmin III-261, IV-787 Kim, Chang Ouk V-839 Kim, Chang-Soo III-115, III-125, V-895 Kim, Cheol Min I-278, I-288, IV-558 Kim, Chonggun III-64, III-73, III-93, III-234, IV-808, IV-818, IV-1022 Kim, Chulgoon V-522 Kim, Chul Jin II-1003, IV-985 Kim, Chul Soo V-185 Kim, Dai-Youn III-38 Kim, Deok-Soo I-101, I-111, I-440 Kim, DongKook II-340, II-349 Kim, Dong-Oh II-259 Kim, Dong-Seok IV-853 Kim, Dongsoo IV-687 Kim, Donguk I-101, I-111, I-440 Kim, Duckki II-195 Kim, Duk Hun II-856 Kim, Eung Soo III-31 Kim, Eunhoe IV-11, IV-30 Kim, Eun Mi III-1190, IV-893 Kim, Eun Yi III-1200 Kim, Gil-Han V-284 Kim, Gui-Jung IV-835 Kim, Guk-Boh IV-1032 Kim, Gukboh II-214 Kim, Gu Su I-1028 Kim, Gwanghoon IV-344 Kim, GyeYoung II-106, V-432, V-441 Kim, Gyoung Bae I-914 Kim, Hae Geun III-104 Kim, Haeng-Kon III-84, III-163, III-198, IV-844, IV-927, IV-976 Kim, Haeng Kon IV-873 Kim, Hak-Jin III-928 Kim, HanIl IV-558, IV-567, IV-594 Kim, Hee Taek I-914 Kim, Hong-Gee IV-937 Kim, Hong-Jin II-1082 Kim, Hong Sok V-1010 Kim, Hong-Yeon I-1053 Kim, Ho-Seok I-914, IV-1126 Kim, Ho Won III-375 Kim, Howon IV-584, V-251
Kim, Hye-Jin I-955 Kim, Hye Sun I-288 Kim, HyoJin II-359 Kim, Hyongsuk III-1172 Kim, Hyun IV-466 Kim, Hyuncheol V-829 Kim, Hyung-Jun IV-483 Kim, Hyunsoo III-852 Kim, Iksoo IV-68, IV-1146 Kim, IL V-912 Kim, Ildo II-87 Kim, InJung III-54, IV-777 Kim, In Kee V-1 Kim, Intae IV-21 Kim, Jaehyoun II-934 Kim, Jae-Soo II-572 Kim, Jae-Yearn III-590 Kim, Jaihie II-96 Kim, Jee-In I-983 Kim, Je-Min II-1219 Kim, Jeong Hyun II-996, II-1066 Kim, Jin-Geol IV-288 Kim, Jin Ok I-929, I-964 Kim, Jin Suk II-480 Kim, Jin-Sung V-968, V-979 Kim, Jin Won IV-499, IV-509 Kim, John II-114 Kim, Jong-Hwa V-503 Kim, Jongik II-552 Kim, Jongsung III-446 Kim, Jongwan II-369 Kim, June I-1028, I-1053 Kim, Jungduk IV-255, IV-707 Kim, Jung-Sun V-922 Kim, Junguk II-760 Kim, Kap Hwan III-564 Kim, Kibom III-385 Kim, Ki-Chang III-476 Kim, Ki-Doo IV-448 Kim, Ki-Hyung II-186, II-847 Kim, Ki-Uk V-895 Kim, Ki-Young IV-612 Kim, Kuinam J. II-1025 Kim, Kwang-Baek I-1110, III-172, III-279, V-887 Kim, Kwangsoo IV-466 Kim, Kwanjoong II-319, II-1033 Kim, Kyujung II-28 Kim, Kyung-Kyu IV-255 Kim, Kyung Tae IV-519
Kim, LaeYoung II-1131 Kim, Min Chan IV-669 Kim, Min-Ji V-932 Kim, Minsoo III-154, IV-697, V-269, V-922 Kim, Minsu III-134 Kim, Min Sung III-31 Kim, Misun II-420, III-154 Kim, Miyoung II-885 Kim, MoonHae V-522 Kim, MoonJoon IV-577 Kim, Moonseong II-710, III-1054, III-1090, III-1129, V-626 Kim, Myeng-Ki IV-937 Kim, Myoung-Joon I-1053 Kim, Myoung-sub V-700 Kim, Myung Keun I-914 Kim, Nam-Gyun I-241 Kim, Pankoo IV-567 Kim, Sangbok II-515 Kim, Sangho V-491 Kim, Sang-Il III-728 Kim, Sangjin IV-388 Kim, Sangki II-87 Kim, Sangkuk II-11 Kim, Sangkyun IV-639, IV-716 Kim, Seki III-1054 Kim, Seoksoo II-1060, IV-271, V-565, V-575, V-584, V-591, V-700 Kim, Seok-Yoon IV-612 Kim, Seong Baeg I-278, I-288, IV-558 Kim, Seungjoo II-954, V-858 Kim, Seung Man I-480 Kim, Seung-Yong IV-612 Kim, Sijung V-851 Kim, SinKyu II-769 Kim, Soo Dong II-451, IV-736 Kim, Soo Hyung IV-128 Kim, Soon-gohn V-690 Kim, Soon-Ho III-172 Kim, So-yeon V-618 Kim, Su-Nam IV-448 Kim, Sungchan I-459 Kim, Sung Jin V-609 Kim, Sung Jo IV-669 Kim, Sung Ki II-876 Kim, Sung-Ryul III-1137 Kim, Sung-Shick III-928 Kim, SungSoo I-904 Kim, Sungsuk IV-567
Kim, Tae-Kyung II-135 Kim, Taeseok I-1062 Kim, Tai-hoon V-700 Kim, Ung Mo II-165, IV-456 Kim, Ungmo I-1028, V-139 Kim, Won II-106 Kim, Woo-Jae II-720 Kim, Wu Woan IV-1116 Kim, Yang-Woo II-905 Kim, Yeong-Deok IV-271 Kim, Yong-Hwa V-968 Kim, Yong-Min II-340, II-349 Kim, Yongsik IV-687 Kim, Yong-Sung V-958, V-968, V-979, V-990 Kim, Yong-Yook I-241 Kim, Yoon II-562, IV-1156 Kim, Young Beom II-515, II-827 Kim, Youngbong IV-226 Kim, Youngchul II-319, II-1033 Kim, Young-Kyun I-1053 Kim, Youngrag III-64 Kim, Young Shin V-457 Kim, Youngsoo II-545 Kim, Yunkuk II-730 Knauer, Christian I-20 Ko, Eung Nam IV-475 Ko, Hyuk Jin II-165 Ko, Il Seok V-331, V-338 Ko, Kwangsun I-1028 Ko, Kyong-Cheol IV-1060 Kobusińska, Anna IV-1166 Koh, Byoung-Soo IV-236, IV-245 Koh, Kern I-1062 Koh, Yunji I-1062 Kohout, Josef I-71 Kolingerová, Ivana I-71 Kong, Jung-Shik IV-288 Koo, Jahwan II-487 Kosowski, Adrian I-141, I-161 Koszalka, Leszek V-58 Koutsonikola, Vassiliki A. II-1229 Kozhevnikov, Victor I-974 Krasheninnikova, Natalia I-974 Krasheninnikov, Victor I-974 Kreveld, Marc van I-20 Krusche, Peter V-165 Ku, Chih-Wei II-1210
Ku, Hyunchul II-827 Kurzynski, Marek III-1210 Kwak, Jae-min V-600 Kwak, Jin II-954 Kwak, Jong Min V-338 Kwak, Jong Wook IV-631 Kwak, Keun-Chang I-955 Kwon, Dong-Hee II-720 Kwon, Dong-Hyuck III-38 Kwon, Gihwon IV-1081, V-905 Kwon, Jang-Woo II-309, V-887 Kwon, Jungkyu IV-1042 Kwon, Oh-Cheon II-552 Kwon, Oh-Heum IV-306 Kwon, Seungwoo III-928 Kwon, Soo-Tae III-767 Kwon, Taekyoung II-769, II-915 Kwon, Tae-Kyu I-241 Kwon, Yoon-Jung V-503 Laganà, Antonio I-212, I-665, I-675, I-694, I-721, I-738, I-757 Lago, Noelia Faginas I-731 Lai, Jun IV-179 Lai, Kin Keung I-518 Lan, Joung-Liang IV-1107 Laskari, E.C. V-635 Lawrence, Earl III-1153 Lazar, Bogdan I-779 Lazzareschi, Michael III-1081 Le, D. Xuan IV-207 Lee, Amy Hsin-I III-610 Lee, Bo-Hee IV-288 Lee, Bongkyu IV-549, V-185 Lee, Byung-kwan III-38, III-172, III-261 Lee, Byung-Wook I-946, II-495 Lee, Chae-Woo II-837 Lee, Changhee I-440 Lee, Changhoon III-446 Lee, Changjin V-537 Lee, Chang-Mog IV-1012 Lee, Chang-Woo IV-1060 Lee, Chien-I II-1210 Lee, Chilgee V-260 Lee, Chulsoo IV-777 Lee, Chulung III-928 Lee, Chung-Sub IV-798 Lee, Dan IV-994 Lee, Deok-Gyu IV-326, IV-370
Lee, Deokgyu IV-344 Lee, Dong Chun II-1017, II-1051, II-1082 Lee, Dong-Ho III-728 Lee, Dong Hoon III-385, IV-316 Lee, DongWoo IV-197, IV-491 Lee, Dong-Young II-135, III-486, V-626, V-655 Lee, Eun Ser IV-1070, V-546, V-555 Lee, Eung Ju IV-187 Lee, Eunseok II-430, II-621, V-49 Lee, Gang-soo V-618 Lee, Gary Geunbae III-21 Lee, Geon-Yeob IV-853 Lee, Geuk II-1060 Lee, Gigan V-952 Lee, Gueesang IV-128 Lee, Gun Ho IV-659 Lee, Hanku V-522 Lee, Ha-Yong IV-767 Lee, Hong Joo IV-639, IV-716 Lee, HoonJae III-48, III-269 Lee, Hosin IV-255 Lee, Ho Woo III-718 Lee, Hyewon K. II-214 Lee, Hyobin II-96 Lee, Hyun Chan I-111 Lee, Hyung Su II-691, IV-519 Lee, Hyung-Woo V-284, V-294 Lee, Ig-hoon I-1036 Lee, Im-Yeong IV-326, IV-370 Lee, Inbok III-1137 Lee, Jaedeuk V-675 Lee, Jae Don I-1000 Lee, Jae-Dong IV-1126 Lee, Jae-Kwang II-945 Lee, Jae-Seung II-945 Lee, Jaewan III-144, III-178, IV-899, V-867 Lee, Jae Woo V-457, V-512, V-932 Lee, Jaewook II-487 Lee, Jae Yeol IV-466 Lee, Jaeyeon I-955 Lee, Jae Yong II-788 Lee, Jangho I-983 Lee, Jang Hyun II-1199 Lee, Jeong Hun III-600 Lee, Jeonghyun IV-21 Lee, JeongMin IV-577
Lee, Ji-Hyun III-287, IV-994 Lee, Jin Ho III-875 Lee, Joahyoung II-562, IV-1156 Lee, Jongchan II-1033 Lee, Jong Gu III-1190 Lee, Jong Sik V-1 Lee, Jong-Sool III-564 Lee, Jong-Sub III-898 Lee, Jongsuk II-49 Lee, Jungho I-326 Lee, Junghoon IV-558, V-185 Lee, Jungsuk V-269 Lee, Junsoo V-175 Lee, Kang-Hyuk II-309 Lee, Kang-Woo IV-466 Lee, Kang-Yoon II-827 Lee, Keun-Ho II-816 Lee, Keun Wang II-1074 Lee, Kihyung II-175 Lee, Kil-Hung II-572 Lee, Kilsup IV-917, V-877 Lee, Ki-Young II-249 Lee, Kunwoo I-459 Lee, Kwang Hyoung II-1074 Lee, Kwangyong IV-499 Lee, Kwan H. I-480 Lee, Kwan-Hee I-151 Lee, Kyesan II-905, V-708, V-717, V-952 Lee, Kyujin V-708 Lee, Kyu Min IV-483 Lee, KyungHee IV-380 Lee, Kyung Ho II-1199 Lee, Kyunghye II-410 Lee, Kyungsik III-777 Lee, Kyung Whan IV-873 Lee, Malrey III-244, IV-947, V-644, V-665, V-675 Lee, Mun-Kyu IV-584 Lee, Myungho I-1019 Lee, Myungjin I-1072 Lee, Na-Young V-441 Lee, Samuel Sangkon II-231 Lee, Sangjin IV-245 Lee, Sang-goo I-1036 Lee, Sang Ho IV-1070, V-555, V-609 Lee, Sang-Hun II-239 Lee, Sang Hun I-459 Lee, Sangjin III-446, IV-236 Lee, Sang Joon V-185
Lee, Sang-Jun V-503 Lee, Sangjun IV-549 Lee, Sang-Min I-1053 Lee, Sangyoun II-87, II-96 Lee, Seojeong IV-966 Lee, Seok-Cheol III-115 Lee, Seon-Don II-720 Lee, SeongHoon IV-491 Lee, Seonghoon IV-197 Lee, Seong-Won IV-622 Lee, Seoung-Hyeon II-945 Lee, Seoung-Soo V-503 Lee, SeoungYoung II-1140 Lee, Seungbae V-467 Lee, Seung-Heon II-495 Lee, Seunghwa II-621, V-49 Lee, Seunghwan III-64, III-93 Lee, Seung-Jin V-512 Lee, Seungkeun IV-21 Lee, Seungmin V-476 Lee, Seung-Yeon II-905 Lee, SeungYong V-922 Lee, Se Won III-718 Lee, SooCheol II-552 Lee, SuKyoung II-1131 Lee, Su Mi IV-316 Lee, Sungchang II-923 Lee, Sung-Hyup II-631 Lee, Sung Jong IV-917 Lee, Sung-Joo IV-828 Lee, SungYoung II-390 Lee, Sungkeun II-204 Lee, Tae-Dong II-462 Lee, Taehoon IV-1081 Lee, Tae-Jin II-288, II-534, II-661, II-710, II-856, II-923, II-1121 Lee, Vincent I-586 Lee, Wankwon IV-491 Lee, Wansuk II-954 Lee, Wongoo II-11, V-851 Lee, Won-Hyuk II-982 Lee, Wookey IV-787, V-942 Lee, Woongho I-326 Lee, Yang-sun V-600 Lee, Yongjin IV-197 Lee, Yongseok V-665 Lee, YoungGyo III-54 Lee, Young Hoon III-875 Lee, Young-Koo II-505 Lee, Youngkwon II-915
Lee, Young-Seok III-144 Lee, Youngsook III-517, V-858 Lee, YoungSoon V-675 Lee, Yun Ho III-875 Lee, Yun-Kyoung I-232 Leininger, Thierry I-744 León, Carlos V-725 Leong, Chee-Seng IV-424 Lho, Tae-Jung II-309, III-1163, V-887 Liang, Yanchun I-938 Liang, Zhong III-928 Liao, Yuehong III-974 Li, Fucui I-317 Li, Haisen S. V-789 Li, Jie II-59 Li, Jin III-309, III-365, IV-406 Li, Kuan-Ching IV-1107 Li, Li I-895 Li, Lv V-32 Li, Qu I-393 Li, Sheng I-420 Li, Shiping I-317 Li, Shujun V-789 Li, Xun I-895 Li, Yanhui II-1179 Li, Yunsong V-149 Li, Zhanwei V-450 Li, Zhong I-1118, I-1134 Lim, Andrew III-688 Lim, Chan-Hyoung III-832 Lim, Hyotaek IV-380 Lim, Hyung-Jin II-135, II-239, III-486, V-626, V-655 Lim, Jeong-Mi IV-679 Lim, JiHyung IV-380 Lim, Jiyoung IV-440 Lim, Sungjun IV-707 Lim, Taesoo IV-687 Lim, Younghwan II-28, II-400, II-410, II-487 Lin, Ching-Fen III-944 Lin, Chuen-Horng V-384 Lin, Hon-Ren V-158 Lin, Hung-Mei III-338 Lin, Kung-Kuei V-158 Lin, Woei II-1111 Ling, Yun IV-649 Lísal, Martin V-743 Lisowski, Dominik V-58 Liu, Chia-Lung II-1111
Liu, Fuyu IV-88 Liu, Guoli III-659 Liu, Heng I-528 Liu, Joseph K. IV-406 Liu, Jun IV-649 Liu, Kai III-748 Liu, Qun I-1045 Liu, Shaofeng II-279 Liu, Xianxing II-59 Liu, XueHui I-420 Loke, Seng Wai IV-138 Lopes, Carla Teixeira IV-78 López, Máximo IV-169 Lu, Jiahui I-938 Lu, Jianjiang II-1179 Lu, Jiqiang III-466 Lu, Xiaolin I-192, I-875 Luna-Rivera, Jose M. V-412 Luo, Ying IV-1090 Luo, Yuan I-431 Lv, Xinran V-450 Ma, Hong III-688 Ma, Lizhuang I-1118, I-1134 Ma, Shichao I-1010 Madern, Narcis I-81 Magneau, Olivier I-222 Mah, Pyeong Soo IV-509 Makarov, Nikolay I-974 Malafiejski, Michal I-141, I-161 Mamun-or-Rashid, Md. II-651 Manos, Konstantinos I-251 Manzanares, Antonio Izquierdo V-798 Mao, Zhihong I-1118, I-1134 Markiewicz, Marta I-684 Markowski, Marcin III-1119 Marroquín-Alonso, Olga IV-158 Martín, María J. I-701 Mateo, Romeo Mark A. III-178, V-867 Matte-Tailliez, Oriane I-222 Maynau, Daniel I-744 McLeish, Tom I-711 McMahon, Chris A. II-279 Mecke, Rüdiger I-268 Meek, Dereck I-1118 Mehlhorn, Kurt I-60 Meletiou, G.C. V-635 Mellado, Daniel III-1044 Meng, Qingfan I-938 Merabti, Madjid IV-352
Mercorelli, Paolo I-847, I-857 Miao, Zhaowei III-688 Mijangos, Eugenio III-757 Mikolajczak, Pawel V-394 Millán, Rocío V-725 Min, Byoung Joon II-270, II-876 Min, Hong IV-499, IV-549 Min, Hong-Ki II-224 Min, Hyun Gi IV-736 Min, Jun-Ki II-67 Min, Kyongpil I-410 Min, So Yeon II-1003, II-1074, IV-985 Mitra, Pinaki I-1, II-1 Moet, Esther I-20 Moh, Chiou II-1111 Mohades, Ali V-735 Moin, M. Shahram III-1180 Mok, Hak-Soo III-832, III-994 Molinaro, Luis V-808 Monedero, Íñigo V-725 Moon, Aekyung V-214 Moon, Il Kyeong III-600 Moon, Ki-Young II-945 Moon, Kwang-Sup III-994 Moon, Mikyeong II-441, IV-226 Moon, Young Shik V-404 Morarescu, Cristian I-771, I-779, I-804, I-814, I-839 Moreno, Anna M. Coves III-638 Moreno, Ismael Solís IV-50 Morillo, Pedro I-490 Morimoto, Shoichi III-1 Morphet, Steve I-1127 Mouloudi, Abdelaaziz V-346 Mouriño, Carlos J. I-701 Mu, Yi III-345 Mukhopadhyay, Sourav III-436 Mukhtar, Shoaib II-847 Mun, Gil-Jong II-340 Mun, YoungSong II-195, II-214, II-319, II-400, II-410, II-420, II-471, II-487, II-525, II-611, II-740, II-885, II-895 Murzin, Mikhail Y. I-605 Na, Yang V-467, V-476 Na, Yun Ji V-331, V-338 Nah, Jungchan IV-440 Nakashima, Toyoshiro II-40 Nam, Do-Hyun II-224
Nam, Junghyun III-517, V-858 Nam, Taekyong II-545 Nandy, Subhas C. II-750 Naumann, Uwe I-865 Navarro, Juan J. V-762 Neelov, Igor I-711 Neumann, Laszlo I-449 Ng, Victor I-383 Niewiadomska-Szynkiewicz, Ewa I-537 Nilforoushan, Zahra V-735 Noh, Bong-Nam II-340, II-349, V-922 Noh, Hye-Min III-188, IV-893 Noh, Min-Ki II-1169 Noh, Sang-Kyun II-349 Noh, Seo-Young II-145 Noh, SiChoon II-1051 Noh, Sun-Kuk II-582 Noori, Siamak III-546 Noruzi, Mohammadreza III-1180 Nowiński, Krzysztof V-394 Nyang, DaeHun IV-380 Ogryczak, Wlodzimierz III-802 Oh, Am-Suk II-309, V-887 Oh, Hayoung IV-440 Oh, Heekuck IV-388 Oh, Hyukjun IV-603 Oh, Inn Yeal II-974, II-1009 Oh, Jaeduck II-471 Oh, Jehwan V-49 Oh, Juhyun II-760 Oh, June II-1199 Omary, Fouzia V-346 Onosato, Masahiko I-469 Orduña, Juan Manuel I-490 Ortiz, Guillermo Rodríguez IV-50 Ould-Khaoua, Mohamed V-118 Paar, Christof III-1004 Pacifici, Leonardo I-694 Pahk, Cherl Soo III-1190 Paik, Juryon IV-456 Pak, Jinsuk II-1159 Palazzo, Gaetano Salvatore I-794 Palmieri, Francesco III-537 Pamula, Raj III-974 Papadimitriou, Georgios I. II-1229 Pardede, Eric I-1146, IV-207 Park, Chang Mok IV-296
Author Index Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park, Park,
Chang-Seop IV-679 Chang Won IV-549 Chiwoo IV-697 Choung-Hwan II-1043 Dae-Hyeon V-958 DaeHyuck II-400 Dea-Woo IV-883 DongSik V-260 Eun-Ju IV-927 Geunyoung IV-549 Gilcheol V-565, V-591 Gi-Won II-631 Gyungleen II-760, IV-558, V-185 Hee-Un V-700 HongShik II-1140 Hyeong-Uk V-512 Ilgon IV-509 Jaehyung II-77 Jaekwan II-155 Jaemin IV-549 Jang-Su IV-370 Jea-Youn IV-835 Jeongmin II-430 Jeong Su IV-316 Jeung Chul I-480 Jong Hyuk IV-236, IV-245 Jongjin II-525 Joon Young I-111 Jungkeun I-1000 Jun Sang V-457 Ki-Hong IV-1060 Kisoeb III-1054 Ki Tae V-404 Kyoo-Seok III-125, V-912 Kyungdo III-928 Mee-Young V-512 Mi-Og IV-883 Namje V-251 Neungsoo IV-622 Sachoun V-905 Sangjoon II-319, II-1033 Sang Soon V-236 Sang Yong IV-1 SeongHoon V-68 Seungmin IV-499 Seung Soo II-298 Soo-Jin IV-432 Soon-Young IV-1126 Sung Soon II-641 Taehyung II-806
Park, Wongil II-330 Park, Woojin II-730 Park, Yong-Seok IV-370 Park, Young-Bae II-224 Park, Young-Jae II-515 Park, Young-Shin IV-432 Park, Youngsup I-402 Park, Young-Tack II-1219 Parpas, Panos III-908 Pastor, Rafael III-554 Paun, Viorel I-779, I-804 Pazo-Robles, Maria Eugenia I-577 Pazos, Rodolfo II-18, IV-169 Pegueroles, Josep III-527 Pérez, Jesús Carretero V-108 Pérez, Joaquín IV-169 Pérez-Rosés, Hebert I-510 Perrin, Dimitri I-612 Petridou, Sophia G. II-1229 Phillips, Robert III-822 Piao, Xuefeng IV-549 Piattini, Mario III-984, III-1013, III-1024, III-1044 Pillards, Tim V-780 Pineda-Rico, Ulises V-412 Pion, Sylvain I-60 Pirani, Fernando I-721, I-738 Poch, Jordi I-364 Pont, Michael J. V-22 Pontvieux, Cyril V-202 Porrini, Massimiliano I-721 Porschen, Stefan I-40 Pozniak-Koszalka, Iwona V-58 Prados, Ferran I-364 Puchala, Edward III-1210 Pulcineli, L. V-819 Puntonet, C.G. V-772 Pusca, Stefan I-763, I-771, I-779, I-804, I-839 Puttini, Ricardo Staciarini V-808 Qin, Xujia I-393 Qu, Xiangli V-224 Qu, Zhiguo I-921 Quintana, Arturo I-510 Quirós, Ricardo I-510 Rabenseifner, Rolf V-108 Rahayu, J. Wenny I-1146, IV-207 Rahman, Md. Mustafizur II-866
Ramírez, J. V-772 Rao, Imran II-390 Reed, Chris I-644 Rehfeld, Martina I-268 Reitner, Sonja V-88 Reyes, Gerardo IV-169 Rhee, Choonsung II-601 Rhee, Gue Won IV-466 Rhee, Yang-Won III-287, IV-1060 Rico-Novella, Francisco III-527 Riganelli, Antonio I-665 Rigau, Jaume I-449 Rim, Kiwook IV-21 Rodionov, Alexey S. I-605 Roh, Byeong-hee IV-279 Rosa-Velardo, Fernando IV-158 Rossi, Gustavo IV-148 Roy, Sasanka I-10 Rubio, Monica I-510 Ruskin, Heather J. I-612, I-622 Rustem, Berç III-908 Ryba, Przemyslaw III-1100 Ryu, Jong Ho II-270 Ryu, Yeonseung I-1000, I-1019 Sadjadi, Seyed Jafar III-546, III-574 Safaei, Farshad V-118 Sakai, Yutaka I-596 Salavert, Isidro Ramos IV-726 Salgado, René Santaolaya IV-50 Samavati, Faramarz I-91 Sarac, T. III-678 Sarkar, Palash III-436 Saunders, J.R. I-556, III-934 Sbert, Mateu I-449 Schirra, Stefan I-60 Schizas, Christos N. IV-118 Schoor, Wolfram I-268 Schurz, Frank V-129 Ścisło, Piotr V-394 Sedano, Íñigo IV-108 Segura, J.C. V-772 Sellarès, J. Antoni I-81 Semé, David V-10 Seo, Dae-Hee IV-326 Seo, Dong Il II-270 Seo, Dongmahn II-562, IV-1156 Seo, JaeHyun III-154, V-922 Seo, Jeongyeon I-101
Seo, Kyu-Tae IV-288 Seo, Manseung I-469 Seo, Won Ju III-718 Seo, Young-Jun IV-864 Seo, Yuhwa III-954 Severiano, José Andrés Díaz I-30 Severn, Aaron I-91 Shao, Feng I-307 Shi, Qi IV-352 Shi, Wenbo III-213 Shibasaki, Ryosuke I-261 Shim, Choon-Bo II-114 Shim, Donghee IV-491 Shim, Jang-Sup V-968, V-979, V-990 Shim, Junho I-1036 Shim, Young-Chul II-125, II-591 Shimizu, Eihan I-261 Shin, Chang-Sun III-251, IV-798 Shin, Chungsoo II-740 Shin, Dae-won III-261 Shin, Dong-Ryeol IV-483 Shin, Dong Ryul II-165 Shin, Dongshin V-467 Shin, Hayong I-440 Shin, Ho-Jin IV-483 Shin, Jeong-Hoon I-354 Shin, Kee-Young IV-509 Shin, Kwangcheol IV-40 Shin, Myong-Chul V-312 Shin, Seung-Jung II-487 Shin, Woochul I-895 Shin, Yongtae III-954 Shuqing, Zhang I-885 Siem, Alex Y.D. III-812 Singh, David E. IV-1136 Skouteris, Dimitris I-757 Śliwiński, Tomasz III-802 Smith, William R. V-743 Sobaniec, Cezary V-98 Sofokleous, Anastasis A. IV-118 Soh, Ben IV-179 Soh, Wooyoung V-682 Sohn, Hong-Gyoo II-989, II-1043 Sohn, Sungwon V-251 Sohn, Surgwon II-779 Soler, Emilio III-1024 Soler, Josep I-364 Son, Jeongho II-1159 Son, Kyungho II-954 Son, Seung-Hyun III-590
Song, Eungkyu I-1062 Song, Eun Jee II-1051 Song, Ha-Joo IV-306 Song, Hyoung-Kyu V-752 Song, Jaekoo V-575 Song, Jaesung I-469 Song, JooSeok II-359, II-769, II-1131 Song, Jungsuk IV-245 Song, Ki Won IV-873 Song, Sung Keun V-139 Song, Wang-Cheol V-185 Song, Yeong-Sun II-1043 Song, Young-Jae IV-835, IV-864 Soriano, Miguel III-527 Sosa, Víctor J. II-18, IV-169 Souza, Osmar Norberto de I-202 Stanek, Martin III-426 Stankova, Elena N. I-752 Sterian, Andreea I-779, I-804 Stewart, Neil F. I-50 Storchi, Loriano I-675 Suh, Young-Ho IV-466 Suh, Young-Joo II-720 Sun, Jizhou V-450 Sun, Lijuan V-450 Sun, Youxian IV-539 Sung, Jaechul III-446 Sung, Sulyun III-954 Susilo, Willy III-345 Syukur, Evi IV-138 Tae, Kang Soo II-231 Takagi, Tsuyoshi III-375 Talia, Domenico I-1080 Tan, Pengliu IV-529 Tan, Wuzheng I-1118, I-1134 Tang, Chuan Yi III-631 Tang, Lixin III-659 Tang, W.J. I-556, III-934 Taniar, David I-1090, I-1146 Tarantelli, Francesco I-675 Tasan, Seren Özmehmet V-78 Tasoulis, D.K. V-635 Tasso, Sergio I-212 Tate, Stephen R. III-327 Teng, Lirong I-938 tie, Li V-32 Tiskin, Alexander III-793, V-165 Toma, Alexandru I-839 Toma, Cristian I-779
Toma, Ghiocel I-804 Toma, Theodora I-771 Torabi, Torab IV-98, IV-217 Torres-Jiménez, José IV-726 Tragha, Abderrahim V-346 Trujillo, Juan III-1024 Trunfio, Paolo I-1080 Tsai, Cheng-Jung II-1210 Tsai, Chwei-Shyong III-406 Tsai, Yuan-Yu I-171, I-181 Tunali, Semra V-78 Uhm, Chul-Yong IV-448
Vafadoost, Mansour III-1180 Vakali, Athena I. II-1229 Val, Cristina Manchado del I-30 Vanhoucke, Mario III-621 Varnuška, Michal I-71 Vazquez, Juan Ignacio IV-108 Vehreschild, Andre I-865 Verdú, Gumersindo V-192 Verta, Oreste I-1080 Vidal, Vicente V-192 Vidler, Peter J. V-22 Villalba, Luis Javier García V-808, V-819 Villarroel, Rodolfo III-1024 Villarrubia, Carlos III-1013 Villecco, Francesco I-857 Virvou, Maria I-251 Vlad, Adriana I-1166 Voss, Heinrich I-684 Vrahatis, M.N. V-635 Walkowiak, Krzysztof II-1101 Wan, Wei IV-1090 Wan, Zheng II-964 Wang, Bo-Hyun I-946 Wang, Chung-Ming I-171, I-181 Wang, Gi-Nam IV-296 Wang, GuoPing I-420 Wang, K.J. III-885 Wang, Kun V-149 Wang, Kung-Jeng III-668 Wang, Shoujue V-375 Wang, Shouyang I-518 Wang, Weihong I-393 Wang, Yanming III-309 Wang, Zhengyou II-964
Wang, Zhensong I-1010 Watson, Mark D. I-121 Wawrzyniak, Dariusz V-98 Wee, Hui-Ming III-862, III-885 Weidenhiller, Andreas V-88 Wei, Guiyi IV-649 Wei, Tao IV-262 Wen, Chia-Hsien IV-1107 Wheeler, Thomas J. I-654 Wild, Peter J. II-279 Wollinger, Thomas III-1004 Won, Dong Ho II-165 Won, Dongho II-545, II-954, III-54, III-517, IV-777, V-251, V-858 Won, Youjip I-1062 Woo, Sinam II-730 Woo, Yo-Seop II-224 Woo, Young-Ho II-224 Wu, Chaolin IV-1090 Wu, Chin-Chi II-1111 Wu, EnHua I-420 Wu, Hsien-Chu III-406 Wu, Mary III-93, IV-818, IV-1022 Wu, Q.H. I-556, III-934 Wu, Qianhong III-345 Wu, Shiqian II-964 Wu, Xiaqing I-500 Wu, Zhaohui II-1149 Xia, Feng IV-539 Xia, Yu III-1064 Xiao, Zhenghong V-243 Xiaohong, Li V-32 Xie, Mei-fen V-375 Xie, Qiming V-149 Xie, Xiaoqin IV-756 Xu, Baowen II-1179 Xu, Fuyin V-243 Xue, Yong IV-1090 Yan, Jingqi I-528 Yang, Byounghak III-581 Yang, Ching-Wen IV-1107 Yang, Hae-Sool IV-767, IV-937, IV-976, IV-1052 Yang, Hwang-Kyu III-279 Yang, Hyunho III-178 Yang, Jong S. III-1129 Yang, Kyoung Mi I-278 Yang, Seung-hae III-38, III-261
Yang, Xuejun V-224 Yang, Young-Kyu II-495 Yang, Young Soon II-1199 Yap, Chee I-60 Yeh, Chuan-Po III-406 Yeh, Chung-Hsing III-649 Yeo, So-Young V-752 Yeom, Hee-Gyun IV-909 Yeom, Keunhyuk II-441, IV-226 Yeom, Soon-Ja V-958 Yeun, Yun Seog II-1199 Yi, Sangho II-701, IV-499, IV-549 Yi, Subong III-144 Yıldız, İpek I-547 Yim, Keun Soo I-1000 Yim, Soon-Bin II-1121 Yoe, Hyun III-251 Yoh, Jack Jai-ick V-484 Yoo, Cheol-Jung III-188, III-222, IV-893, IV-955, V-644 Yoo, Chuck II-1189 Yoo, Chun-Sik V-958, V-979, V-990 Yoo, Giljong II-430 Yoo, Hwan-Hee V-1010 Yoo, Hyeong Seon III-206, III-213 Yoo, Jeong-Joon I-1000 Yoo, Kee-Young I-1156, V-276, V-303 Yoo, Ki-Sung II-1169 Yoo, Kook-Yeol I-298 Yoo, Sang Bong I-895 Yoo, Seung Hwan II-270 Yoo, Seung-Jae II-1025 Yoo, Seung-Wha II-186 Yoo, Sun K. I-335 Yoon, Eun-Jun I-1156, V-276, V-303 Yoon, Heejun II-11 Yoon, Hwamook II-11 Yoon, Kyunghyun I-402 Yoon, Won Jin II-856 Yoon, Won-Sik II-847 Yoon, Yeo-Ran II-534 Yoshizawa, Shuji I-596 You, Ilsun IV-336, IV-416 You, Young-Hwan V-752 Youn, Hee Yong II-691, III-852, IV-1, IV-187, IV-456, IV-519, V-139, V-185 Young, Chung Min II-534 Yu, Jonas C.P. III-885 Yu, Jun IV-649
Yu, Ki-Sung II-525, II-982 Yu, Lean I-518 Yu, Mei I-307, I-317 Yu, Sunjin II-87, II-96 Yu, Tae Kwon II-451 Yu, Young Jung I-904 Yu, Yung H. V-932 Yuen, Tsz Hon I-383 Yun, Jae-Kwan II-259 Yun, Kong-Hyun II-989 Zeng, Weiming II-964 Zhai, Jia IV-296 Zhang, David I-528 Zhang, Fan II-59 Zhang, Fanguo III-345 Zhang, Jianhong IV-262
Zhang, JianYu IV-262 Zhang, Jie V-149 Zhang, Minghu IV-529 Zhang, Xinhong II-59 Zhang, Zhongmei I-921 Zhao, Mingxi I-1118 Zheng, He II-954 Zheng, Lei IV-1090 Zheng, Nenggan II-1149 Zhiyong, Feng V-32 Zhong, Jingwei V-224 Zhou, Bo IV-352 Zhou, Yanmiao II-1149 Ziaee, M. III-574 Zongming, Wang I-885 Zou, Wei IV-262 Żyliński, Pawel I-141, I-161