This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
) includes sedimentation, i.e. incorporation of
169
a settling flux within the vertical convection term. Due to high concentrations of sludge the so-called hindered particle settling velocity can be described by the Vesilind function [3]: us =uoe where us is the settling velocity, u0 is the maximum settling velocity, rh is the hindered settling zone parameter and Cs is the local sludge concentration. The necessary parameters were experimentally determined (data not shown). The species transport model describes the movement of the soluble carbon substrate, and nitrate through the system. These species are the most relevant under anoxic conditions. Again the general conservation equation is used where the transport quantity
= pk, for any m = 1, 2, 3, .... In other words, the process [Xk] and the averaged processes {Xkm>J, m >1, have an identical correlation structure. The process {Xk} is asymptotically secondorder self-similar with H = 1 - (fi/2), if pk<m> —> pk, as m —• oo. The main properties of self-similar processes include slowly decaying variance, long-range dependence and 1/f-noise [2], [3], [11]. 3 FIVE METHODS The FFT- and RMD-based methods were suggested as being sufficiently fast for practical applications in generation of simulation input data, for example in [10] and [12]. In this paper, we will report properties of these two methods and the F-ARIMA-based method; and compare them with SRA and FGN-DW, two recently proposed alternative methods for generation of pseudo-random self-similar sequences [7], [8], [9]. These methods can be characterised as follows: • FFT Method: This method generates approximate self-similar sequences based on the Fast Fourier Transform (FFT) and a process known as the Fractional Gaussian Noise (FGN). Its main difficulty is connected with calculating the power spectrum, which involves an infinite summation. Paxson has solved this problem by applying a special approximation. Briefly, the FFT method is based on (i) calculation of the power spectrum from the periodogram (the power spectrum at a given frequency represents an independent exponential random variable), (ii) construction of complex numbers which are governed by the normal distribution and (iii) execution of the inverse FFT; see [12] for more details. • RMD Method: The basic concept of the random midpoint displacement (RMD) algorithm, which generates approximate FBM sequences, is to extend the generated sequence recursively, by adding new values at the midpoints derived from the values at the endpoints. The reason for subdividing the interval between 0 and 1 is to construct the Gaussian increments. Adding offsets to midpoints makes the marginal distribution of the final result normal. For more detailed discussion of the RMD method, see [4], [10]. • SRA Method: A method for the direct generation of an FBM process is based on the successive random addition (SRA) algorithm [4], [9]. The SRA method uses the midpoints in the same way as RMD, but adds a displacement of a suitable variance to all of the points to increase stability of the generated sequence [7]. The reason for interpolating midpoints is to construct Gaussian increments, which are correlated. Adding offsets to all points should make the resulted sequence self-similar and of normal distribution [4]. • F-ARIMA Method: Hosking [6] provides an algorithm for generating an LRD process
338 called fractional ARIMA(0,d,0), the simplest and most fundamental of the fractionally differenced ARIMA processes. The process has Gaussian marginals with zero mean and variance, and fractional differencing parameter d = H - 1/2. The process {Xk} is chosen from the Gaussian distribution N( UK Vk) where Uk is the k-th mean and Vk is the k-th variance. This algorithm requires 0(n2) computation time, because each number of the sequence depends on every previous number. See [6] for further discussion. • FGN-DW Method The method based on FGN and Daubechies wavelets (FGN-DW), and proposed in [8], [9], is based on the strategy proposed in [12]. The algorithm consists of the following steps: (i) calculation of the power spectrum an FGN process, (ii) construction of complex numbers which are governed by the normal distribution and (iii) calculation of two coefficients of Daubechies wavelets which are needed in the inverse Daubechies wavelets transform. 4 ANALYSIS OF SELF-SIMILAR SEQUENCES Five generators are comparable because they have the same statistical properties such as Gaussian distributions, means, and variances. The five generators of self-similar sequences of pseudorandom numbers described in Section 3 have been implemented in C on a Pentium II (233 MHz, 128 MB) computer. We have analysed the accuracy of the five methods. For each of H = 0.6, 0.7, 0.8 and 0.9, each method was used to generate 30 sample sequences of 32,768 (215) numbers starting from different random seeds. We have summarized the results of our analysis in the following Tables 1 and 2: The estimates of the Hurst parameter obtained from the least biased of the H estimation techniques, i.e., the wavelet-based H estimator and Whittle's MLE (see [9]), have been used to analyse the accuracy of the five generators. The presented numerical results are all averaged over 30 sequences. The results for the wavelet-based H estimator and Whittle's MLE with the corresponding 95% confidence intervals (CIs), (see Tables 1 and 2), show that for all input values, the F-ARIMA, the FFT and the FGN-DW methods produced sequences with less biased H values than other methods. Our results show that all five generators produce approximately self-similar sequences, with the relative inaccuracy increasing with H, but always staying below 9%. Table 1: Mean values of estimated H using the wavelet-based H estimator for five generators for H = 0.6, 0.7, 0.8 and 0.9. We give 95% CIs for the means in parentheses. Method F-ARIMA FFT FGN-DW RMD SRA
0.6 .5974 (.593, .601) .6005 (.596, .604) .6013 (.574, .629) .5963 (.591, .601) .5848 (.579, .589)
Mean Values of Estimated H 0.7 0.8 .6990 .7947 (.693, .704) (.787, .801) .6967 .7862 (.692, .700) (.782, .790) .7962 .6987 (.671, .726) (.769, .824) .6907 .7805 (.684, .696) (.773, .787) .6797 .7700 (.674, .685) (.763, .776)
0.9 .8900 (.880, .899) .8639 (.859, .867) .8938 (.866, .921) .8592 (.852, .866) .8499 (.842, .856)
5 CONCLUSIONS In this paper we have presented the results of a comparative analysis of five generators of (long) pseudo-random self-similar sequences. It appears that all five generators, based on the FFT, RMD, SRA, F-ARLMA, and FGN-DW methods, generate approximately self-similar sequences, with the relative inaccuracy of the results below 9%. The results of this research can be extended by designing more computationally efficient self-similar generator able to construct arbitrary long sequences. Such sequences are necessary in parallel computing simulation studies of the telecommunication networks with self-similar teletraffic.
339 Table 2: Mean values of estimated H using Whittle's MLE for five generators for H = 0.6, 0.7, 0.8 and 0.9. We give 95% CIs for the means in parentheses. Method F-APJMA FFT FGN-DW RMD SRA
0.6 .5803 (.571, .590) .6002 (.591, .610) .5849 (.575, .594) .5765 (.567, .586) .5762 (.567, .586)
Mean Values of Estimated H 0.8 0.7 .6628 .7469 (.738, .756) (.654, .672) .7002 .8003 (.691, .710) (.791, .809) .6725 .7620 (.663, .682) (.753, .771) .6567 .7401 (.647, .666) (.731, .749) .6563 .7395 (.647, .666) (.730, .749)
0.9 .8324 (.823, .842) .9002 (.891, .909) .8530 (.844, .862) .8261 (.817, .835) .8252 (.816, .834)
REFERENCES [I] J. Beran. Statistical Methods for Data with Long Range Dependence. Statistical Science, 7(4):404-427, 1992. [2] J. Beran. Statistics for Long-Memory Processes. Chapman and Hall, New York, 1994. [3] D. Cox. Long-Range Dependence: a Review. In H. David and H. David, editors, Statistics: An Appraisal, pages 55-74. Iowa State Statistical Library, The Iowa State University Press, 1984. [4] A. Crilly, R. Earnshaw, and H. Jones. Fractals and Chaos. Springer-Verlag, New York, 1991. [5] M. Garrett and W. Willinger. Analysis, Modeling and Generation of Self-Similar VBR Video Traffic. In Computer Communication Review, Proceedings of ACM SIGCOMM'94, volume 24(4), pages 269-280, London, UK, 1994. [6] J. Hosking. Modeling Persistence in Hydrological Time Series Using Fractional Differencing. Water Resources Research, 20(12):1898-1908, 1984. [7] H.-D. Jeong, D. McNickle, and K. Pawlikowski. A Comparative Study of Three Self-Similar Teletraffic Generators. In Proceedings of 13th European Simulation Multiconference, ESM'99, volume 1, pages 356-362, Warsaw, Poland, 1999. [8] H.-D. Jeong, D. McNickle, and K. Pawlikowski. Fast Self-Similar Teletraffic Generation Based on FGN and Wavelets. In Proceedings of the IEEE International Conference on Networks, ICON'99, pages 75-82, Brisbane, Australia, 1999. [9] H.-D.J. Jeong. Modelling of Self-Similar Teletraffic for Simulation. PhD thesis, Department of Computer Science, University of Canterbury, 2002 (Submitted). [10] W.-C. Lau, A. Erramilli, J. Wang, and W. Willinger. Self-Similar Traffic Generation: the Random Midpoint Displacement Algorithm and its Properties. In Proceedings of IEEE International Conference on Communications (ICC'95), pages 466-472, Seattle, WA, 1995. [II] W. Leland, M. Taqqu, W. Willinger, and D. Wilson. On the Self-Similar Nature of Ethernet Traffic (Extended Version). IEEE ACM Transactions on Networking, 2(1): 1-15, 1994. [12] V. Paxson. Fast, Approximate Synthesis of Fractional Gaussian Noise for Generating SelfSimilar Network Traffic. Computer Communication Review, ACM SIGCOMM, 27(5):5-18, 1997. [13] V. Paxson and S. Floyd. Wide-Area Traffic: the Failure of Poisson Modeling. IEEE ACM Transactions on Networking, 3(3):226-244, 1995.
THE DELAY-BOUNDED SOURCE MODEL S. RUTTANAWIT , M. LERTWATECHAKUL AND P. SOORAKSA Research Center for Communications and Information Technology (ReCCIT), and entific Department of Information Engineering, Faculty of Engineering King Mongkut's Institute of Technology Ladkrabang, Bangkok 10520, Thailand E-mail: [email protected], klmavure(a),kmitl.ac.th, [email protected] This paper proposes a new traffic source model, called Delay-Bounded Source model. A packet stream could be generated in constant distribution and user can adjust the generating time of the packets under a limited time interval. As a result, the source model generates a constant rate packet stream, which its packet generated delay time (compare to constant distribution) is fallen within the assigned time interval. In this model, the generated delay time distribution pattern could be generated in exponential distribution or normal distribution.
1
Introduction
Information Technology (IT) is growing rapidly, various types of application is emerged. Many of which work well in best-try sharing network environment, such as e-mail, FTP and WWW. On the other hand, many applications need a certain resource to fulfill their user satisfaction, such as multimedia application, voice and video on demand. To provide certain resource for a traffic flow, we need such kind of protocols to negotiate a set of traffic parameters corresponding to the specified service class (QoS). The user QoS of traffic flow would be obtained whenever their behavior is conformed to the contract agreement. Nowadays, there are many network technologies were developed. To achieve QoS on a specific network, end system has to follow the network discipline. For example, users are assigned fixed time slots in Time Division Multiplexing (TDM) network. A user's QoS is deterministic, predictable and isolated from the other users. The ATM (Asynchronous Transfer Mode) defines a set of traffic management functions to support a flexible set of services offering different QoS. While the servicewindow concept introduced in W-TDM (Window based TDM-like Scheduling) [4] is to enhance TDM method in service-time flexibility. In this paper, we propose a new traffic source model, called Delay-Bounded Source (DBS) model. The model was developed in the OPNET modeler 6.0. Using DBS model and other sources model, we could evaluate the QoS performance of a network model such as ATM, W-TDM etc. through difference kinds of traffic source. 2
Source Models
There are two basic philosophies for characterizing source traffic parameter: deterministic and random. The deterministic traffic model (which is control by leaky bucket algorithm) clearly defines the source characteristics as understood by the user and the network. The other philosophy for modeling traffic source behavior utilizes random models to define traffic parameters. Here, we describe some popular traffic models including the mathematical method for generating random variables from them.
340
341
2.1
The Constant Distribution Process
It has a probability distribution function represented as: P(X)=\
*=C [ otherwise It has an expected value E(x) = C, Naturally the parameter C must be an integer. 2.2
Exponential Distribution
The exponential distribution process has an exponentially distributed inter-arrival times with rate parameter e. The probability density function is represented as:
f(x) =
x<[
It has an expected value E(x) = 1/b. 1.3
Normal Distribution Process
The normal (or Gaussian) distribution process has the probability density function is represented as:
•JITZO
It has an expected value E(x) = i. 3 1.1
Delay-Bounded Source (DBS) model Delay-Bounded Source (DBS) algorithm
Some network model discipline allows packets of a traffic flow to be sent periodically with some relax delay (such as W-TDM). Constant distribution function periodically generates the packet stream represented as:
ip-'p
+j
where tpk is the generating time of the packet kth, e representative of mean packet generating rate. The DBS The generating packets are concerned there are two ways of how we consider a discrete-time queuing model. That is early generating (tj and lately generating (ti), when the generating packet early than its reference time (tp) we assign DBS generates the packet by constant distribution function minus with delay time. When the generating packet lately than its reference time we assign DBS generates the packet by constant distribution function plus with delay time represented as:
342
where z is generated with exponential distribution function or normal distribution function. User can defined maximum delay time (z) under a limited time (d), z < d. »
X«-
t m t t, t-p
. k-}
«C
TT
»/>
TT
»/
TT
&/7
TTT7
*/J
^ k+2 W"
Figure 1. Delay-Bounded Source Algorithm
1.2
Delay-Bounded Source (DBS) process
We explain the state transition diagram of the OPNET process model, as shown in Fig. 2. The transition diagram of the DBS algorithm consists of four states: init state, wait state, send state and end state. The init state load user specified parameter such as distribution model and maximum delay jitter time (z) and transfer to the wait state. In the wait state, the process waits for self-interrupt which is scheduled in a table for activating the process to generate a new packet in send state.
^mm?
Figure 2. Delay-Bounded Source Process
1.3
Delay-Bounded Source (DBS) result
We compare the simulation result of DBS model with different distribution pattern for generating delay time (z). The average generating rate (e) was defined as 0.2 packets/second and generating delay time (z) equal to 2 for every traffic source. Fig.3 and fig.4 show packet-generating time of the constant-packet stream compared to a DBSpacket stream with exponential distribution and normal distribution accordingly. The results show that the DBS model could generate a packet stream with a specified generating rate while its generating time could be vary under a defined boundary.
343 .IW T T W8!^L...../.............. J M M& twos*? itsrm
;rr
TTT
Figure 3. Constant source and DBS with exponential distribution, t=2 sec, 1/X=5 sec. •|F»W''W^
Tpipnnnnr Figure 4. Constant source and DBS with normal distribution, x=2 sec, 1/X=5 sec.
2
Conclusion
The Delay-Bounded Source (DBS) model is a new traffic source model developed on the OPNET Modeler 6.0. Using the DBS model, users can generate a packet stream in A packets/sec under specified generating delay time bound. The DBS traffic shape is look seem like the output of leaky bucket which is working in full load but its generated time is variant by specified distribution function. As a result, the DBS model could be used as a traffic source to determine the effect of different source models on performance of a network model.
References 1. Edward R. Doughety, Probability and statistics for the engineering, computing and physical sciences, Prentice Hall (1990) 2. John J. Komo, Random signal analysis in engineering systems, Academic Press (1987) 3. K. Sam Shanmugan and A. M. Breipohl, Random signals detection, Estimation and data analysis, John Wiley & Sons (1988) 4. M. Lertwatechakul, R.Warakulsiripunth,QoS quarantee in ATM network Employing Window-based TDM-like Scheduling, In Proceeding of Asia-Pacific Symposium on Broadcasting and Communications, (2000) pp. 383-388. 5. Gerd Keiser, David Freeman, Carrie Carter, 1995. ATM Test Traffic Generation Algorithms, In Proceedings of Fourth International Conference on Computer Communications and Networks, (1995) pp. 462-469. 6. Chukwuemeka N. Aduba, Matthew N. O. Sadiku, Simulation and Analysis of different Traffic Models for ATM Networks, In Proceedings IEEE Southeast Conference, (2002) pp. 73-75 7. OPNET Contributed Model Depot http://www.opnet.com/services/depot/home.html
INVERTIBLE INTEGER FFT AND DCT APPLIED ON LOSSLESS IMAGE COMPRESSION
YAN YUSONG', WANG CHUNMEI2 , SU GUANGDA', SHI QINGYUN 3 1
Department of Electronic Engineering, Tsinghua University, Beijing 100084, P.R.China 1
School of Mathematical Sciences, Peking University, Beijing 100871, P.R.China 3
Center for Information Sciences, Peking University, Beijing 100871, P.R.China Abstract: This paper completes the construction of FFT (Fast Fourier Transform) that map integers to integers by using Lifting Scheme and butterfly-style construction, which is the basis for invertible integer DCT. Experimental results using integer DCT for lossless image compression are given and show that this transform is fast and efficient for image compression.
1
INTRODUCTION In paper [1], Daubechies etc. advise lifting steps to rebuild wavelet transforms that map integers to
integers.
Using this integer version of wavelet transforms, the computations are still done with
floating point numbers, but the output is guaranteed to be integer and invertibility is preserved, which is crucial to lossless image coding. As traditional image compression methods, FFT and DCT have wide applications. Moreover, to some extent, DCT is better than wavelet for image compression with a higher compression ratio. So we want to construct invertible DCT that maps integers to integers. Lifting scheme is briefly reviewed in Section 2. In Section 3, we give a detailed analysis for FFT's butterfly structure, and try to adapt lifting to integer FFT. Finally, integer DCT is built and applied to lossless image compression; the relative results show its efficiency.
2
LIFTING SCHEME First, we briefly give lifting scheme for an upper triangular matrix. Given a transform:
^1-
1 a\ *i] 0 l'j x2\
where Xx, X2 are inputs, y1, y2
are outputs, and a
(2.1)
is a floating number. Now we can construct
its invertible transform:
j>, =x, + |_a x2\ (2.2)
y2=x2 where [_xj means rounding-off.
According to equation (2.2), we can see, if X{, X2 are integers, the
computed yt, y2 are integers also. Therefore, (2.2) defines a transform mapping integers to integers. Easily we can get its invertible transform:
x2 = y2 I
xx=yx-\ax2\ 344
I
(23)
345 It is clear that (2.3) is also integer transform.
Similarly, this can be done for a lower triangular matrix.
In paper [2], the invertible wavelet transform that maps integers to integers can be constructed according to the following steps: first divide a transform matrix into the multiplication of triangular matrices, whose diagonal elements equal to one; then for every triangular matrix, compute its invertible transform which maps integers to integers; finally, multiply all these transforms in order and get the final transform.
This step is called as lifting, and the triangular matrix with diagonal elements
equaling to one is called as lifting matrix. Although the computations are still done with floating point numbers, the results coming from the upper process are guaranteed to be integer and invertibility is preserved. Therefore, this transform can reduce signals' redundancy efficiently, and benefits data's lossless compression First, give two basic matrices' lifting: 1) Lifting for scaling matrix:
=
v4
i k-k2~\ 0 1 'j
i ol 1 ]/k l'j[0
k-\\ 1 01 1 'j 1 l'j
(2.4)
2) Lifting for rotation matrix: cos a sin a
- sin a~\ I = cos a J
•%(«/2)l
1
1
sin a
!l
-«(a/2)l
(2.5)
l'j So for scaling and rotation transforms, their invertible integer transforms can be gotten easily [2].
3
INTEGER FFT Discrete Fourier Transforms find wide applications in signal processing, and 2-D DFT is defined
as follows: F
1
-ilnnu^
L"] = - r = Z F M e x P [ — r , — ] '
^ M = -7Tf2. F Mexp[——]
(3-D
At the same time, a fast algorithm called as FFT is designed. In this paper, we shall concentrate on the case N=2m.
Figure 3.1(a) shows FFT with a length of eight. N
(a)
al
Dataj
a2
Data j+N/2
(b)
Figure 3.1 (a) Eight variables' FFT, where O represents complex numbers (b) Simple butterfly-style transform, where a l , a2 are outputs, b l , b2 are inputs The transform for Figure 3.1(b) is:
346
Wi 1 * 1 . 1 < / 2 | JIAJ
1 a
where WX = exp[
2j
—] .
(3.2)
i
-^'j
In fact, for better compression ratio, the dynamic range of
TV compressed data should be as small as possible. So in equations (3.1), we multiply a factor _L_ on •JN
both sides of DFT; that is to say, multiply _ L for equation (3.2), and we can get changed equation:
4i Wi 1 V2" Wi VJ
1
V2
a ]
'\ -
-Jl where w£ = exp[-
1
Lv2
(3.3)
VIj
- Yin j ~~N
•
According to upper discussion, FFT can be broken down into a combination of many butterfly transforms. And for every butterfly, we can rewrite its transform as follows: 1
W
J_
w
1
Obviously,
«
0
0 i1
i
IJ
(3.4)
1
• WJ 'j
o
J
•yj2|
1 01 and 0 1 1]
0^
are lifting matrices.
J?/
Moreover,
/2 0
1 J
1
° I 's
a
sca m
' 8
V2lj
matrix, and has its own lifting scheme according to equation (2.4). Therefore, we only need to prove that '
" 11 has its invertible integer transform also.
As we know, most FFT algorithms suppose the input signal is complex; therefore, it is possible to get lifting scheme for b = bx + ib
W^
, a = ax + ia
: the transform between complexes:
b = W^ x a , where
, can be rewritten as follows:
s1 .VJ'
cos# sin#
-sin#] I "*} cos# J ay\
where
9=~¥2xj
(3.5)
N
According to lifting equation (2.5), we can get the corresponding invertible integer transform which maps complex integer a to complex integer b . Accordingly, there is invertible integer transform for b = -W^ x a', so we can get the relevant invertible integer transform for 1 0 1. 0 -W^\ That is to say, we can construct invertible integer transform for a single butterfly transform (3.4), which leads to invertible integer transform for FFT. Invertible integer FFT coming from the upper steps has the following advantages: 1) which maps integer to integer, and has no errors because of lacking floating point computation; 2) which preserves
347 high precision, if there exists floating point numbers' computation; 3) which keeps the dynamic range. So such integer FFT has a better performance for complex data. In fact, generally when FFT operates on real signals, the output data are symmetric and conjugate; so we only need to record half of these complex data for restoring original signals. characteristic is crucial for data compression and integer DCT's realization.
Such
In later paper, we will
give integer FFT, which has conjugate symmetry for real signals, and correspondingly deduct invertible integer DCT. 4
EXPERIMENTAL RESULTS Using integer DCT coming from integer FFT, we compress several standard images losslessly.
Here, 8 x 8
integer DCT and arithmetic coding are exploited.
experimental results.
Table 4.1 shows the relative
Clearly, high performance can be achieved when using integer DCT first. Table 4.1 Integer DCT for lossless image compression
Standard image
Original Size (bytes)
Only Arithmetic Coding (bytes)
Integer DCT + Arithmetic Coding (bytes)
Lena
262144
237466
157958
Zelda
262144
229142
146500
Girl
262144
216736
153388
Barb
262144
239026
170803
Plane
262144
203268
155801
Flower
262144
220888
132934
Goldhill
262144
222413
172176
Integer DCT can not only be used for lossless image compression, but lossy image compression. Combining it with progressive coding methods together, we can realize grey image compression from progressively until losslessly.
Moreover, in paper [6], invertible integer transform from RGB space to
YCbCr color space can be constructed.
So, progressive until lossless color image compression can be
completed finally. Of cause integer DCT is useful for other data's lossless compression, such as audio signal. same time, there are quantities of fast algorithms for DCT.
At the
Therefore, it is worth doing research on
how to combine fast algorithms with invertible integer transform together, and it will reduce computational complexity accordingly. Reference 1. R.Calderbank, I.Daubechies, W.Sweldens, and B.-L.Yeo. Wavelet Transforms That Map Integers to Integers. Technical Report, Department of Mathematics, Princeton University,
1996.
http://cm.bell-labs.eom/who/wim/papers/papers.html#integer 2. Daubechies, and W.Sweldens. "Factoring Wavelet Transforms Into Lifting Steps". Technical Report, Bell-Laboratories, Lucent Technologies, 1996. 3. Xia Deshen and Fudesheng, Technology and Application for Modern Image Processing. Nanjing: Southeast University Press. (In Chinese) 4. William B.Pennebaker, Joan L.Mitchell. JPEG Still Image Data Compression Standard. NewYork: Van Nostrand Reinhold, 1993. 5. H.V.Sorensen etc. "Real-Value Fast Fourier Transform Algorithms". IEEE Transactions On Acoustics, Speech, and Signal Processing. VOL .ASSP-35, No 6, JUNE 1987. 6. Yan Yusong, Shi Qingyun. "Reversible Color Space Transform", Pattern Recognition and Artificial intelligence, Hefei, 1999, volume 12. No. 1 (in Chinese))
SOLUTION OF SCATTERING BY HOMOGENEOUS DIELECTRIC BODIES USING PARALLEL P-FFT ALGORITHM 'WEI-BIN EWE, u Y A O - J U N WANG,
1|2
LE-WEI LI, 3 ER-PING LI
'Department of Electrical and Computer Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore
119260.
2
High Performance Computation for Engineered Systems (HPCES) Programme Singapore-MIT Alliance (SMA), Singapore/USA 119260/02139 3
The Computational Electro-Magnetics & Electronics (CEE) Institute of High Performance Computing (IHPC), Singapore
Division 117528
In this paper, pre-corrected FFT (P-FFT) is employed to solve electromagnetic scattering by threedimensional homogeneous lossy dielectric object. Triangular patch has been used to model the surface of the dielectric object. Meuiod of moments is applied to solve integral equation formed by Poggio, Miller, Chang, Harrington, Wu and Tsai (PMCHWT) formulation. Pre-corrected FFT algorithm, which is a 0(N2l2logN) algorithm, is applied in order to speed up the matrix-vector multiplication in the iterative solver. It also eliminates the need of generating and storing of impedance matrix and thus reduces the memory requirement. When implementing the P-FFT algorithm, most of the computation time has been taken up (up to 75%) to perform forward and inverse FFT. The computation will be more efficient if more processors are available for parallel implementation of forward and inverse FFT. The MPI (message passing interface) has been used for parallelizing the P-FFT algorithm on the platform, IBM p690 PowerPC_POWER4: AIX 5L. Numerical results are presented to demonstrate the accuracy and capability of the proposed method on high performance computers.
1
Introduction
Integral equation methods have been used for the solution of electromagnetic scattering problems by three-dimensional homogeneous bodies. In this approach, the problem is formulated by choosing appropriate integral equation and then applies Method of Moments (MoM) to form a matrix equation. Solving the matrix equation using normal matrix inversion techniques such as Gaussian elimination and LU decomposition require 0(N 3 ) floating point operations and 0(N 2 ) memory storage. An iterative solver such as Conjugate Gradient (CG) require 0(N 2 ) floating point operations per iteration and 0(N ) memory storage. The computational complexity and memory storage requirement of traditional matrix inversion solver and iterative solver preclude their application to large problem. Pre-corrected FFT (P-FFT) algorithm has been proposed to overcome the drawback of traditional MOM solver. It was originally proposed to solve electrostatic problems by Phillips and White [1]. Recently Nie et al. [2] have extended the P-FFT to solve electromagnetic scattering problem by perfectly electric conducting (PEC) object. For surface scatters, P-FFT has 0(N3/2logN) computational complexity and 0(N3/2) memory requirements. The performance of the P-FFT algorithm is highly dependent on effectiveness of FFT calculation in the algorithm and an efficient implementation of PFFT algorithm has been proposed by Wang et al. [6]. In this paper, we first consider the formulation of scattering problem of 3-D dielectric bodies using Poggio-Miller-Chang-Harrington-Wu-Tsai (PMCHWT) [5] approach. We
348
349
use the RWG basis function [4] to expand the equivalent electric and magnetic currents J and M, which involved in the formulation. In our experiment, we choose RWG basis as testing function and convert the formulated EFIE and MFIE into matrix equation. This formulation is found to be free of interior resonance and yields accurate result. We solve the resulting matrix equation using Generalized Minimal Residual Algorithm (GMRES) [7] and Parallel P-FFT algorithm to reduce the computational complexity and memory requirement. 2
Formulation
In this section, we consider the electromagnetic scattering problem of a three-dimensional arbitrary shape and homogeneous dielectric body. The detail derivation can be found in [8], but for completeness of formulation, the summary of equations is given below.
nxEinc =-K-nx\^(VW [A 0 = K-nx\-^-(yV•
•Ai+k^A,)-VxFi\ J 5+ \ + klX)~
Vx
^2 \ 1
nxHmc=J-nx{VxA
(la)
+ yy
(lb) S~
^+/C'M
(lc)
0 = -J-nX VxA+^—&±M3.l {
JhVi
(ld)
\s-_
where the magnetic and electric vector potential A, and Fi for i=l,2 are given as
- e~ik'r
-
A = J*-A—
(2)
- - e'jk'r Ft=K* 4/rr
(3)
Using PMCHWT approach, we combined (la) - (lb) and (lc) - (Id), and obtained nxEinc
=-hx
^ ( V V - A, + £ 1 2 A 1 ) + ^ 2 - ( W - A 2 +
[JK
jk2
fc2A2)-VxF1 (4a
nxH
l
= - n x VxA, + V x A 2 + -
[
L
?
— +
JhVx
JKVi
-VxF2l
J
l
-^4
J
(4b) After obtained the integral equation, we approximate the electric and magnetic current using Rao-Wilton-Gilson (RWG) basis function [4]:
' = £ ' , • / • 00
(5a)
350
K = fjM,fn(r)
(5b)
i
and the integral equation is converted to a matrix equation: ZI=V
(6)
The resulting matrix equation will be solved using iterative solver and accelerated the matrix-vector multiplication using parallel P-FFT algorithm [6] with slight modification to suit our need. The pseudocode of the algorithm is given for completeness of formulation. IF (rank .eq. p0) THEN Project the panel charges to the grid charges CALL MPI_Scatter() [scatter data from p 0 to P0-P15 ENDIF Istart to compute convolution CALL 3-D FFT() ! pj computes ith FFT CALL 3-D FFT() ! Pi computes i'h FFT"1 DO i=0,15 IF (rank .eq. p0) THEN CALL MPI_RECV() ! p 0 receive result from p, Rest computation relating to convolution ELSE CALL MPI_SEND() ! pi sends result to p 0 ENDIF ENDDO lend of convolution IF (rank .eq. p0) THEN Interpolate the grid potentials back to the panels Correction(Compute nearby interactions) ENDIF 3
Numerical Results
In this section, we present numerical result to demonstrate the accuracy of the proposed method. The computation was carried out on IBM p690 PowerPC POWER4. The example we considered is a dielectric sphere having a radius of 1 m and a relative permittivity, 6^2.0. The bistatic RCS for VV and HH polarization are computed at 750 Mhz and are compared with the Mie Series solution in Fig.l. Good agreements are observed between the results. 4
Conclusion
In this paper, the P-FFT algorithm has been extended to solve electromagnetic scattering of homogeneous dielectric bodies. The problem was formulated using PMCHWT approach and discretized by MOM. The resultant matrix system was then solved by
351 iterative solver and accelerated using P-FFT algorithm. Numerical example was presented to illustrate the accuracy of proposed method.
40
^
30
Mie Series P-FFT
r^
i
r^
20
CO
o cr .a
0
"oS
-20 -30 0
20
40
60 00 100 120 Angle 6,180-9 (Degree)
140
160
180
Fig. 1: Bistatic RCS of a dielectric sphere (a=lm, 8j=2) at 750 Mhz 5
Reference 1. J. R. Phillips and J. K. White, "A precorrected-FFT method for electrostatic analysis of complicated 3-D structures", IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, Vol.16, No.10, pp. 1059-1072, Oct., 1997. 2. Xiaochun Nie, Le-Wei Li, Ning Yuan and Yeo Tat Soon, "Precorrected-FFT Algorithm for Solving Combined Field Integral Equations in Electromagnetic Scattering", Journal of Electromagnetic Waves and Applications, Vol.16, No.8, pp.1171-1187, 2002. 3. K. R. Umashankar, A. Taflove, and S. M. Rao, "Electromagnetic scattering by arbitrary shaped three-dimensional homogeneous lossy dielectric objects," IEEE Trans. Antennas Propagat., vol. 39, pp. 627-631, May 1991. 4. S. M. Rao, D. R. Wilton, and A. W. Glisson, "Electromagnetic scattering by surfaces of arbitrary shape," IEEE Trans. Antennas Propagat., vol. 30, pp. 409-418, May 1982. 5. J. R. Mautz, and R. F. Harrington, "Electromagnetic scattering from a homogeneous material body of revolution," Arch. Elek. Ubertragung, vol. 33, no. 4, pp. 71-80, Apr. 1979.
352
6. Y. J. Wang, L. W. Li. and E. P. Lee, "Parallelization of Pre-corrected FFT in Scattering Field Computation," submitted to Int. Conf. on Sci. and Eng. Computation 2002. 7. Y. Saad and M. Schultz, "GMRES, A Generalized Minimal Residual Algorithm For Solving Nonsymmetric Linear Systems," SIAM J. Sci. Stat. Comput., vol. 7, No. 3, July 1986. 8. A. F. Peterson, S. L. Ray and R. Mittra, Computational Methods For Electromagnetics, IEEE Press, New York: IEEE Press, 1998.
THE COMMON COMPONENT ARCHITECTURE (CCA) APPLIED TO SEQUENTIAL AND PARALLEL COMPUTATIONAL ELECTROMAGNETIC APPLICATIONS DANIEL S. KATZ, E. ROBERT T1SDALE, CHARLES D. NORTON Jet Propulsion Laboratory, California Institute of Technology, 4800 Oak Grove Drive, Pasadena, CA, 91109, USA E-mail: {Daniel.S.Katz, E.Robert.Tisdale, Charles.D.Norton}®jpl.nasa.gov The development of large-scale multi-disciplinary scientific applications for high-performance computers today involves managing the interaction between portions of the application developed by different groups. The CCA (Common Component Architecture) Forum is developing a component architecture specification to address high-performance scientific computing, emphasizing scalable (possibly-distributed) parallel computations. This paper presents an examination of the CCA software in sequential and parallel electromagnetics applications using unstructured adaptive mesh refinement (AMR). The CCA learning curve and the process for modifying Fortran 90 code (a driver routine and an AMR library) into two components are described. The performance of the original applications and the componentized versions are measured and shown to be comparable.
1
Introduction
The work described in this paper was undertaken to answer the following questions regarding the Common Component Architecture (CCA): How usable is the CCA software? What work is involved for a scientist to take previously written software and turn it into components, particularly for parallel components? Once the components exist and are linked together, how does performance of the componentized version of the application compare with that of the original application, again, particularly for parallel components? The paper does not deal with the question of why one might choose to use components. It assumes that the reader has an interest in using components, and wants to understand the implications of choosing to use the CCA software for this purpose. The remainder of this paper will describe the initial software, describe the componentization process, and provide and analyze the timing measurements, and finally summarize the answers to the questions. 2
The Common Component Architecture (CCA)
The CCA Forum [1] was founded in January 1998, as a group of researchers from the U.S. National DoE labs and academic institutions committed to defining a standard Component Architecture for High Performance Computing. The CCA Forum noticed that the idea of using component frameworks to deal with the complexity of developing interdisciplinary HPC applications was becoming increasingly popular. Such
353
354
systems enable programmers to accelerate project development through introducing higher-level abstractions and allowing code reusability, as well as provide clearly specified component interfaces which facilitate the task of team interaction. These potential benefits encouraged research groups within a number of laboratories and universities to develop, and experiment with prototype systems. However, these prototypes do not interoperate. The need for component programming has been recognized by the business world and resulted in the development of systems such as CORBA, DCOM, Active X and others. However, these systems were designed primarily for sequential applications and do not address the needs of HPC. The objective of the CCA Forum is to create a standard that both a framework and components must implement. The intent is to define a minimum set of conditions needed to allow high performance components built by different teams at different institutions to be used together, and to allow these components to interoperate with one of a set of frameworks, where the frameworks may be built by teams different from those building the components. The CCA forum members are developing implementations of the standard as well, both components and frameworks. 3 The Non-Componentized Software The original JPL software consisted of two units. The first was the 2dimensional, parallel version of the Pyramid unstructured Adaptive Mesh Refinement (AMR) library [4], developed at JPL over the last few years. Pyramid uses the MPI library for its interprocessor communication. The second was a driver routine for this library [2]. The driver is also parallel, but it does not have any communications routines, since they are all handled within Pyramid. All of the original software was written in Fortran 90, though Pyramid requires an additional library called ParMetis, that determines a repartitioning for the parallel version of the Pyramid library. ParMetis was only used as a binary library, and was not modified in any way in this work. The function of the software is to read in a mesh resulting from an electromagnetic problem, and to (possibly repeatedly) refine a region of this mesh. 4 Componentization of the Software The initial work on this task [3] included development of simple single component and two component example applications. After these were developed, the only problem that had to be overcome to componentize the sequential software was building a C++ wrapper for the Fortran Pyramid library, and translating the driver code into C++, as the CCA framework (Ccaffeine) required components to be written in C++. The CCA model for parallel applications is a Single Component, Multiple Data (SCMD) model. In this model, one process of each component exists on all processors. In a given processor, one components
355
communicates with another component through the framework. Intercomponent communication takes place as expected, using a library such as MPI. A component in one processor cannot communicate directly with a different component in a different processor. As mentioned above, Pyramid uses the MPI library to communicate, and the driver component does not do any communication. Thus the only real differences between the sequential and parallel versions of the application are in launching the framework in parallel and ensuring that the components are also started in parallel. For the current framework, it can be launched on multiple processors by simply starting it with mpirun —np $number $path_to_ccaffeine. Since the driver code and the Pyramid library were both written in such a way that they can run on one or more processors, no changes needed to be made to the driver or Pyramid components. 5 Timing Results For each run of the application, two times were measured, the maximum time from the before the first call to the library to after the last call to the library over the set of processors in a given run, and the wall clock time. These two times were not significantly different for any run. Figure 1 shows the results from the parallel experiments. (Sequential results are not shown in the interest of space.) Each result is the average of 5 to 10 runs.
D Without CCA • With CCA
E c
4
8
16
32
Number of Processors Figure l.Timing results for the parallel component vs. driver/library application.
These results show an insignificant difference between the speed of the component application and the driver/library application on 2 to 32 processors. In some cases, the component application is slightly faster, in others, the driver/library application is slightly faster. The key point is that
356
the scalability is unchanged between the versions; the CCA framework has no effect on how the parallel application scales. 6
Conclusions
The lessons learned in this work are: There was initially a fair amount of learning associated with use of the CCA Forum's technology, including the CCAFEINE framework. It took 2-3 months to componentize the first application, though the second was componentized fairly quickly. Once the sequential application was componentized, proceeding to the parallel application was simple. The lack of a means to write Fortran90 components is a serious shortcoming for many science applications. It is possible to get around this shortcoming, but this introduces additional work for the componentizer and adds the chance for additional errors to come into the application. Once an application is componentized, if the amount of work done in each component call is large when compared with the time needed to make a function call, it is likely that the componentized version of the application will perform well. The authors' knowledge of ongoing work within the CCA Forum leads them to believe that the first issue has been mostly resolved, and the second issue will be resolved in time, most likely in less than 9 months. Once this is done, the CCA model will be a promising method for building large singleprocessor and parallel applications. In the next year, an effort will be undertaken to continue to resolve the first two issues above (flattening of the CCA learning curve and ensuring the Fortran90 components can be used in CCA.) Additionally, plans exist to turn a climate application into a CCA application. References 1. Armstrong R., Gannon D., Geist A., Keahey K., Kohn S., Mclnnes L. C , Parker S., Smolinski B., Toward a Common Component Architecture for High-Performance Scientific Computing, Proceedings of High Performance Distributed Computing, (1999) pp. 115-124. 2. Cwik T., Coccioli R., Wilkins G., Lou J. and Norton C , Multi-Scale Meshes for Finite Element and Finite Volume Methods:Active Device and Guided-Wave Modeling, Proc. of AP2000 Millennium Mtg. (2000). 3. Katz D. S., Tisdale E. R., and Norton C. D., A Study of the Common Component Architecture (CCA) Forum Software, Proceedings of High Performance Embedded Computing (HPEC-2002), (2002). 4. Norton C D . , Lou J. Z., and Cwik T., Status and Directions for the PYRAMID Parallel Unstructured AMR Library, 8th Intl. Workshop on Solving Irregularly Structured Problems in Parallel (I5th IPDPS), (2001).
A FAST ALGORITHM FOR THREE-DIMENSIONAL ELECTROSTATIC ANALYSIS: FAST FOURIER TRANSFORM ON MULTIPOLE (FFTM) E. T. ONG AND H. P. LEE Institute of High Performance Computing, 1 Science Park Road, Singapore E-mail: onget@ ihpc. as tar. edu. sg
117528
K. H. LEE AND K. M. LIM National University of Singapore, Department of Mechanical 10 Kent Ridge Cresent, Singapore 119260
Engineering,
In this paper, we proposed an alternate fast algorithm for solving large problems using Boundary Element Method (BEM). It utilized two important features, namely the multipole expansion, and potential evaluation by discrete convolutions, via Fast Fourier Transform (FFT). We refer to it as the Fast Fourier Transform on Multipole (FFTM) method. It is demonstrated that FFTM is an accurate method, and is likely to be more accurate than Fast Multipole Method (FMM), for the same order of expansion p, at least up to p=2. It is also shown that the method has only linear growth in the computational complexity, which implies that FFTM can be as efficient as FMM.
1
Introduction
Consider an electrostatic problem with electrical conductors embedded in a homogenous dielectric, the charges a(x') induced on the conductors satisfy the integral equation
^W^'h^pbfM- " r
(1)
where ^(JC) is the applied potential, x and x' correspond to the field and source point, respectively, eis the dielectric constant, and ||JC| is the Euclidean length of x. BEM is often used to solve equation (1). However, it generates dense linear system, which requires o(n3) and o(n2) operations if solved by direct methods (Gaussian Elimination) and iterative methods (GMRES), respectively. Recent development utilizes the matrix-free feature of the iterative methods, which requires computing matrix-vector products that can be seen as potential evaluation process. This important observation has led to the developments of numerous fast algorithms that is only 0(n). One such algorithm is the Fast Multipole Method (FMM), which has been developed by Greengard and Rohklin [1], and later used by Nabors and White [2] in electrostatic problems. The efficiency of FMM arises from the effective usage of multipole and local expansions in a hierarchical manner through a series of transformation operations. Further improvements were made by Greengard, Rokhlin and Cheng [3,4], by using new compression techniques and diagonal forms for the transformation operators. Other techniques developed for solving equation (1) rapidly include, the precorrected FFT approach [5], the singular value decomposition [6] and wavelet-transform methods [7]. In this paper, we describe an alternate approach, which we referred to as Fast Fourier Transforms on Multipole (FFTM) method. It utilizes two important features, namely:
357
358
i) using multipole expansion to approximate far potential fields, ii) evaluating the approximate potential fields by discrete convolution, via FFT. In the following section, the FFTM algorithm is described. Section 3 presents some numerical examples to demonstrate the accuracy and efficiency of the method. Finally, a conclusion is given in Section 4. 2
Fast Fourier Transform on Multipole (FFTM)
This method arises from realization that the multipole expansion can be seen as discrete convolutions, which can be evaluated rapidly using FFT algorithms. The method comprises of four main steps: A.
Problem domain is subdivided into many cells, in which the panels' charges are to be represented by multipole moments. The number of cells should satisfy that required by FFT solvers. But, with the help of FFTW (Fastest Fourier Transform in the West) [8], we can perform FFT of any arbitrary size (preferably factors of small primes).
B. Transforming panels' charges q{x') into multipole moments M™ using
M: = i^Y-^e-^y'dr
(2)
The potential due to M™ is given by the multipole expansion,
«*)«£lAC^ n-0 m=-n
(3)
^
where p is the order of expansion, and Y™(O,0) is the spherical harmonics functions, respectively, which are defined as
C(M= J)KHHLH/ £/f'(cos^
(4)
with P"(cos0) being the associated Legendre function of the first kind with degree n and order m, where n is a non-negative integer, and -n<m
ym
r,
(u,*)-ZI
/ *•
M;{rj,k!)-^{f-tj-f,k-k!\ R
"I
(5) IJ
where indices (ijjc) and (i j ,k) denote the discrete locations of the field and source points, respectively. By discrete convolution theorem [9], equation (5) can be evaluated efficiently using FFT algorithms. D. The cells' potentials are then interpolated onto the panels' collocation points, using a simple quadratic interpolation scheme. This accounts for the 'distant' charges effects only. But prior that, potential correction is performed to remove the 'near' charges contributions, which are inaccurately represented by multipole expansions. Finally, the 'near' charges contributions are added directly onto the collocation points.
359
3
Numerical Examples
Different FFTM schemes are defined according to the multipole expansion order p, and the direct interaction list D;„, (layers of cells in which panels' charges are to be computed for directly). The results are compared with the GMRES explicit approach (where the full coefficient matrix is formed explicitly). 3.1 Accuracy analysis of FFTM
The 4x4bus-crossing example [2] is used here to compare the accuracy of FMM and FFTM. The results of capacitance matrix are tabulated in table 1. In general, FFTM is more accurate than FMM. This is largely due to the ways the 'distant' potential contributions are computed in the two methods. In FMM, multipole and local coefficients are used repeatedly in a hierarchical process, in which approximation errors can accumulate. On the other hand, FFTM replaces this hierarchical process by discrete convolutions that are evaluated by FFT algorithm, which is "exact" up to round-off errors. Table 1. Capacitance extraction of 4x4bus-crossing example by FMM from [2], and FFTM methods. Solution Method GMRES explicit FMM (p = 0) FMM(p = l) FMM (p = 2) FFTM (p = 0) FFTM (p = 1) FFTM (p = 2) * Note that D,isl = 2.
Cn 402.9
Cn -136.2
394.5 406.6 405.2 404.2 403.4 403.2
-124.0 -139.7 -137.8 -133.1 -136.7 -136.3
Capacitance Matrix Entry (pF) C14 C13 C/6 Cn -12.00 -7.886 -48.18 -39.90 -0.175 -2.471 -52.15 -43.39 -12.36 -6.676 -48.48 -40.45 -11.91 -8.079 -48.36 -40.09 -13.53 -6.108 -49.14 -41.53 -12.57 -8.014 -48.15 -39.63 -11.49 -7.966 -48.36 -40.05
c„ -39.90 -43.08 -40.27 -40.01 -41.27 -39.62 -40.05
Cis -48.18 -52.92 -48.46 -48.45 -49.85 -48.05 -48.34
3.2 Efficiency analysis of FFTM The self-capacitance extraction of a unit cube is used in this analysis. The cube is meshed with uniform elements, where the larger problems are generated by using finer mesh. The efficiency plots for the CPU times and memory storage requirements are given in figure 1. •O/FESeplicit/ Dus,= 1,^=1 •Dus,=X'P = 2
- • - GMRES explicit,"
•pist = 2, p = 2
-*-D;iM = 2, p = 2'
~a~Dlai= l,p= 1 - _ * Diist = \,p = X -*-Du,t = 2, p - 1
.gl.OE+02
Q.
1.C6W Rcttemsiasin
1.0E+04
Problem size, n
(a) (b) Figure 1. Plots of (a) CPU times and (b) memory storage versus problem sizes for cube example.
360
All the problems are solved more rapidly using FFTM. But more importantly, it is observed that the FFTM schemes have only linear growth in their computational complexity. This means that FFTM can be as efficient as FMM. 4
Conclusion and Future work
In this paper, we proposed and implemented an alternate fast algorithm, the Fast Fourier Transform on Multipole (FFTM) method. It is demonstrated that FFTM is an accurate method, which is likely to be more accurate than FMM for the same order of multipole expansion (at least up to p = 2). It is also shown that FFTM has only linear growth in computational complexities. This means that it can be as efficient as FMM. In fact, for a given order of accuracy, we believe that FFTM is likely to be more efficient, since FMM would need a higher order expansion in order to achieve the desire accuracy. However, the current FFTM algorithm cannot attain very high order of accuracy, because of the way the potential interpolation is done. One obvious solution is to use the local expansions in conjunction with multipole expansions. However, in this case, the 4
number of discrete convolutions scales like 0(p ), which may render the algorithm inefficient. Hence, our future work aims to implement an accurate and efficient local expansion version of FFTM. References 1.
Greengard L. and Rokhlin V., A fast algorithm for particle simulations. Journal of Computational Physic 73 (1987) pp. 325-348. 2. Nabors K. and White J., Fastcap: A multipole accelerated 3-D capacitance extraction program. IEEE Transaction on Computer-Aided Design Integrated Circuits and Systems 11 (1991) pp. 1447-1459. 3. Greengard L. and Rokhlin V., A new version of the fast multipole method for the Laplace equation in three dimensions. Acta Numerica 6 (1997) pp. 229. 4. Cheng H., Greengard L. and Rokhlin V., A fast adaptive multipole algorithm in three dimensions. Journal of Computational Physics 155 (1999) pp. 468-498. 5. Phillips J. R. and White J., A precorrected-FFT method for electrostatic analysis of complicated 3-D structures. IEEE Transaction Computer-Aided Design Integrated Circuits and Systems 16 (1997) pp. 1059-1072. 6. Kapur S. and Zhao J., A fast method of moments solver for efficient parameter extraction of MCM's. Proceedings of Design Automation Conference, CA, June (1997) pp. 141-146. 7. Levin P. L., Spasojevic M. and Schenider R., Creation of sparse boundary element matrices for 2-D and axi-symmetric electrostatics problems using the bi-orthogonal Haar wavelet. IEEE Transaction on Dielectric and Electric Insulation 4 (1998) pp. 469-484. 8. Matteo Frigo and Steven G. Johnson. FFTW, C subroutines library for computing Discrete Fourier Transform (DFT). Freeware available from http://www.fftw.org. 9. Brigham E. O., The Fast Fourier Transform and its Applications. Prentice-Hall, Englewood Cliffs, 1988.
A N A L T E R N A T I V E I M P L E M E N T A T I O N OF I N T E R P O L A T I O N IN MULTILEVEL FAST MULTIPOLE M E T H O D (MLFMM) CHAN-PING LIM, YAO-JIANG ZHANG, FANG WU AND ER-PING LI Institute High Performance Computing, 1 Science Park Road, #01-01, The Capcricorn Singapore Science Park II, Singapore 117528 E-mail: [email protected] A fast algorithm is presented to solve a two-dimensional conducting cylinder using an Electric Field Integral Equation (EFIE) formulation. The fast algorithm presented in this paper employed the multilevel fast multipole method with the use of the Lagrange interpolation. The iterative solver used is the conjugate gradient method. This algorithm has a complexity of O(NlogN). The code is verified with method of moment (MoM).
1
Introduction
The method of moments * formulation will result in a dense matrix which requires 0(N2) to solve. With the implementation of fast multipole method (FMM), the operation can be reduced to 0(N1^). And, when FMM is implemented in multilevel (which is multilevel FMM), the operation can be further reduced to 0(N\ogN) for sparse scatters, and O(N) for densely packed scatters. In our implementation, we first discretise into elements by method of moments. Each element is at least 0.1 A. Then, in every level, the elements are grouped to its respective group. We organise the non-empty groups into a quad-tree structure with the smallest group at the bottom of the inverted tree, and the largest group at the root of the tree. Empty group is not stored in the tree. When one element is located far from another element, the field due from these elements is computed by the factorised Green's function in a multilevel multi-stage fashion. However, when the elements are close to each other, the field is obtained using the traditional MoM. In order to reduce the computation time, for our aggregation and dis-aggregation implementation, we make use of interpolation between levels and we employ Lagrange Interpolation. 2
2 D M e t h o d of M o m e n t s
A simple 2D method of moment (MoM) that discretize the 2D integral equation of scattering is shown as follow1 : N
'^2Ajiai = bj,
(1)
where j = 1, 2, • • •, N and MW A , ill + U TiM
tS ;Mi nn f ^^ l l , r(2)
4
AiH^>{kPji),
)]•
i
(2)
i±j
For the self-term (i = j), the Hankel function is represented by its small argument approximation, and for the non-self term, it is approximated by a one-point integration for simplicity. A,; is the element size and (j/Ae = 0.163805).
361
362 3
M u l t i l e v e l Fast M u l t i p o l e M e t h o d in 2 D
Multilevel implementation of fast multipole algorithm is to factorise the Green's function into multifactors. The Hankel function can be written as follows with the aid of the addition theorem repeatedly 2 ' 3 . H
(krji)
o
= PjJi • PjiJi • aJih
• Phh • Phi
(3)
where (4) Ji-m(kpjlJa)e-jV-m»w
[A/^km = [Pj2i2]m,n = \PhhUp
(5)
H^_n(kpj2h)e-^-^^
=
[Phi}P,i =
(6)
Jn-P(kphh)e-j(n-rt^
(7)
Jpikpi^e-^'^
(8)
The above equation can be diagonalized as shown in the two-level case: rl-K
H{0krji)= 4
/ Jo
da%j1(a')0j1j,(a')&jlIa(a')0IaJl(a')pIli{a')
(9)
Interpolation
The integral in Eq. (9) can be written in discrete form as shown: (10) 9=1
Anterpolation between Pjj1(aq) and Pj1j2{oiq) and interpolation between Pi2ix{otq) and Pi1i(aq) are needed to achieve an O(NlogN) algorithm. Eq. (11) shows how the interpolation matrix is obtained. jji-\-l,n -'ll rn+l,n i 21
rn+l,n M2 jn-\-l,n •'22
Tn+l,n . 'lQi 7-n4-l,n l 2Qi
7-n+l,n Q2l
7-n-t-l,n L Qi2
jn-\-l,n
(11)
/3^V2)
1
^ 7 X ( « Q I ) .
Qi > Qi and we can choose Q2 = 2 x Q j . We can see the Pjj1 in the
nth-\eve\
{Pjjx) is the same in Pjjx in the nth + 1-level (P™^1) except for the number of sampling points. Therefore, we can obtain P™^1 by interpolating from the data in
363
(3™jx We can use Lagrangian interpolation to get the values of the additional points. Lagrangian interpolation in the matrix form is as follows: Ml ( 7 l )
0(71) 0(72)
W2(7l) ^2(72)
^1(72)
w
W
l(7Q2)
WQ x (7l) Qi(72)
w
/?(«i)
/3(« 2 )
(12)
2(7Q2)
where [/3] is a Q2 x 1 vector, [/3] is a <2i x 1 vector and [w] is a Q2 x Qi matrix. where i = 1, 2, • • •, Q\ and £ = 1, 2, • • •, Q2
i(7i) =
(13)
Not all the Wi(7;) have to be filled and therefore, the matrix in Eq. (12) can be made sparse. T h e aggregation matrix in our multilevel F M M formulation has a block structure as shown in Eq. (14). Vl(Qi), V2(Qi), • • •, VG(Qi) are sub-matrices and each of these sub-matrices has to go through interpolation separately.
V\Qi) 0
0 y 2 (Q,-) (14)
0 Now, let us take V1(Qj)
•••
VG(Qj)
0
sub-matrix as an example.
v : (Q 3 ) =
VAhi)
V112(7i)
^1(72)
^2(72)
VQ S I(7Q S )
V-4 3 2 ( 7 Q 3 )
(15) •••
V4 3n ( 7 Q 3 ) V?„(«i)
^2(«i)
V
1
^ )
V2\(a2)
V212(a2)
VQ2I((*Q2)
V422(QQ2)
^,("2)
= •••
(16)
V^ 2 n K> 2 )J
where n is the number of elements in the group and Q3 < Q2wi(7i)
V2U72) t&i(7Qa)J
^1(72)
w
i(7Q2)
^2(71) ^2(72)
WQ3(7I)
W2(7Q2)
WQ3(7Q2)
W
where [w] is a Q2 x Q3 matrix and i = 1,2, •,ra.
Q3(72)
V?i(«l) V2\(a2) (17)
364 5
N u m e r i c a l results
Based on the MLFMM formulation described in this report, we developed the 2D MLFMM code using Fortran 90 which consists of over 2000 lines. We also try to use the minimal built-in library in order to enhance the portability of the code. In this section, we would like to verify our MLFMM results with the MoM results. We have chosen a circular cylinder with radius of 20A for our verification. All the verifications are performed by comparing the numerical results obtained by MLFMM with MoM.
Circular Cylinder with Radius of 201
Circular Cylinder With Radius of 20 X _i
1
,
i
i
i
i
I
i
I
i
L
40-
MLFMM
35•
MoM
302520-
— -
I
-4i^
1510-
^H|
(»K
150
200
1
50
100
Degree
250
300
0
— I — ' — I — 200 400
600
800
1000
1200
1400
Current Segments
Figure 1. Current distribution and RCS calculation of a circular cylinder
From the numerical result shown in Fig. 1, we can see the results obtained using our algorithm show excellent agreement with the exact method, i.e. MoM. 6
Conclusion
The multilevel fast multipole method with the emphasis on the interpolation has been presented. The code has been verified with MoM code and excellent agreement is obtained. Future work include parallelizing of this code will be investigated. References 1. R. F. Harrington, Field computation by moment methods, (Malabar, FL:Krieger, 1982). 2. C. C. Lu and W. C. Chew,Micro. Opt. Tech. Lett. Vol.7 N o . 1 0 , 466 (1994). 3. J. M. Song and W. C. Chew,Micro. Opt. Tech. Lett. Vol.10 N o . l , 14 (1995).
FAST MATRIX ALGORITHMS FOR HIERARCHICALLY SEMI-SEPARABLE REPRESENTATIONS
S. CHANDRASEKARAN AND T. PALS Department of Electrical & Computer Engineering University of California Santa Barbara CA, USA 93106-9560 M. GU Department of Mathematics University of California Berkeley CA, USA The hierarchically semi-separable representation for arbitrary matrices is presented, and fast backward stable direct solvers for such representations are constructed. This technique generalizes earlier work by Rokhlin and his co-workers on certain kinds of fast integral equation solvers [1,3,5]. This work can also be viewed as a generalization of the work of Dewilde and his co-workers in time-varying systems theory [4]. The method has the further advantage of reproducing optimal complexities independently of the number of underlying spatial dimensions. In particular the method will reproduce the optimal complexity for solving many sparse matrices by direct factorization.
1
HSS representation
In this p a p e r we consider a new class of r e p r e s e n t a t i o n s of m a t r i c e s t h a t we t e r m h i e r a r c h i c a l l y s e m i - s e p a r a b l e (HSS) r e p r e s e n t a t i o n s . T h e HSS r e p r e s e n t a t i o n of a s q u a r e m a t r i x A is given by six sequences of m a t r i c e s Di, Uk-,i, Vk-i, Rk;i, W*;» a n d Bk;i,j • T h e r a n g e of t h e t h e indices k, i a n d j will b e c o m e clear shortly. F i r s t t h e r e is a n integer K such t h a t 0 < k < K. We will call K t h e d e p t h of t h e H S S r e p r e s e n t a t i o n . T h e r e a r e 2K n u m b e r s mx;t such t h a t D{ is a n TUK;j x rn,K;i m a t r i x , a n d 5^j rriK;i is t h e n u m b e r of rows (columns) of A. We now define t h e a d d i t i o n a l n u m b e r s m^i = m,k+v,2i-i + w}/fc+i;2j where 0 < k < K a n d 1 < i < 2k. We now define a d d i t i o n a l m a t r i c e s Uk;i b y t h e following recursion UK,i
= UU
K
1 < i < 2
Uk-,i _= f U " k+ « i-;2i-iR ! - ^ k-+v,2i-i *"«+^-i } , 1
n0 ^< i k. ^<* K, -
i1 ^< „i• ^< ok 2
k+V,2iRk+l;2i• -• Uk+iM-K:
Similarly we define t h e a d d i t i o n a l m a t r i c e s VK;i
= UU
l
^;i=f^2'-1^+1^-1>), V
Vk+l;2iWk+l;2i
0
l
j
T h e m a t r i c e s Uk-,i a n d Vk;i have m^i rows. T h e n u m b e r of columns t h e y have a r e d e t e r m i n e d by t h e n u m b e r of rows of Rk-i a n d Wk;t respectively. P a r t i t i o n t h e m a t r i x A recursively as follows A);l,l =
A,
365
366 "lfc+l;2i-l
Ak.it
= ' '
m
1
2
*+ ^ '-
"ljfc+l;2t
1
M*+ii«-i.2*-i
"l*+l;2i
^+i;2i-i,2i A
V •^•*+l;2«,2t-l
^fc+l;2i,2t
o < fc < iT,
Note that only the diagonal blocks are recursively partitioned. representation is chosen such that AK-,i,i =Di: 1 < t < 2K, Akiij = UktiBkiiiJVi^, 0
1 < i < 2k.
/
Then the HSS
l
j = i±l,
It can be shown that given a sequence of 2K numbers rriK;i such that £ ^ niK„i = n, every n x n matrix has a corresponding HSS representation, and that this representation can be constructed in 0(n2) flops in the general case. The HSS representation is interesting only when it requires far fewer parameters than the usual representation of a matrix. For this to be the case the rank of Bk]itj must be much smaller than mk]i and mk)j in a certain well-specified sense. 2
Fast multiplication
First we consider how to multiply a matrix in HSS form with a regular vector (matrix). The technique is identical to the usual fast multipole methods. Hence we just summarize the recursions here. Suppose A is in HSS form and we wish to compute z = Ab. We first row-partition b according to the sequence mK-,i and call the i-th partition &JC;*. Then the following recursions Gk,i = Wk+1.2i-iGk+l;2i-l -Ffc,2i-1 = Bk;2i-l,2iGk;2i
+ Wk+1.2iGk+V,2i,
+ •R*;2i-1-Pfc-1,»)
Fk,2i = Bk]2i,2i-lGk-2i-l
+
Rk;2iFk-i-i,
with GKJ = ^K^K\i and Foji = 0, can be computed rapidly. We then observe that ZK;i = Dibx-,% + UK;iFji-,i, where ZK-,% is obtained by row-partitioning z according to the sequence m,Kti3
Fast stable solver
We now consider how to compute x rapidly, where Ax = b, and A is available in HSS form. The algorithm can presented in a recursive fashion. The recursion takes one of three forms. Case 1. Compressible off-diagonal blocks. We begin by observing that block row i, excluding the diagonal block Dj, has its column space spanned by the columns of UKJ- Hence if the number of columns of UK,i, denoted by riK;i is strictly smaller than mi, the number of rows in that block, we can find a unitary transformation qn-,i such that fr
-nH
TT
_™,i-
nK;i (
0
\
367 We now multiply block row i by (ff. The change in the off-diagonal blocks is represented by the above equation since all of them have UK-,1 as the leading term. The i-th block of the right-hand side changes to become H ,
_ mi-nK.ti
(PK-,i\
We also observe that Di, the diagonal block has become q^i^iunitary transformation WK;i such that "is - nK-,i f)
H
- (n
n \,„
H
-
m
n
i~
K;i (
%i-MK;i-njr.
y
Now we pick a
nK;i
-Di;l,l
0
\
Di2i
A;2J.
We then multiply the block column i from the right by w^.t. The change in the diagonal block is represented by the above equation. The off-diagonal blocks in block column i have V^.{ as the common last term. Hence we just need to multiply VK-,1 to obtain i7
_ „„
-
T/
m
i
~ nK;i
( ^ i « ^\
nK;i
\VK;iJ
Since we multiplied block column i from the right by tiif.j, we need to replace the unknowns x^-i
by WK-^XK-X„„
„
_
m
i -
n
K ; i
( ZK;i\
/, v
At this stage the first mi — nn-i equations in block row i read as follows -Di;i,iZjc;i = PK-,1, which can be solved for ZK,I to obtain ZK-,% = D^iipK;iWe now need to multiply the first m* — n,K;i columns in the block column i by ZK-,% and subtract it from the right-hand side. To do this efficiently we observe that the system of equations has been transformed as follows (diag(g^.j) J 4diag(w^. i )) (diag(w.R;;j) x) = diag(q>^.i) 6. If we define the vector \
0
)
we then observe that the stated subtraction can be re-written as follows b = diag(g^. i )6 — (diag(g^. j ) AdiAg(w^.i)) z. We can do this operation rapidly by observing that (diag(g^. i ) Adiag(w;^. i )), has the HSS representation {.Dj}?=1, {UK;i}i=1, {VK;i}i=l, {{-Rfc;i}i=l}*=0> {{Wk;i}i=1} k=0, {{Bk;2i-l,2i} i=1 } k=0, {{Bkt2i,2i-i}j=i }{*=o> a n d using the fast multiplication algorithm in section 2. Of course, the algorithm can (and should) be modified to take advantage of the zeros in DKti, UK-i, and zK.t. Once the subtraction has been done, we discard the first nn — nn;i columns of block column i and the first mj — « K ; J rows of block row i. We observe that this leads to a new system of equations of the form Ax = b, where £ K1
'
_ m< - nK-,i (* \ nK,i V ^K;») '
368 and
A has the HSS representation
{23i;2,2}f=1,
{UK-,i}2i=\,
{VK;t}j=i>
{{RkAti}^ {{WkAt^=0, {{BtiJi-i,2i}£'}£o. {{s*i«,M-i}?:r}^o-
Therefore we are left with a system of equations identical to the one we started with and we can proceed to solve it recursively. Once we have done that we can recover the unknowns x from z and x using the formulas
Case 2. Incompressible off-diagonal blocks. It occurs when all block rows for the system cannot be compressed any further by invertible transformations from the left. In this case we proceed to merge adjacent block rows and columns. We do so as follows f) _ (
D2i-1;2,2
UK;2i-lBK;2i-\,2i^K;2i\
\UK;2iBK;2iai-^K;2i-\ fj
_ f
T>
_ (
\
VK-l;i - y
£>2i;2,2
/ '
UK;2i-lRK;2i-l\ UK;2iRfC;2i
) '
VK;2i-lWK;2i-l\ VK;2iWK;2i
J '
We then see that A has an HSS representation given by the sequences: {Z),}? =1 ,
{u)t;\
{v)t;\
{{^,,}ti}f=- 0 1 .
{{WkAt.)k=0\
{{Bk;2i-i,2i}Ci}h\
{{Bk-,2i,2i-i}i=i }%=<}• Let us denote by A the matrix with this HSS representation (of course, A = A). We then observe that the system of equation is now in the form Ax = b, which is exactly in the form we started with, except that the new HSS representation has depth K — 1. Hence we can solve this system of equations recursively for x. Case 3. N o off-diagonal blocks. Observe that if K — 0 the equations read D\x = b, which can be solve by traditional means for x. This case terminates the recursion. References 1. J. Carrier, L. Greengard, and V. Rokhlin, "A fast adaptive multipole algorithm for particle simulations", SIAM J. Sci. Stat. Comput., 9, pp. 669-686, 1988. 2. S. Chandrasekaran and M. Gu, "Fast and stable algorithms for banded plus semi-separable matrices", submitted to SIAM J. Matrix Anal. Appl., 2000. 3. Y. Chen, "Fast direct solver for the Lippmann-Schwinger equation", submitted to Advances in Computational Mathematics, http://www.math.nyu.edu/faculty/yuchen/onr/intro.htm, 2001. 4. P. Dewilde and A. van der Veen, "Time-varying systems and computations", Kluwer Academic Publishers, 1998. 5. P. Starr and V. Rokhlin, "On the numerical solution of 2-point boundary value problem.2", Communications on Pure and Applied Mathematics, Aug. 1994, vol. 47, no. 8, pages 1117-1159.
PARALLEL FAST MULTIPOLE METHOD FOR LARGE-SCALE COMPUTATION OF ELECTROMAGNETIC SCATTERING
ZHANG YAOJIANG, WU FANG, EDWIN LIM CHAN PING, LI ERPING Division of Computational Electromagnetics & Electronics, Institute of High Performance Computing, 1 Science Park Road, #01-01, The Capricorn, Singapore, 117528 Tel: (65)-64191542, [email protected]
Parallel fast multipole method is developed for RCS computation of large targets on multiple processor systems. Sparse Sub-matrices in FMM are calculated and stored in distributed system and MPI is used to perform the matrix-vector multiplication in iterative solution. The accuracy of code is verified by comparison with other publications. Either the computational complexity or speed-up ratio of parallel FMM and parallel MoM is compared.
1. Introduction In recent years, fast multipole method (FMM) has been developed into the most powerful and efficient algorithm in simulating electromagnetic scattering problems [1][2]. FMM reduces the computational complexity from o(N2) of moment method (MoM) to o(NL5) or 0(N\ogN) in its multilevel version, where AT denotes unknowns. However, theoretical prediction of radar cross section (RCS) of an aircraft with hundreds wavelength size often requires more than ten millions of unknowns to model the current density on its surface. Single processor systems can not provide sufficient memory not to mention unreasonable CPU time required. Therefore, parallel version of FMM needs to be developed to fully utilize multiple processor computing platforms [3]. The parallel codes of fast multipole method are developed and successfully implemented on either UNIX or LINUX operating systems. In parallel FMM, the near-group interactions, aggregation and translation procedures in FMM are distributed stored and computed. Message passing interface (MPI) library is used to communicate among different processors in iterative solution procedure, such as conjugate gradient method (CG). Special attention is paid on storage of sparse matrices to reduce RAM costs. The parallel efficiency of our source codes has been also discussed. The accuracy of our codes is verified by the calculation of other authors. 2. Brief review of the algorithm Either moment method (MoM) or fast multipole method is designed to solve the integral equations in electromagnetic scattering, e.g. following electric field integral equation (EFIE) of conducting targets: n x Einc = jk0rjn x I G(r, r') J(r' )dS'+ - ^ V • £ G(r, r')(V • J(r' ))dS'
369
(1)
370
where n i s the normal vector of the surface and J(r') denotes the current distribution; E'"cis the incident wave and k0, the wavenumber. T] stands for the wave impedance in free-space. G(r,r')is the 3D Green's function and expressed as following in FMM:
GneNGm
Ir,-r,l
G(r,,r,.):
^-*(I-kk)e *"""r''^(k-LV2i I 167T
(2.1) Gn e FGm (2.2)
where Gn denotes the n-th group of triangles in FMM and source point r. e Gn, target r € G • NGm and FG stands for near-group and far-group set of Gm • Symbol TL (k • r„„) represents the translation of the group G„ to Gm and is calculated as following: L
^(k-fm„) = I £(-7)'(2/ + lA (2, (V m J^(k-f m „)
(3)
/=0
In which, /j ( (2) ()is second kind spherical Hankel function and P ; ( ) i s the Lendgre polynomials. In the aid of (2.2) and following the MoM, integral equation (1) can be cast into following FMM linear equations:
where vector [at
a2
A
A
IG
A
A
"21
"22
2G
A
A
"Gl
"G2
a c f and [b,
a, a
2
GG. . a G .
b2
V =
b2
(4)
.bG.
b c ]T stands for the unknows and
incident wave, respectively. Sub-matrix Amn denotes the interaction of group G to G , and is expressed as:
AM„ —
M.
G„ e NG.
V ^ V
G„ e FGn
(5)
where M,, is calculated as conventional MoM. v„ is the aggregation of group G and symbol'+' denotes the transpose conjugate. T,^ is the translation matrix of G to G Therefore, the final form of the FMM linear equations can be written as: (ZjVW+V+TV)a-b
(6)
371 where matrices zNN, T and V are called near-interact, translation and aggregation matrix, respectively. They are all sparse one and thus make the computational complexity from MoM's 0(N 2 )toFMM's 0(NLS).
3. Parallel strategy Most time-consuming part of equation (6) is matrix filling and iterative solving. Therefore the matrices ZNN, T and V are calculated and stored partly in different processors. To do so, groups of FMM are distributed equally or almost equally to each processor,.i.e. equation (4) is partitioned into processors by rows. Matrix-vector multiplication are calculated in each CPU and MPI is used to communicate among CPUs to get the input vectors in each iteration. 4. Numerical examples and discussions To verify our source code which is written in FORTRAN 90, mono-static RCS for open cavity is showed in Figure 1. Our calculation agrees very well with that obtained by other authors [4]. We also have tested our code's complexity in aspect of either memory requirement or CPU time versus those of MoM. Figure 2 (a) and (b) demonstrate the advantage of FMM over MoM. It can be seen that our code has achieved the predicted 0(NLS) complexity. Figure 3 compares the speed-up ratio of FMM and MoM versus different processors used. Here, parallel MoM is the special FMM where all the groups are treated as near group set. It can be seen that parallel MoM has better parallel efficiency than parallel FMM. This can be explained by the fact that in parallel MoM, both aggregation and translation matrix are not calculated and involved in matrix-vector multiplication in iterations. Therefore, there is only one matrix-vector multiplication, i.e. it needs only one time to communicate among processors. On the other hand, from equation (6) there are at least five times communications in parallel FMM. From Fig.3(b), we can see that the efficiency of parallel FMM becomes better when dealing with larger problems. Thus parallel FMM is very promising for large-scale computations of electromagnetics.
e (degrees)
e (degrees)
Fig.l Monostatic scattering from an open cavity.
372 — u
1 —•—FMM — « — MoM
y
"
•
••^••1.2350859*10" 4 -/v"* 07a5
w /
•
\
\
t
y Y*
•SJ*
-•—FMM -•—MoM
lev 31=4
level=3
**^y
,'
••••"• 1.23027*1 O ' - N " " ™ '
]
|-
!
N Unknowns
(a)
(b)
Fig.2 Complexity comparison of parallel FMM and MoM
| |
-1 1 ! — ^ — unknowns=10,368 ^ /
Processor number
(a)
7
^ f Spe ed-up Rat
1 ' 1 • 1 • 1 • — • — P a r a l l e l FMM — • — Parallel MoM
/y£
- * - N =56,448 rj =10,368
: :
Processor Number
(b)
Fig.3 Comparison of speed-up ratio (a) parallel FMM and MoM (b) parallel FMM with different unknowns
Reference
4.
Eric Darve, The fast multipole method: numerical implementation, J. Comput. Phys., 160 (2000) pp. 195-240. R. Coifman, V. Rokhlin, and S. Wandzura, The fast multipole method for the wave equations: a pedestrian description, IEEE Antennas & Propagat. Mag., 35,no.3, (1993)pp.7-12. W.C.Chew, J. M. Jin, E. Michielssen, and J. M. Song, Fast and efficient algorithms in Computational electromagnetics, chapter 4, Artech House, Inc, 2001 J.M.Song, W.C.Chew, Multilevel fast multipole algorithm for solving combined field integral equations of electromagnetic scattering, Microwave & Opt. Tech. lett., 10, no. 1,(1995) pp. 14-19.
T W O C L A S S E S OF P R E C O N D I T I O N I N G T E C H N I Q U E S F O R ELECTROMAGNETIC WAVE SCATTERING PROBLEMS J U N ZHANG
AND
JEONGHWA LEE
Laboratory for High Performance Scientific Computing and Computer Simulation, Department of Computer Science, University of Kentucky, Lexington, KY 40506, USA E-mail: jzhangQcs.uky.edu, [email protected]
Department
CAI-CHENG LU of Electrical and Computer Engineering, University of Kentucky, KY40506, USA E-mail: [email protected]
Lexington,
We investigate preconditioned iterative solutions of large dense complex valued matrices arising from discretizing the integral equation of electromagnetic scattering. The main purpose of this study is to evaluate the efficiency of preconditioning techniques based on incomplete LU (ILU) factorization and sparse approximate inverse (SAI) for solving this class of dense matrices. We solve the electromagnetic wave equations using the BiCG method with the preconditioners in the context of a multilevel fast multipole algorithm (MLFMA). The novelty of this work is that the preconditioners are constructed using the near part block diagonal submatrix generated from the MLFMA. Experimental results show that the ILU and SAI preconditioners reduce the number of BiCG iterations substantially.
1
Introduction
The hybrid surface-volume integral equations for electromagnetic scattering problems can be formally written as follows, {Ls(r,r')
• J3(r') + Lv(rS)
-E + Ls(r,r')
• Js(r')
• Jv(r')}tan = -E&(r),
+ Lv(r,r')
• Jv(r')
= -£
inc
(r),
r e S, r e V,
where Emc stands for the excitation field produced by an instant radar, the subscript "tan" stands for taking the tangent component from the vector it applies to, and Ln,(£l = S,V), is an integral operator that maps the source JQ to electric field E(r) and it is defined as La(ry)
• Jn(r')
= iu>fib f (I + fe^VV) e ^ l ^ ' l / \ 4 n \ r - r'\) • J n ( r ' ) d n ' . Jn We follow the general steps of the method of moments (MoM) 5 to discretize the hybrid surface-volume integral equations, and solve the resultant matrix equation by a multilevel fast multipole algorithm (MLFMA), which is a multilevel implementation of the fast multipole method (FMM). The basic idea of the FMM is to convert the interaction of element-to-element to the interaction of group-to-group. Using the addition theorem for the free-space scalar Green's function, the matrix-vector product Ax can be written as Ax = {AD + AN)x +
VfAVsx,
where V/, A, and Vs are sparse matrices. The FMM speeds up the matrix-vector product operations and reduces the computational complexity of a matrix-vector
373
374 AD + AN
Figure 1. The sparse data structure of a dense matrix A (PIA) from electromagnetic scattering.
product from 0(iV 2 ) to 0(N1-5), where N is the order of the matrix 2 . The computational complexity is further reduced to O(NlogN) with the multilevel fast multipole algorithm (MLFMA) 1. The sparse data structure of a partitioned dense matrix A (the P I A case) is shown in Fig. 1. The left panel of the Fig. 1 shows the block diagonal part AD, the right panel of the Fig. 1 shows the block near-diagonal part AD + Aiv, and the far part of A is scattered in the rest of the area of the right panel of the Fig. 1. We iteratively solve the linear system of the form Ax = b, where the coefficient matrix A is a large scale, dense, and complex valued matrix for electrically large targets. We propose to use an incomplete lower-upper (ILU) triangular factorization with a dual dropping strategy 3 ' 4 and a sparse approximate inverse (SAI) preconditioner 6 to construct a preconditioner from the near part matrix (AD + AN) in the MLFMA implementation. By not using a static (prespecified) sparsity pattern, we hope to capture the most important (large magnitude) entries in constructing both preconditioners, while not to consume a large amount of memory. In our experimental study, we use the BiCG method as the iterative solver, coupled with the two preconditioning techniques.
2
Preconditioners
Most preconditioning techniques, such as the ILU(O), rely on a fixed sparsity pattern, obtained from the sparsified coefficient matrix by dropping small magnitude entries. Some SAI techniques need access to the full coefficient matrix (to construct a sparsified matrix), which is not available in the FMM. We evaluate two preconditoners, the ILU preconitioner with a dual dropping strategy (ILUT) (a fill-in parameter p and a drop tolerance r) 3 ' 4 and the SAI preconditioner 6 , using the no preconditioning case as comparison. The total storage of the ILUT preconditioner is bounded by 2pN. Here r controls the computational cost, and p controls the memory cost. We sparsify the AD + AN matrix with a drop-tolerence ei, and then construct the SAI preconditioner using the sparsified matrix A with two dropping tolerences «2 and e3 which are chosen by a heuristic process. By judiciously choosing those parameters, we are able to construct both preconditioners that are effective and do not use much memory space.
375 3
Numerical R e s u l t s and Analysis
In this section, we present examples to demonstrate the efficiency of the ILUT preconditioner and the SAI preconditioner compared to the case without a preconditioned in speeding up the BiCG iterations. ° The test problems are described in Table 1 and some numerical results are listed in Table 2.
Table 1. Information about the matrices used in the experiments (Ao is the wavelength in meters). cases P1A
level 4
P3A
7
unknowns 816 100,800
matrices AD AD + AN AD
AD + AN
nonzeros 25,122 53,296 3,571,808 7,211,632
target size and description 6x2 Conducting plate 22.25 X 22.25 Antenna array
In the case of P3A, the number of the ILUT iterations and the SAI iterations and the CPU time are relatively small compared to the BiCG method without a preconditioner. The sparsity ratio (ration) of the ILUT and SAI preconditioners is less than 1. Both preconditioners do not need a large amount of memory space. Fig. 2 shows the convergence history of the residual norm of the BiCG method with the SAI preconditioner, with the ILUT preconditioner, and without a preconditioner. It is shown that the BiCG method with both ILUT and SAI preconditioners converges faster than that without a preconditioner. Furthermore, SAI is seen to be faster than ILUT. Eigenvalues of the ILUT preconditioned matrix and the SAI preconditioned matrix are shifted to the right hand side of the origin and they are away from zero. They are also very compactly clustered around 1 (see the middle and third panels of the Fig. 3). In addition, the condition number of the eigenvector matrix of the preconditioned matrices is significantly decreased (the ILUT and SAI preconditoned matrices are closer to normal than the original matrix A). These features explain the good convergence behaviors of the ILUT preconditioned matrix and the SAI preconditioned matrix. According to our numerical experiments, we can see that the ILUT preconditioner and the SAI preconditioner constructed from the near part matrix improves the computational efficiency in the sense of reducing both memory and CPU time. The results show that the BiCG method with the ILUT preconditioner and the SAI preconditioner is robust for solving three dimensional model cases from electromagnetic scattering simulations. More deatiled numerical results can be found in a technical r e p o r t 3 .
a All cases are tested on one processor of an HP Superdome cluster at the University of Kentucky. The processor has 2 GB local memory and runs at 750 MHz. The code is written in Fortran 77 and is run in single precision.
376 Table 2. Numerical data of solving the P3A case (unknowns = 100,800). prec NONE ILUT SAI
r
V
lO-0
30
£1
£2
£3
0.03
0.04
0.02
ration
LiUcpu
itnum
114.07
0.77
423 42
103.69
0.27
31
3018.69 330.21
compcpu 3018.69 444.28
237.61
341.30
ttcpu
nmm Figure 2. Convergence history for solving the P3A case.
(LU^A
MA
%
Figure 3. Eigenvalue clustering of the matrices in the P1A case.
Acknowledgments This research work was supported in part by NSF under grants CCR-9902022, CCR9988165, CCR-0092532, ACI-0202934, and ECS-0093692, by DOE under grant DEFG02-02ER45961, by ONR under grant N00014-00-1-0605, by RIST (Japan), and by the University of Kentucky Research Committee. References 1. W.C. Chew, J.M. Jin, E. Midielssen, and J.M. Song, Artech, Boston, 2001. 2. R. Coifman et al., IEEE Antennas Propagat. Mag., 35(3):7-12,1993. 3. J. Lee, J. Zhang, and C. Lu, Technical Report No. 342-02, Department of Computer Science, University of Kentucky, Lexington, KY, 2002. 4. Y. Saad, Numer. Linear Algebra Appl., l(4):387-402, 1994. 5. Y.V. Vorobyev, Gordon & Breach Science Publishers, New York, 1965. 6. J. Zhang, Appl. Numer. Math., 35:67-86, 2000.
ARBITRARY ORDER EDGE ELEMENT FOR 2D EM SCATTERING
METHODS
K. MORGAN, P. D. LEDGER, O. HASSAN, N. P. WEATHERILL Civil & Computational Engineering Centre, University of Wales Swansea, Swansea SA2 8PP, Wales, U.K. E-mail: [email protected] J. P E R A I R E Aeronautics & Astronautics, MIT, Cambridge, MA 02139, U.S.A. E-mail: [email protected] Electromagnetic wave scattering problems in 2D are solved by means of an arbitrary order edge element approach. The development of an a-posteriori error estimator enables bounds to be placed on the scattering width distribution, which is a computed output of practical interest. Attention is drawn to the possibilities offered by the use of methods based upon reduced-order approximations.
1
Introduction
Edge element methods have proved to be extremely popular in the field of computational electromagnetics. An important contribution was made by Demkowicz and co-workers 1 , who developed a two dimensional hierarchical basis for edge elements which enabled fully adaptive hp approximations to be computed. The approach outlined in this paper follows similar lines, but employs the shape functions defined by Ainsworth and Coyle 2 . An a-posteriori error estimation capability is added by employing the approach developed by the group of Patera and Peraire 3 . The error estimator enables inexpensive, sharp, rigorous and constant free bounds to be obtained for the numerical error in computed outputs of practical interest. The error estimation technique is incorporated within the high order element framework and, for the selected application area of electromagnetic wave scattering problems, the selected output is the scattering width. We also draw attention to the use of reduced-order approximations in scattering simulations. These models are constructed from full finite element solutions for a small set of problem parameters and enable the rapid prediction of outputs for a new set of parameters 4 . 2
T h e E M Scattering P r o b l e m
Consider problems involving the interaction between single frequency electromagnetic waves and a scattering obstacle. The waves are generated by a source located in the far field and the obstacle, which is surrounded by free space, will be assumed to be a perfect electrical conductor (PEC). The surface of the scatterer is denoted by Tj for transverse magnetic (TM) simulations and by T2 for transverse electric (TE) simulations. The classical formulation of the problem is governed by Maxwell's equations and the unknowns are taken to be the scattered electric and magnetic field intensity vectors, E and H respectively. Far from the scattering obstacle, the
377
378 scattered field consists of outgoing waves only. To simulate this condition, a finite solution domain, Qf, surrounding the scatterer is selected and a perfectly matched layer (PML) technique is employed 5 . To achieve this, an artificial material layer, Clp, is added to Qf, with the outer surface of the PML denoted by T 3 . 2.1
Variational
Formulation
The solution is sought in the frequency domain, assuming a time variation of the form e l w t . Then, in terms of an unknown U, which is equal to the amplitude of the scattered electric field for T E simulations and of the scattered magnetic field for TM simulations, a weak variational formulation of the problem may be expressed as 6 : find U € ZD, such that A(U,W)=£{W)
VW€Z
(1)
The spaces employed here are defined by ZD = {v | v e W(curl; Q); n A v = - n A U{ on T 2 ; n A v = 0 on T 3 } Z = {v | v <= W(curl; fi); n A v = 0 on r 2 , n A v = 0 on T 3 }
(2) (3)
1
and this simplified formulation is valid for scattering problems involving prescribed non zero values of u). 2.2
Galerkin
Approximation
With finite element subspaces Zu of Z and Z^ of ZD, the Galerkin approximate solution UH 6 Z^ is such that
A(uH,w) = e{W)
vwezH
(4)
When edge elements are used to discretise the solution domain, an approximation of the W(curl;fi) space is obtained in which the tangential component of the solution is continuous across element edges. The family of arbitrary order triangular and quadrilateral edge elements proposed by Ainsworth and Coyle 2 is adopted. 2.3
Computing the Scattering
Width
In 2D scattering simulations, a non linear output of primary interest is the scattering width, or the radar cross section per unit length. The evaluation of this quantity requires the use of a near field to far field transformation, using solution information obtained on a collection surface, T c , lying in free space and totally enclosing the scatterer. When the approximate solution JJu has been computed, the scattering width integral is evaluated as S(C7H;d>) = C°{UH;
(5)
where tj> is the viewing angle, C° (t///; (j>) is defined in terms of an integral over the collection surface, Tc and the overbar denotes the complex conjugate.
379 3
Error Estimation
Outputs computed from solutions obtained on discretisations with a sufficiently high p and small enough mesh spacing h will, in general, be indistinguishable from the exact. However, such solutions can be expensive to compute. It is, therefore, important to be able to evaluate strict upper and lower bounds for specified outputs, such as the scattering width S(Uh,
(6)
To accomplish this, an extension of the a-posteriori finite element error bound procedure proposed by Sarrate, Peraire and Patera 3 for the Helmholtz equation may be employed. The method, which is capable of dealing with quadrilateral, triangular or hybrid discretisations 7 , reduces to a requirement for solving local Neumann sub-problems inside each element, with the balance between elements ensured by using Demkowicz's edge fluxes8. Linearisation is necessary when non-linear outputs, such as the scattering width, are considered, with the variable <j> interpreted as the viewing angle. Note that, with these procedures in place, an adaptive mesh procedure, based upon the computed error bound gap and with error indicator, A, defined as ^ = \ (s+ ~ s~)
(7)
can be readily implemented 7 . 4
Reduced Order Approximation
Reduced-order, or low-order, models have been shown to provide a powerful method for computing outputs in the areas such as turbomachinery 4 . When reduced-order models are employed, it is important to be able to construct associated rigorous constant free error bounds on the computed outputs 9 . 4-1
Bounding the Complete Scattering Width
Distribution
For scattering problems, a reduced-order model technique can be employed to provide a method of extending the pointwise error bounding capability described above by enabling the rapid calculation of error bounds for the scattering width for the complete spectrum of viewing angles 10 . 4-2
Constructing the Scattering Width Distribution for Different Incident
Angles
For design purposes, the analyst will often be interested in the calculation of the scattering width distribution for all possible incident wave angles. This information may also be rapidly determined using a reduced-order model and associated certainty bounds may be computed 11 .
380 5
Conclusions
An hp edge element procedure for the simulation of 2D electromagnetic wave scattering problems in the frequency domain has been outlined. Arbitrary order edge elements may be employed, with the computational domain truncated by using the PML approach. Error bounds may be determined on outputs of electromagnetic scattering problems. The practicality of using the method to bound the computed scattering width at prescribed viewing angles is of particular interest to aerospace engineers. The possibilities offered by the application of reduced-order modelling techniques to this problem area have also been noted. Acknowledgements Paul Ledger acknowledges the support of the UK Engineering and Physical Sciences Research Council (EPSRC) in the form of a PhD studentship under grant GR/M59112. Jaime Peraire acknowledges the support of EPSRC in the form of a visiting fellowship award under grant GR/N09084. References 1. L. Demkowicz and L. Vardapetyan, Comp. Meth. Appl. Mech. Eng. 152, 103 (1998). 2. M. Ainsworth and J. Coyle, Comp. Meth. Appl. Mech. Eng. 190, 6709 (2001). 3. J. Sarrate, J. Peraire and A. T. Patera, Int. J. Num. Meth. Fluids 3 1 , 17 (1999). 4. K. E. Wilcox, J. Peraire and J. White, Comp. Fluids 3 1 , 369 (1999). 5. J.-P. Berenger, J. Comp. Phys. 114, 185 (1994). 6. P. D. Ledger, O. Hassan, K. Morgan and N. P. Weatherill, Int. J. Num. Meth. Eng. 55, 339 (2002). 7. P. D. Ledger, K. Morgan, J. Peraire, O. Hassan and N. P. Weatherill, Int. J. Num. Meth. Fluids (2002) in press. 8. L. Demkowicz, ed. P. Ladaveze and J. T. Oden, New Advances in Adaptive Computational Methods in Mechanics (Elsevier, New York, 1998). 9. L. Machiels, Y. Maday and A. T. Patera, Comp. Meth. Appl. Mech. Eng. 190, 3413 (2001). 10. P. D. Ledger, K. Morgan, J. Peraire, O. Hassan and N. P. Weatherill, Fin. Elem. Anal. Des. (2002) in press. 11. P. D. Ledger, J. Peraire, K. Morgan, O. Hassan and N. P. Weatherill, submitted to J. Comp. Phys (2002).
PARALLELIZATION OF PRECORRECTED-FFT IN SCATTERING FIELD COMPUTATION YAO-JUN WANG The Computational
Electro-Magnetics & Electronics (CEE) Division, Institute of High Computing (IHPC), Singapore 117528
Department of Electrical and Computer Engineering National University of Singapore, 10 Kent Ridge Crescent, Singapore E-mail: [email protected] LE-WEI LI Department of Electrical and Computer Engineering The National University of Singapore, 10 Kent Ridge Crescent, Singapore High Performance
Computation for Engineered Systems (HPCES)
Singapore-MIT
Alliance (SMA), Singapore/USA E-mail: [email protected]
Performance
119260
119260
Programme
119260/02139
ER-PING LI The Computational
Electro-Magnetics & Electronics (CEE) Division, Institute of High Computing (IHPC), Singapore 117528 E-mail: [email protected]
Performance
Precorrected-FFT(PFFT) is a powerful algorithm for analyzing electromagnetic scattering with arbitrarily shaped three-dimensional objects. For available PFFT code running on a single processor machine, it will be impossible to get results in short time when the number of unknowns is huge. So it is necessary to perform the computation on high performance computers in order to efficiently solve the above problems. Actually for a real object, the scattering on an object due to an incident plane wave need to be computed from a range of continuous incident angles. Computation of different incident angles can be done parallel. When using PFFT algorithm, majority of time is spent on solving the matrix with FFT and the correction operation. If these two parts are parallel executed respectively, execution time can be reduced greatly. This paper presents the parallelization of PFFT algorithm and its implementation for computing large scalable scattering problems on high performance multiprocessor platforms and clusters. MPI(message passing interface) has been used for parallelizing the code.
1 Background High performance computers provide better platforms that solve EM problems. This paper presents the parallelization of Precorrected-FFT (PFFT) algorithm for large scalable scattering problems. The Precorrected-FFT algorithm is an excellent fast algorithm that can be applied in a wide variety of EM fields. Its best cost is 0(NlogN)[l]. Similarly to the other algorithms such as the fast multipole algorithm(FMM), the main difficulty of PFFT is how to approximate the long range potentials and how to compute the local interactions. The basic idea of PFFT is that
381
382
uniform grid potentials are used to represent the long distance potentials and directly calculate the nearby interactions. This includes four steps that are (1) projecting onto a grid, (2) computing grid potentials, (3) interpolating grid potentials and (4) precorrecting, respectively. The figure 1 displays the procedure [1,2].
< .
N
/<3>V
Figure 1. 2-D representation of the procedures of the Precorrected-FFT algorithm ( p = 2 )[1.2]
The code is written by Fortran 90 and runs on IBM p690. 2 Parallel Precorrected-FFT Algorithm In view of the problem of scattering, parallelization can be carried out in two ways. One layer is done according to incident angles and the other is parallelization of PFFT algorithm. 2.1 The first layer of parallelization Generally, 180° x 360° scanning need be done to get a complete distribution of scattering for asymmetric objects. Of course, computation can be reduced by half when objects are symmetric. So for a real object, the scattering on an object due to an incident plane wave need to be computed from a range of continuous incident angles. The whole incident angles can be divided into n groups by the sum of available processors as equal as possible. Each processor is responsible for the computation of a group. 2.2.1 The algorithm of the second layer of parallelization Theoretically, each of four steps of Precorrected-FFT algorithm can be parallel executed. However, the statistics of the execution time of each step shows that the third step (interpolating grid potentials) and the fourth step (correction) occupy most of CPU time, about 10-30% and 40-60% respectively. Let the variable rank represent the number of a processor and the first processor in a group of processors is numbered as po while the other
383
processors in this group are numbered as pi, p 2 ,..., p n , respectively. Then the algorithm of the second layer can be described as follows using pseudo code (scanning a 3-D object): IF (rank .eq. po) THEN Project the panel charges to the grid charges CALL MPI_SCATTER() Iscatter data from po to p0-p„ ENDIF ! compute convolution CALL 1-D FFT() for m times ! along axis x , compute FFT CALL 1 -D FFT() for n times ! along axis y CALL 1 -D FFT() for k times ! along axis z CALL 1 -D FFT() for m times ! along axis x , compute FFT 1 CALL 1 -D FFT() for n times ! along axis y CALL 1 -D FFT() for k times ! along axis z IF (rank .eq. po) THEN CALL MPI_GATHER() Igather data from p0-p„ to p 0 Interpolate the grid potentials to the panels ENDIF Correction Definition ! po-p„ IF(ra«fc.ne.po)THEN CALL MPI_SEND( ) ! send data from pi-p„ to pO ENDIF IF (rank .eq. p0) THEN DOm=l,n IF (m .ne. 1) THEN CALL MPI_RECV() Ireceive data from pi-p„ ENDIF Correction Operation ENDDO ENDIF
2.2.2 Memory allocation The operation of correction definition is sensitive to memory size. The goal of this operation is to finish the correction definition of every unknown before correction starts. Generally, there are a few hundreds of corrections for one unknown. When the unknowns increase to a large size, the memory required by the operation of correction definition will become too huge to be satisfied. In order to solve this problem, the correction definition is divided by the number of available processors (assuming the number is n) into n parts and is implemented individually by each processor because correction definition of each unknown is not related to others. Then the memory required by this operation can be reduced to 1/n' . Because the correction definition is ordered, the main processor can acquire the correction definition from the other processors orderly when correction operation is done. 3 The result of the experiment The result of the first layer is omitted since it is easy to apply such a parallelism. Figure 2 shows an example of the second layer. The scattering on a metal sphere is computed. The wavelength is set to be 1 meter. The
384
surface of the sphere whose radius is 1 meter is divided into 5538 unknowns. -S § j| *! H
550 440 330 220 110
pO*—L
-—n- - •-- m
i
2
3
4
5
— « — Practical Time(s)
527.93
384.26
317.68
281.09
265.12
—•— Ideal Time(s)
527.93
263.97
131.98
65.99
33
1
2
4
8
16
Number of Processors
Figure 2. Parallel Computing Time I
4 Conclusion The experiment results show that the running time is shortened greatly after the computation is parallelized. This proves that the algorithm proposed in this paper is efficient in reducing the CPU time. References 1. J. R. Phillips and J. K. White, A precorrected-FFT method for electrostatic analysis of complicated 3-D structures, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, Vol.16, No.l0(Oct., 1997), pp. 1059-1072. 2. Xiaochun Nie, Le-Wei Li, Ning Yuan and Jacob K. White, Fast Analysis of Sattering by Arbitrarily Shaped Three-Dimensional Objects Using the Precorrected-FFT Method, Microwave and Optical Technology Letters(September 20, 2002). 3. N.R. Alum, V.B. Nadkarni and J. White, A Parallel Precorrected FFT Based Capacitance Extraction Program for Signal Integrity Analysis, 33rd Design Automation Conference, DAC 96-06/96 Las Vegas, NV,USA 4. Eleanor Chu, Alan George, Inside the FFT black box : serial and parallel fast Fourier transform algorithms(Boca Raton, Fla.: CRC Press, c2000).
FINITE ELEMENT ANALYSIS OF PHOTONIC CRYSTAL FIBRES RUI YANG AND YILONG LU School of Electrical and Electronic Engineering, Nanyang Technological Univ., Singapore 639798 Email: [email protected] Photonic crystal fibre (PCF) has attracted much intension in recent years because of their unusual optical properties. To accurately evaluate propagation characteristics of PCF's, vectorial wave analysis is necessary. In this paper we will present the computer simulation and modal analysis of PCF's by using a powerful finite element method (FEM) solver. In this solver, a combination of edge elements for the transverse field and nodal elements for the longitudinal field is used together with Perfect Match Layer (PML) to cope with the open domain. Based on the realistic field simulator, one can control the dispersion and polarization properties in PCF's by modifying solely the geometrical parameters of these fibers and open the door for optimal design of better PCF's.
1
Introduction
Optical fibers and integrated optical waveguides are today finding wide use in areas covering telecommunications, sensor technology, spectroscopy, and medicine. However, within the past decade the research in new purpose-built materials has opened up the possibilities of localizing and controlling light in cavities and waveguides by a new physical mechanism, namely the photonic bandgap (PBG) effect. The PBG effect may be achieved in periodically structured materials having a periodicity on the scale of the optical wavelength. Such periodic structures are usually referred to as photonic crystals, or photonic bandgap structures [1-2]. Holey optical fibers (HOFs), which are also known as photonic crystal fibers (PCFs), have attracted a lot of intension recently because of their unusual optical properties such as extra large chromatic dispersion, a wide range single mode operation. The complex nature of the cladding structure of the HOFs does not allow for the direct use of methods from traditional fiber theory. Especially for the novel HOF, operating by the PBG effect, the full vectorial nature of the electromagnetic waves has to be taken in account. In this paper, we present finite element magnetic and electric field models for determining the propagation modes in dielectric wave guiding structures. A combination of edge elements for the transverse field and nodal elements for the longitudinal field is used together with Perfect Match Layer (PML) to cope with the open domain.
385
386
2
Basic Equations and Finite Element Formulation
To analyze electromagnetic wave propagation in an inhomogeneous waveguide, the finite element method is employed in the framework of the Galerkin formulation of the weighted residual method to solve the vector Helmholtz equation: V x - V x E = ^£rE
(1)
Mr
where e-e + a/(jco) and e and a represent the permittivity and conductivity, respectively, of dielectric materials. kl = CQ2M0SO a n d £r=s/eo. Assuming for all of the field components the dependence from the spatial coordinate z of the form exp(-yz), with y-a + jfi as the complex propagation constant, and subdividing the electric field into its transverse (E,) and longitudinal (E z ) parts, we get: E(x, y, z) = [E, (*, y) + z Ez (*, y)]e-v (2) Substituting (2) into equation (1), and splitting it into its transverse and longitudinal parts, we can get: V,x(— V,x e ,)-y 2 —(V t e z + et) = kleret Mr
0)
Mr
72S/tx[—(V,ez + et)^z]^y2kl8rezz
(4)
Mr
where e, = yEt and ez = Ez- And equation (3)(4) must be resolved with the boundary conditions: n x e , = 0 ez = 0 (5) at perfectly conducting material, and: (Vtez + et)n = 0 V , x e , = 0 (6) at magnetic walls. To apply the weighted residual procedure, two sets of basis functions and two corresponding sets of weighted functions have to be defined. Since the Galerkin formulation is adopted, each set of weighting functions is equal to the corresponding set of basis functions. We use the vectorial shape functions a\e) as the set of basis function to express the approximate e|e) to the exact transverse part e, of the electric field on the element (e): ei' ) U,y) = Ee? ) oy ) (Jc,y)
(7)
387
and we use the nodal shape functions a,e) to express the approximate e[e) to the exact longitudinal component ez of the electric field on element (e): e(ze)(x,y) =
±e^)a(Je)(x,y)
(8)
7=1
by using the finite element expansion of the unknown field on element (e), we substitute (7)(8) into (3)(4) and annihilate the residue, we can get: "0
o
0
Ms(te)]-klsAr'M
~|r 77(e)
M
Ms[e)]-kieMe)] Mr
-[G^J1
(9)
Mr
lE(te) Mr
Mr
U
where the entries of the local matrices are given in [3]. After assembling all elements and zeroing the residuals, we can get the final generalized eigenvalue problems. Once the normalized operating frequency £o *s fixed, we can compute the propagation and attenuation constants of the characteristic modes of the guiding structures, which can be used to plot the dispersion diagram. 3
Numerical Results and Conclusion
Here we specifically analyzed this kind of photonic crystal fibers which have several rings of air holes around the core. Figure 1. the geometry of PCFs with a ring and two rings of six airs holes is given. Figure 2. is the calculated propagation constant verse working frequency band. Where the dotted line, is just the main propagation mode of common circular air waveguide. The solid line is when there are several rings of air holes around the core, no matter how many rings around the core; there is almost no any difference. Because that the most part of the transmission power are confined in the area inside the first ring. The outer side rings influence propagation properties very slightly. The similar results have been presented in [4]. Based on this efficient analysis method, we can carefully study the unusual optical properties of PCFs, and we can combine with other optimization algorithm such as Genetic Algorithm, we can optimize the design of the geometry of PCFs.
388
O OI
O
O OIO
o •MD— o o -o
6
Figure 1 PCFs with a ring and two rings of air holes
xicf without any air holes with a ring of six hoies
I
r x 10 s
Figure 2 Freq VS Propagation Constant
References 1. Broeng J, "Photonic Crystal Fibers: A New Class of Optical Waveguides". Optical Fiber Technology 5, 305-330 (1999) 2. Bjarklev A. and Riishede J, "Photonic crystal fibers - a variety of applications". Proceedings of the 2002 4th International Conference on Transparent Optical Networks, Volume: 2, 97 -97 (2002) 3. Koshiba M. and Inoue K., "Simple and efficient finite-element analysis of microwave and optical waveguides". Microwave Theory and Techniques, IEEE Transactions on, Volume: 40 Issue: 2, 371-377 (1992) 4. Bjarklev A., Broeng J., Barkou S. E. and Dridi K., "Dispersion properties of photonic crystal fibers". ECOC'98, Madrid, Sept. 1998.
PARALLELIZATION OF FAST MULTIPOLE METHOD USING MPI ON IBM HIGH PERFORMANCE COMPUTERS WU FANG, ZHANG YAOJIANG, EDWIN LIM CHAN PING, LI ERPING Division of Computational Electro-magnetics & Electronics, Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn Singapore Science Park 11. Singapore 117528 E-mail: wufang @ihpc.a-star.edu.sg Massively parallel, distributed memory computers provide the increases in computing performance needed to solve the largest problems in computational electromagnetics. Here, we present a parallel implementation of the fast multipole method (FMM) by using Message Passing Interface (MPI) as the communication back-end. The issues and options in its parallelization are identified, and domain decomposition strategies to suit these are implemented. Good parallelization is exhibited, with the most costly parts of the algorithm displaying essentially linear speedup. Demonstrations using the supercomputer IBM p690 are given. The parallel fast multipole algorithm presented here is scalable, portable, and efficient.
1
Introduction
Electromagnetic scattering analysis for radar cross-section (RCS) prediction and electromagnetic compatibility analysis present very large computational demands. Massively parallel, distributed memory computers provide the increases in computing performance needed to solve the large-scale problems. Therefore, the parallel fast multipole method (FMM), a powerful and efficient algorithm in solving large-scale computational electromagnetics problems, is developed. In order to achieve portability over various multi-computers, the Message Passing Interface (MPI) [1] was used for communication. Some of the unique characteristics of the parallel implementation are presented in this paper. To effectively parallelize the sequential FMM code in the distributed memory system, a non-blocking communication scheme is implemented to reduce communication and synchronisation overhead. Good load balancing among processors is achieved by using carefully designed group partitioning technique. By implementation of dynamic matrix allocation and shrinking strategies in the parallel program, the method achieved is portable, scalable, and efficient. The parallel FMM code is written in MPI FORTRAN 90, and it has been implemented in many different platforms, such as Unix C shell of IBM supercomputer p690 model 681, and IBM Linux Cluster X-series 330. The numerical results show that, by using parallel implementation, the memory consumed in each processor reduces close to half and the running time decreases sharply when the number of processor is doubled. The paper will also present the advancements of the parallel code. 2
Fast Multipole Method
Fast multipole method was initially proposed by V. Rokhlin for speeding up the solution of acoustic wave scattering [2], and later was extended to Maxwell equations by R.Coifman, et al [3]. Research group of Prof. Chew further developed it into multilevel version and successfully applied it into large-scale simulation of aircraft RCS [4]. By
389
390
reducing the computation of far field interaction, FMM transforms conventional dense matrix of moment method into sparse one as following: where ZNN
(1) Zm,x+{VTv)x=b stands for the interaction of near field and is calculated exactly by MoM while
V and T denote aggregation and translation matrix in FMM, respectively. V* is the transpose conjugate matrix of V . Vector x and b represent the unknowns and exciting source respectively. A sequential version of the fast multipole method consists of four main modules: parameter assignment, coefficient matrices calculation, Conjugate Gradient (CG) iterations, and Radar Cross Section (RCS) calculation. We paralleled the sequential 3D fast multipole method using Message Passing Interface (MPI) as the communication back-end, so that the implementation achieves the portability. 3
Parallel Implementation
As we found, when 3D FMM sequential code is implemented, nearly 95% of the CPU time is consumed on coefficient matrices calculation and CG iterations, therefore the parallel strategies are focused on these most time consuming potion of the code. First of all, we need to fully understand the structure of these coefficient matrices^ , T and NN. All these coefficient matrices in equation (1) are sparse ones, and their typical sparse structure can be shown as
Figure 3-1. Sparse matrix structure of equation (1)
To efficiently solve this essential equation (1) by an iterative solver such as the conjugate gradient (CG) method, we applied several parallel strategies: the coefficient matrices are scaled and shrunk so that all the available computing power is utilized effectively, an optimal communication scheme is used to achieve communication efficiency, and load balancing is controlled by a group mapping method.
3.1
Scaling and shrinking the Coefficient Matrices
In the parallel code, instead of the whole matrix being stored in one processor as in the sequential code, only part of these sparse, large-scale coefficient matrices is calculated and saved in each processor. As shown in Figure 3-2, the shaded blocks are the non-zero elements of the aggregation matrix, ordered by group sequence. If the number of groups in the coefficient matrix is G, and the number of processors used is P, only a block of G/P groups is calculated and saved in each processor. It is necessary to emphasize that only the non-zero elements are saved in memory, based on the group sequence. To further save memory, the coefficient matrices are allocated and deallocated dynamically. This data allocation method minimizes the memory used in each processor. Compared with the sequential code, the memory consumed in each processor is reduced sharply, and the CPU time is reduced
correspondingly. This technique of shrinking arrays improves the scalability of the parallel code.
Figure 3-2. Memory distribution of the aggregation matrix for multiple processors
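To make the storage scheme concrete, the following C++ sketch shows one way the block partition and dynamic deallocation described above could be organised. It is an illustrative reconstruction, not the authors' FORTRAN 90 data structures; the Block and LocalAggregationMatrix types, the contiguous group ranges and the release helper are all assumptions.

```cpp
#include <algorithm>
#include <complex>
#include <vector>

// With G groups and P processors, each rank allocates only the blocks of
// non-zero elements for its own contiguous range of about G/P groups, and
// frees a block as soon as it is no longer needed ("shrinking").
struct Block { std::vector<std::complex<double>> nonzeros; };

struct LocalAggregationMatrix {
    int first_group, last_group;     // this rank owns groups [first, last)
    std::vector<Block> blocks;       // only ~G/P blocks live on this rank

    LocalAggregationMatrix(int G, int P, int rank) {
        int chunk = (G + P - 1) / P;                 // ceil(G/P) groups per rank
        first_group = rank * chunk;
        last_group  = std::min(G, first_group + chunk);
        blocks.resize(last_group - first_group);     // allocate the local part only
    }
    void release(int g) {                            // dynamic deallocation
        std::vector<std::complex<double>>().swap(blocks[g - first_group].nonzeros);
    }
};
```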
3.2
Optimal Neighboring Communication
Although only part of the coefficient matrices is stored and calculated locally within each processor, no communication among processors is needed before the CG iterations, because the geometric information is saved locally in each processor. In the CG iteration, to update the vector x in equation (1), the matrix-vector multiplication is performed within every step of the iteration. Two communications are involved in parallelizing the multiplication. The following is the parallel strategy applied to the CG iteration:
1. The far-field matrix-vector multiplication, V†TVx, for a given x: (a) multiply M1 = Vx within each processor; NO communication established. (b) Broadcast M1 among processors and multiply M2 = TM1 (= TVx) locally within each processor; communication established. (c) Finally calculate V†M2 (= V†TVx); NO communication established.
2. The near-field matrix-vector multiplication, Z_NN x: NO communication established.
3. Sum the far-field and near-field result vectors and pass the result to one processor for the termination control; communication established.
The communications in steps 1(b) and 3 cannot be avoided. In this case, to ensure that the communication scheme is optimized, the computation and communication are overlapped as much as possible and only the minimum number of messages is communicated among processors.
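The three-step strategy can be illustrated with a short MPI sketch. This is a hedged reconstruction, not the authors' code: the apply_* routines are placeholders for the local sparse products, an all-gather stands in for the broadcast of M1 in step 1(b), real arithmetic replaces complex arithmetic for brevity, and equal slice sizes per rank are assumed.

```cpp
#include <mpi.h>
#include <vector>

// Placeholders for the local sparse products (definitions not shown).
std::vector<double> apply_V(const std::vector<double>&);      // M1 = V x   (local)
std::vector<double> apply_T(const std::vector<double>&);      // M2 = T M1  (local)
std::vector<double> apply_Vdag(const std::vector<double>&);   // V^dag M2   (local)
std::vector<double> apply_Znn(const std::vector<double>&);    // near field (local)

std::vector<double> matvec(const std::vector<double>& x_loc, int n_global) {
    // Step 1(a): M1 = V x, no communication.
    std::vector<double> m1 = apply_V(x_loc);

    // Step 1(b): make M1 globally visible, then M2 = T M1 locally.
    std::vector<double> m1_all(n_global);
    MPI_Allgather(m1.data(), (int)m1.size(), MPI_DOUBLE,
                  m1_all.data(), (int)m1.size(), MPI_DOUBLE, MPI_COMM_WORLD);
    std::vector<double> m2 = apply_T(m1_all);

    // Step 1(c): far field = V^dag T V x, no communication.
    std::vector<double> far = apply_Vdag(m2);

    // Step 2: near field Z_NN x, no communication.
    std::vector<double> near = apply_Znn(x_loc);

    // Step 3: sum the far- and near-field parts; the termination test is
    // then performed on one rank (reduction not shown).
    for (std::size_t i = 0; i < far.size(); ++i) far[i] += near[i];
    return far;
}
```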
3.3
Group mapping for Load Balancing
In this section, we describe a group mapping scheme for balancing the CPU load across the processors. As illustrated in Section 3.1, the non-zero elements of the coefficient matrices are sorted in group sequence. Since the elements in each group are assigned automatically based on the finest level of the geometric partition, the number of elements in each group is similar. The outer loop decomposition is based on these data blocks, so that the jobs are distributed evenly to the processors. In this way, processors receive nearly the same number of groups and a fairly even workload. The advantages of the group mapping strategy are that (1) the computation load among processors is about equal when the number of elements in each group is uniform (or close to uniform), and (2) the computation of the matrix coefficients can be done without communication among processors.
4
Parallel Results and Performance
The parallel 3D FMM code was implemented and tested on an IBM p690 model 681 supercomputer, which has 7 nodes with 32 CPUs (1.3 GHz) and 64 GB of RAM per node. The parallel implementation of the FMM provides an efficient way to reduce the memory and time consumed, which are critical factors when simulating large-scale electromagnetic scattering problems. The memory consumed in one processor and the elapsed time are nearly halved when the number of processors doubles, as shown in Figures 4-1 and 4-2.
Figure 4-1. Memory consumed in one CPU vs. number of CPUs
Figure 4-2. Elapsed time vs. number of CPUs
The speed-up ratio, shown in Figure 4-3, shows that the parallel implementation performs even better as the number of unknowns increases. Load balance is achieved during the simulation (Figure 4-4).
Figure 4-3. Speed-up ratio for different numbers of unknowns (10,368 and 56,448 unknowns against the ideal speed-up ratio)
Figure 4-4. Load balance when solving 56,448 unknowns
References
1. M. Snir, S. W. Otto, S. Huss-Lederman, D. W. Walker, and J. Dongarra, MPI: The Complete Reference, Scientific and Engineering Computation Series, The MIT Press, Cambridge, MA, 1996.
2. V. Rokhlin, Rapid solution of integral equations of scattering theory in two dimensions, J. Comput. Phys., vol. 86, no. 2, pp. 414-439, 1990.
3. R. Coifman, V. Rokhlin, and S. Wandzura, The fast multipole method for the wave equation: a pedestrian prescription, IEEE Antennas Propagat. Mag., vol. 35, no. 3, pp. 7-12, 1993.
4. W. C. Chew, J. M. Jin, E. Michielssen, and J. M. Song, Fast and Efficient Algorithms in Computational Electromagnetics, Artech House, Boston, London, 2001.
Parallel Unstructured Meshes Approach for the Simulation of Electromagnetic Scattering
O. Hassan, J. Jones, B. Larwood, K. Morgan and N. P. Weatherill
Civil & Computational Engineering Centre, University of Wales Swansea, Swansea SA2 8PP, Wales, U.K.
E-mail: [email protected]
A numerical procedure for the simulation of 3D problems involving the scattering of electromagnetic waves is presented. The solution algorithm employs an explicit finite element procedure for the solution of Maxwell's curl equations in the time domain using unstructured tetrahedral meshes. A PML absorbing layer is added at the artificial far field boundary that is created by the truncation of the physical domain prior to the numerical solution. The complete solution procedure is parallelised and several large scale examples are included to demonstrate the computational performance that may be achieved by the proposed approach and to evaluate the limitations of the present software on the AHPCRC T3E-1200 with 1024 processors.
1
Introduction
The accurate simulation of 3D electromagnetic scattering problems of current industrial interest, in realistic time scales, poses major computational challenges. We will address some of these challenges in the context of problems involving the interaction between waves, generated by a source in the far field, and a scatterer of general shape. Difficulties associated with mesh generation are reduced by adopting the unstructured mesh approach, with a fully automatic unstructured mesh generation procedure [4,8]. The solution algorithm employed is based upon the application of an explicit linear Taylor-Galerkin finite element procedure [3] to Maxwell's curl equations. With this method, both the electric and magnetic fields are assumed to vary in a continuous piecewise linear fashion [6]. The non-reflective boundary condition, that must be imposed at the truncated far field boundary that is created to enable numerical simulation, is handled by surrounding the computational domain by an artificial perfectly matched layer (PML). The parameters in the PML equations are defined in such a manner that the amount of reflection from the far field boundary is decreased [1,2]. The use of the PML is found to lead to a significant reduction in computational costs compared to those associated with the use of traditional local absorbing boundary condition approximations. To enable the solution of large scale problems on current computer platforms, the complete simulation process is parallelised. The computational performance that can be achieved by the resulting capability is demonstrated by including the results of a number of scattering simulations involving plane single frequency incident waves.
2
The Governing Equations
Consider the simulation of scattering of single frequency plane incident electromagnetic waves by an obstacle that is surrounded by free space. It is assumed that
the incident waves are produced by a general source located in the far field. In three dimensions, Maxwell's curl equations for a general linear isotropic material, of relative permittivity ε and relative permeability μ, can be written, in conservative vector form, and in dimensionless form, as

∂U/∂t + ∂F^k/∂x_k = S   (1)

where

U_i = μ H_i^s,  S_i = −(1 − μ) ∂H_i^i/∂t,  i = 1, 2, 3;   U_j = ε E_{j−3}^s,  S_j = −(1 − ε) ∂E_{j−3}^i/∂t,  j = 4, 5, 6   (2)
Here E^s and H^s denote the scattered electric and magnetic field intensity vectors respectively, E^i and H^i denote the incident electric and magnetic field intensity vectors respectively, and ε_jkl denotes the alternating symbol. The total field is decomposed into an incident field and a scattered field. The incident field can be dropped since it will be specified by the problem definition and will automatically satisfy Maxwell's equations.
3
Numerical Solution Algorithm
An approximate solution to the scattering problem is obtained by using a two-step finite element Taylor-Galerkin procedure [6]. This procedure, which is notionally second order accurate in both time and space [3], is outlined briefly here for completeness. The solution of equation (1) is advanced over one timestep, from time level t = t_m to time level t = t_{m+1} = t_m + Δt, in a two step fashion. With the computational domain represented by a general unstructured grid of 3-noded linear triangular elements, the solution U^{m} and the fluxes F^k{m} at time t = t_m are linearly interpolated over each element in the grid. In the computational implementation, the solution at time t_{m+1/2} = t_m + Δt/2 is obtained by employing the forward difference approximation

U^{m+1/2} = U^{m} − (Δt/2) ∂F^k{m}/∂x_k   (3)
This results in a piecewise linear discontinuous approximation to the solution at the time level t_{m+1/2}. The solution at the time level t_{m+1} is obtained following a Galerkin approximate variational formulation [6]. The resulting equation takes the form

M_IJ (U_J^{m+1} − U_J^{m}) / Δt = ∫_Ω ( S^{m+1/2} N_I + F^k{m+1/2} ∂N_I/∂x_k ) dΩ − ∫_Γ F_n^{m+1/2} N_I dΓ   (4)
The quantities required at time level t_{m+1/2} are computed using the values obtained for U^{m+1/2} in the first step, equation (3). Here F_n denotes the normal flux on the boundary Γ, which is computed according to the boundary condition being simulated, and M_IJ denotes the entries in the consistent linear finite element mass matrix. Equation (4) may be readily solved either by lumping this matrix or by explicit iteration [6]. Material interface and perfect conductor boundary conditions are imposed through the boundary integral term in equation (4). This term is evaluated using a local characteristic decomposition at the boundary [7]. The implementation of this procedure in the current context has been described in detail elsewhere [6] and will not be repeated here. Through this approach, the boundary conditions may be regarded as being imposed in a weak sense only. The non-reflective condition that is required at the truncated far field boundary is achieved by the addition of a PML [1] to the exterior of the computational domain. In the examples presented here, the truncated outer boundary is always taken to be a regular hexahedron and the PML is discretised using a structured mesh of tetrahedral elements. The formulation which is implemented follows the work of Bonnet and Poupaud [2].
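To make the two-step structure of equations (3) and (4) concrete, the following sketch applies the same half-step/full-step idea to a 1D scalar conservation law with a lumped mass matrix; it is a simplified analogue for illustration only, not the 3D Maxwell solver described here.

```cpp
#include <vector>

// Two-step (half step + full step) update for u_t + f(u)_x = 0 on a uniform
// 1D mesh: step 1 forms discontinuous per-element half-step values (the
// analogue of eq. (3)), step 2 gathers element fluxes back to the nodes (the
// analogue of eq. (4) with a lumped mass matrix).  Boundary nodes are left
// untouched for brevity.
void taylorGalerkinStep(std::vector<double>& u, double dt, double dx,
                        double (*f)(double)) {
    const std::size_t n = u.size();
    std::vector<double> uh(n - 1);                   // one value per element
    for (std::size_t e = 0; e + 1 < n; ++e)          // step 1: eq. (3) analogue
        uh[e] = 0.5 * (u[e] + u[e + 1])
              - 0.5 * dt / dx * (f(u[e + 1]) - f(u[e]));
    for (std::size_t i = 1; i + 1 < n; ++i)          // step 2: eq. (4) analogue
        u[i] -= dt / dx * (f(uh[i]) - f(uh[i - 1]));
}
```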
4
Parallel Implementation
This basic algorithm has already been validated for a number of different scattering problems. However, the nature of the algorithm means that the required mesh size will increase rapidly when the method is applied to the solution of problems involving the electrically large scatterers which arise when realistic frequencies and geometries are considered. Such simulations will require the use of significant computational resources and, in this case, the use of parallel computers becomes essential. It should be noted that the success of this route will require not only a parallel implementation of the basic Maxwell equation solver but, in addition, the effective parallelisation of the mesh generation and solution visualisation stages. The approach adopted for parallel mesh generation is based upon a geometrical partitioning of the domain [9]. The complete domain is divided into a set of smaller sub-domains and a mesh is generated independently in each sub-domain. The combination of the sub-domain meshes produces the mesh for the complete domain. A manager/worker model is employed in which the initial work is performed by the manager, before distributing the mesh generation tasks to the workers. There are a number of different approaches available for serially decomposing a given unstructured mesh. However, for the current application, it is envisaged that the mesh data sets will be too large to load onto one processor. Therefore, the partitioning process has to be parallelised and distributed amongst the processors at all times. The present implementation utilises the ParMetis library for the partitioning [5]. This procedure produces high quality partitions in a fast, robust and parallel manner. In the parallel implementation of the solution algorithm, elements are owned by only one domain and are not duplicated, while points are owned by one domain and are duplicated. This strategy enables data locality to be achieved during the gather
Figure 1: Scattering of a plane wave by a coated PEC sphere of diameter D = 3λ showing (a) the computed contours of |E| (b) comparison between the exact and computed distributions of the scattering width.
process, from points to elements, and the scatter process, from elements to points, and hence there is no need to communicate during these operations. For each time step, only the interface nodes obtain contributions from more than one domain.
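The ownership rule can be sketched as follows; the SubDomain type, the single-neighbour exchange and the use of MPI_Sendrecv are assumptions made for illustration rather than details of the actual implementation.

```cpp
#include <mpi.h>
#include <vector>

// Elements are owned by one domain; points on inter-domain interfaces are
// duplicated, so after the local element loop their values must also
// accumulate the contributions computed in the neighbouring domain.
struct SubDomain {
    std::vector<int> interface_nodes;   // local ids of nodes shared with a neighbour
    std::vector<double> nodal_value;    // one accumulated value per local node
};

void accumulate_interface(SubDomain& d, int neighbour_rank, MPI_Comm comm) {
    std::vector<double> send(d.interface_nodes.size()), recv(send.size());
    for (std::size_t i = 0; i < send.size(); ++i)
        send[i] = d.nodal_value[d.interface_nodes[i]];

    MPI_Sendrecv(send.data(), (int)send.size(), MPI_DOUBLE, neighbour_rank, 0,
                 recv.data(), (int)recv.size(), MPI_DOUBLE, neighbour_rank, 0,
                 comm, MPI_STATUS_IGNORE);

    for (std::size_t i = 0; i < recv.size(); ++i)
        d.nodal_value[d.interface_nodes[i]] += recv[i];  // sum shared nodes
}
```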
5
Numerical Examples
Two examples are considered to illustrate the performance of the integrated software. The first example involves scattering of a plane single frequency incident wave by a coated perfectly conducting sphere. The sphere diameter is D = 3λ and the dielectric coating is of thickness t = 0.25D. The coating is characterised by the material properties ε = 2.56, μ = 1. The mesh employed in the region between the sphere and the far field boundary consists of 3 296 694 tetrahedra and 599 399 nodes. The structured PML region contains 820 380 tetrahedra and 149 160 nodes. The solution is advanced through 60 cycles of the incident wave and the computed contours of |E| are displayed in Figure 1(a). The exact and the computed distribution of the scattering width are seen to be in very good agreement in Figure 1(b). The final example uses the procedure to simulate the scattering of a plane wave by a PEC aircraft. The aircraft length is 10 wavelengths and the mesh employed consists of approximately 7.2 million elements and 1.35 million nodes. The PML is
Figure 2: Scattering of a plane wave by a PEC aircraft showing (a) computed contours of |H| on the aircraft surface (b) the predicted distribution of the scattering width
located at a distance of one half wavelength from the aircraft and has a total thickness equal to one wavelength. The PML region, consisting of 10 layers of elements, has approximately 1.4 million elements and 0.27 million nodes. The solution was advanced for 40 cycles and the computed contours of |H| on the aircraft surface are shown in Figure 2(a). The computed distribution of the RCS is displayed in Figure 2(b).
6
Conclusions
A numerical procedure that enables the parallel simulation of three dimensional electromagnetic scattering problems using automatically generated unstructured tetrahedral meshes has been presented. The solution algorithm employs a scattered field formulation and a two step Taylor-Galerkin time stepping scheme. The truncated far field boundary condition is imposed by the addition of a PML. Parallel mesh generation is accomplished by a Delaunay procedure, following a geometrical partitioning of the domain. A number of computationally challenging examples have been included to demonstrate the numerical performance of the proposed procedure.
References
1. Berenger J. P., A perfectly matched layer for free-space simulation in finite-difference computer codes, Journal of Computational Physics, 1994; 114: 185-200
2. Bonnet F. and Poupaud F., Berenger absorbing boundary condition with time finite-volume scheme for triangular meshes, Applied Numerical Mathematics, 1997; 25: 333-354
3. Donea J., A Taylor-Galerkin method for convective transport problems, International Journal for Numerical Methods in Engineering, 1984; 20: 101-119
4. George P. L., Automatic Mesh Generation. Applications to Finite Element Methods, Wiley: Chichester, 1991
5. Karypis G. and Kumar V., Multilevel k-way partitioning scheme for irregular graphs, Journal of Parallel and Distributed Computing, 1998; 48: 96-129
6. Morgan K., Hassan O. and Peraire J., An unstructured grid algorithm for the solution of Maxwell's equations in the time domain, International Journal for Numerical Methods in Fluids, 1994; 19: 849-863
7. Shankar V., Hall W. F., Mohammadian A. and Rowell C., Theory and application of time-domain electromagnetics using CFD techniques, Course Notes, University of California Davis, 1993
8. Weatherill N. P. and Hassan O., Efficient three-dimensional Delaunay triangulation with automatic point creation and imposed boundary constraints, International Journal for Numerical Methods in Engineering, 1994; 37: 2005-2040
9. Weatherill N. P., Hassan O., Morgan K., Jones J. W. and Larwood B., Towards fully parallel aerospace simulations on unstructured meshes, Engineering Computations, 2001; 18: 347-375
DEVELOPMENT OF PARTING LINE GENERATION TOOLS FOR A 3D CAD INJECTION MOULD SYSTEM
W. M. CHAN
Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528, SINGAPORE
E-mail: [email protected]
S. L. LIEOW
Kas Equipment and Trade, 623 Aljunied Road, #04-04 Aljunied Industrial Complex, Singapore 389835, SINGAPORE
E-mail: [email protected]
Parting line tools for generating the core, cavity and inserts in a 3-dimension CAD injection mould system are presented in this paper. These software tools provide efficient ways to generate parting lines in complicated solid model mould products. Using these tools, parting lines are generated by selecting portions of the product such as a face, point or edge. The more advanced parting line tools require selection of a combination of edge, face or point to define the cutting plane. The mathematics of the transformation from 3-dimension global co-ordinates to local 2-dimension work-plane co-ordinates is formulated; this formulation is used in one of the tools to generate the parting line. Examples of how these parting line tools work are shown using commercial 3-dimension CAD software.
1
Introduction
As product life cycles decrease, the demand to shorten mould design time increases [1]. In many cases design changes and modifications are made to the product. One way to shorten design time is to automate the mould design process [2]. Parting line generation is one of the key components in mould design. Tools developed to aid parting line generation can thus help reduce the design time. Researchers have developed various ways to aid parting line generation. Wong et al. [3] proposed a method to determine the cutting plane of complex shaped products; their method uses an algorithm that slices the product. Nee et al. [4] described a methodology to determine optimal parting lines. A methodology that generates non-planar parting lines and surfaces was presented by Nee et al. [5]. Hui [6] introduced a blockage test that determines the interference between product, side core and split core to aid parting line positioning. Ravi and Srinivasan [7] proposed a parting line generation method using section and silhouette. Other research focuses on rules to generate parting lines. Nine rules that can be used by the mould design engineer to develop a suitable parting line in the product are presented by Ravi and Srinivasan [8]: projected area, flatness, draw, draft, undercuts, dimensional stability, flash, machined surfaces and directional solidification. Ganter and Tuss [9] proposed a method to locate the parting line for cast parts using a set of rules: center of gravity, principal axes and the direction of draw specified by the user. Parting line generation for complex parts involves more engineering judgement and knowledge. The parting line position of the core and cavity is not simple to determine using algorithms and rules alone; often engineering judgement and experience are used to determine the parting line.
Some parting line generation tools have been developed to aid the generation of the core, cavity and inserts using engineering knowledge and judgement. These tools provide efficient ways to generate parting lines. Using these tools, parting lines are generated by selecting portions of the product such as a face, point or edge. The more advanced parting line tools require selection of a combination of edge, face or point to define the cutting plane. These tools are presented in the following sections. Examples of how some of these parting line tools work are shown using commercial 3-dimension CAD software.
2
Parting Line Tools
Eight parting line tools are presented as follows:
1. Cutting plane on a point: This function defines the cutting plane on a selected part point, as shown in Figure 1. The cutting plane orientation is selected to be perpendicular to one of the three major axes. This plane can be shifted by a user-defined amount along the axis. Also, the user can adjust the plane angle. The right figure shows the part cut by the plane.
2. Cutting plane perpendicular to two points on a straight line: This function positions the cutting plane perpendicular to two points on a straight line. Two points are selected to define the straight line. The user can specify an amount to shift the cutting plane along the straight line.
3. Cutting plane perpendicular to two points: This function positions the cutting plane at the centre of two selected points. The orientation of the plane is perpendicular to the direction of the two points. A user-defined amount can be specified to shift the cutting plane between the two points.
4. Cutting plane parallel to part face: The cutting plane is defined as parallel to the selected part face, see Figure 2. This cutting plane can be shifted by an amount in the direction either into or out of the part (see the sketch after this list). The right figure shows the mould cavity after the cutting process.
5. Circular cutting plane: This function defines a circular cutting plane. Selecting the circular hole on a part generates the circular cutting plane with the same diameter as the hole. Changing the cutting plane diameter creates an annulus (see Figure 3).
6. Cutting plane perpendicular to the edge and positioned on the middle of the edge: This function defines a cutting plane that is perpendicular to a selected edge. The function automatically positions the cutting plane on the middle of the edge. The cutting plane can be shifted by a user-defined amount.
7. Cutting plane defined on the middle of the face with rotate plane option: This function defines a cutting plane that is perpendicular to the selected face, see Figure 4. The position of this cutting plane is at the middle of the face. The user can rotate the cutting plane direction.
8. Cutting plane defined by edges: Selecting the part face defines that the cutting plane is perpendicular to that face. The cutting profile is defined by selecting the edges (see Figure 5).
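As an illustration of how such a tool can be realised, the sketch below constructs the cutting plane of tool 4 from a selected face; the Vec3/Plane types and the signed offset convention are assumptions, not the API of the CAD system used here.

```cpp
#include <cmath>

struct Vec3  { double x, y, z; };
struct Plane { Vec3 normal; double d; };   // points p with dot(normal, p) = d

// Build a cutting plane through a point on the selected face, shifted by
// `offset` along the face normal (positive = out of the part, negative =
// into the part).
Plane planeParallelToFace(Vec3 faceNormal, Vec3 facePoint, double offset) {
    double len = std::sqrt(faceNormal.x * faceNormal.x +
                           faceNormal.y * faceNormal.y +
                           faceNormal.z * faceNormal.z);
    Vec3 n{faceNormal.x / len, faceNormal.y / len, faceNormal.z / len};
    double d = n.x * facePoint.x + n.y * facePoint.y + n.z * facePoint.z;
    return Plane{n, d + offset};
}
```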
Figure 1. Cutting plane on a point.
Figure 2. Cutting plane parallel to part face.
Figure 3. Circular cutting plane.
Figure 4. Cutting plane defined on the middle of the face with rotate plane option.
Figure 5. Cutting plane defined by edges.
Figure 6. Global to local workplane co-ordinate transformation.
3
Co-ordinate Global to Local Workplane Transformation
Global co-ordinate to local work-plane co-ordinate transformation is used when creating cutting planes in orientations other than those of the major axes. Figure 6 shows a local work-plane. Points 1, 2, 3 and 4 are in global co-ordinates; point 4 is the mid-point of the line between points 1 and 3. The local co-ordinate transformation is shown below.
a = √((x1 − x2)² + (y1 − y2)² + (z1 − z2)²)
b = 0.5 √((x1 − x3)² + (y1 − y3)² + (z1 − z3)²)
x4 = 0.5 (x1 + x3),  y4 = 0.5 (y1 + y3),  z4 = 0.5 (z1 + z3)
θ = 2 cos⁻¹(b/a)
u1 = a,  v1 = 0;  u2 = 0,  v2 = 0;  u3 = a cos θ,  v3 = a sin θ
The work-plane co-ordinates for points 1, 2 and 3 are (u1, v1), (u2, v2) and (u3, v3), respectively.
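A possible implementation of this transformation, following the formulas above (with point 2 taken as the local origin and point 1 on the u axis), is sketched below; the types are illustrative.

```cpp
#include <cmath>

struct P3 { double x, y, z; };   // global co-ordinates
struct P2 { double u, v; };      // local work-plane co-ordinates

void toWorkPlane(P3 p1, P3 p2, P3 p3, P2& w1, P2& w2, P2& w3) {
    auto dist = [](P3 a, P3 b) {
        return std::sqrt((a.x - b.x) * (a.x - b.x) +
                         (a.y - b.y) * (a.y - b.y) +
                         (a.z - b.z) * (a.z - b.z));
    };
    double a = dist(p1, p2);              // |P1 P2|
    double b = 0.5 * dist(p1, p3);        // half of |P1 P3|
    double theta = 2.0 * std::acos(b / a);// valid while b <= a

    w1 = {a, 0.0};                        // point 1 on the u axis
    w2 = {0.0, 0.0};                      // point 2 is the local origin
    w3 = {a * std::cos(theta), a * std::sin(theta)};
}
```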
Note that this method of transformation does not involve matrix rotation and thus certain blind spots are avoided. This transformation is used in the "Cutting plane defined by edges" function to position the cutting plane.
CONCLUSION
This paper presented eight tools developed to generate parting lines in a 3-dimension CAD injection mould system: cutting plane on a point; cutting plane perpendicular to two points on a straight line; cutting plane perpendicular to two points; cutting plane parallel to part face; circular cutting plane; cutting plane perpendicular to the edge and positioned on the middle of the edge; cutting plane defined on the middle of the face with rotate plane option; and cutting plane defined by edges. These tools provide an efficient and simple method to generate the core, cavity and inserts. Usage of these tools is demonstrated in commercial 3-dimension CAD software, SolidDesigner. Also, the mathematical 3-D global to 2-D local work-plane co-ordinate transformation is formulated.
REFERENCES
[1] Altan T., Lilly B. W., Kruth J. P., Konig W., Tonshoff H. K., Van Luttervelt C. A. and Khairy A. B., "Advanced techniques for die and mold manufacturing", Annals of the CIRP, 42(2), 707-716, 1993.
[2] Fu M. W., Fuh J. Y. H. and Nee A. Y. C., "Undercut feature recognition in an injection mould design system", Computer-Aided Design, 31, 777-790, 1999.
[3] Wong T., Tan S. T. and Sze W. S., "Parting line formation by slicing a 3D model", Engineering with Computers, 14, 330-343, 1998.
[4] Nee A. Y. C., Fu M. W., Fuh J. Y. H., Lee K. S. and Zhang Y. F., "Determination of optimal parting direction in plastic injection mould design", Annals of the CIRP, 46(1), 429-432, 1997.
[5] Nee A. Y. C., Fu M. W., Fuh J. Y. H., Lee K. S. and Zhang Y. F., "Automatic determination of 3-D parting lines and surfaces in plastic injection mould design", Annals of the CIRP, 47(1), 95-98, 1998.
[6] Hui K. C., "Geometric aspects of the moldability of parts", Computer-Aided Design, 29, 197-208, 1997.
[7] Ravi B. and Srinivasan M. N., "Computer-aided parting surface generation", Proceedings ASME Manufacturing International Conference, Atlanta, 125-129, 1990.
[8] Ravi B. and Srinivasan M. N., "Decision criteria for computer-aided parting surface design", Computer-Aided Design, 22, 11-18, 1990.
[9] Ganter M. A. and Tuss L. L., "Computer-assisted parting line development for cast pattern production", Transactions of the American Foundrymen's Society, 795-800, 1990.
SIMULATION OF TEMPERATURE AND STRESS FIELD IN DEPOSITION PROCESS FOR RPST BY HOMOGENIZATION METHOD
GUILAN WANG, ZHIHUA XU, HAIOU ZHANG
State Key Lab. of Plastic Forming Simulation and Die & Mold Tech., Huazhong University of Science & Technology, Wuhan, 430074, Hubei, P. R. China
E-mail: [email protected], [email protected]
Rapid plasma spray tooling (RPST) is a process that can quickly make molds from rapid prototyping or natural patterns without limitation of pattern size or material. In previous research, two-scale asymptotic homogenization theory was introduced to predict the effective properties of plasma sprayed coatings as functions of pore volume fraction, and the temperature field in the deposition process for RPST was simulated with two-dimensional plane models. The purpose of this paper is to simulate the temperature and stress field and to explore a way to simulate curved substrate models through an axisymmetric rotating model. The macro-micro mathematical and mechanical models are established by the homogenization method. The effect of the scanning path of the spray gun in the deposition process for RPST is discussed, and the simulation of spraying on an axisymmetric rotating substrate model in the deposition process is performed by the developed FEM software system.
Keywords: deposition, rapid plasma spray tooling, homogenization theory, axisymmetric, finite-element method
1
Introduction
Rapid plasma spray tooling (RPST) has gained more and more attention because it can be used to make metal molds from rapid prototyping or natural patterns [1]. During the process of metal spraying, the molten metal particles impact the substrate at high speed, then deposit and freeze layer by layer, forming a porous coating with residual stress. It is necessary to research the elimination or reduction of the residual stress, which is one of the main factors that result in coating deformation, crazing, peeling, etc. Because the traditional continuum mechanics method cannot be used effectively to describe the microstructure and the mechanical behavior of a porous coating, the homogenization method, an effective method applicable in many areas of physics and engineering, is applied in this work. In previous work, the coating growth and pore formation have been simulated, and the homogenization method has been applied to model and simulate the temperature field with a flat substrate model [2,3]. In this paper, the effect of the scanning path in the deposition process for RPST is discussed through the simulation of the stress field. As is well known, there are many axisymmetric and non-axisymmetric concavo-convex shapes on the surface of molds in practical manufacture; thus, the simulation of an axisymmetric rotating substrate model is also performed in this paper.
2 Modeling for temperature and stress field by homogenization method
An effective approach to obtain the temperature and stress field of a discontinuous medium is to establish, by the homogenization method, mathematical models which reflect the influence caused by the discontinuity of the material microstructure at each point.
Figure 1. Analysis of the temperature and stress field (macro and micro models).
The homogenization method is based on the idea of an asymptotic expansion of Φ^ε(x), as shown in Ref. [4]:

Φ^ε(x) = Φ⁰(x, y) + ε Φ¹(x, y) + ⋯,  y = x/ε  (0 < ε ≪ 1)   (1)

where Φ^ε(x) is the field function (in the temperature field, for example, Φ^ε(x) is replaced by T^ε(x)), x denotes the macro-scale variable, and y denotes the micro-scale variable. According to the principle of virtual work, the incremental form of the thermal stress yields
∫_Ω^ε E^ε_ijkl (∂Δu^ε_k/∂x_l)(∂v_i/∂x_j) dΩ = ∫_Ω^ε E^ε_ijkl Δε_kl (∂v_i/∂x_j) dΩ   (2)
With respect to the homogenization theory, Δu^ε is extended into an asymptotic expansion:

Δu^ε(x) = Δu⁰(x, y) + ε Δu¹(x, y) + ⋯,  y = x/ε   (3)
Substituting formula (3) into (2), the equilibrium equation which describes the macro-micro mechanical behavior becomes

∫_Y E_ijkl (∂Δu⁰_k/∂x_l + ∂Δu¹_k/∂y_l)(∂v_i/∂y_j) dY = ∫_Y E_ijkl Δε_kl (∂v_i/∂y_j) dY,  ∀v ∈ V_Y   (4)
where Δε_kl = α_kl (T′ − T) = α_kl ΔT, α_kl is the coefficient of thermal expansion and E_ijkl is the matrix of the elastoplasticity increment.
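For the isotropic materials of Table 1 the thermal strain increment reduces to a diagonal tensor, as the following minimal sketch illustrates; the types and function name are assumptions made for illustration.

```cpp
#include <array>

// Thermal strain increment delta_eps_kl = alpha_kl * dT.  For an isotropic
// material, alpha_kl is diagonal with a single coefficient alpha, so only
// the normal components are non-zero.
using Tensor2 = std::array<std::array<double, 3>, 3>;

Tensor2 thermalStrainIncrement(double alpha, double dT) {
    Tensor2 de{};                  // zero-initialised 3x3 tensor
    for (int k = 0; k < 3; ++k)
        de[k][k] = alpha * dT;     // no thermal shear strain increments
    return de;
}
```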
3 FEA results and discussion
3.1 Effect of scanning path in deposition process
By simulating the temperature and stress field with four different kinds of scanning path (Fig. 3): (a) vertical to the long side; (b) parallel to the long side; (c) spiral out; (d) spiral in, the effect of the scanning path in the deposition process is discussed. Fig. 2 is a FEM model (100 mm × 4 mm) with the geometry size indicated on the model; the calculation conditions are: scanning speed V = 0.1 m/s, environment temperature 20 °C, initial temperature 20 °C, time step Δt = 0.02 s, and powder feeding rate m = 60 g/min. The model boundary conditions are set as follows: on the two side areas, the right and the left, the first-kind temperature boundary condition T = 20 °C; the heat flux on the lower area is q = 10 W/m². The physical parameters of the substrate and the powder are given in Table 1.
Figure 2. The macro flat substrate model.
Comparing the von Mises stress peak values shown in Fig. 3, the results show that the peak value for (c), spiral out, is the lowest.
Figure 3. Von Mises stress vs. scanning path.
Table 1. Material physical parameters.
Parameters                                  Substrate   Powder
Heat conductivity (W/(m·°C))                50          50
Specific heat (J/(kg·°C))                   520         520
Thermal expansion coefficient (1/°C)        1.3E-05     1.7E-05
Density (g/cm³)                             7850        7850
Elasticity module (MPa)                     1.0E+05     1.0E+05
Poisson ratio                               0.3         0.3
Yield intensity (MPa)                       200         200
Plasticity strengthen coefficient (MPa)     5.0E+04     5.0E+04
In addition, the pictures show that the peak value of stress on a longer straight line is higher than that on a short straight line, and the stress in the turning regions is smaller than that on the long straight lines.
3.2 The simulation in the deposition process of the cylinder substrate
Fig. 4 is a quarter part of a cylinder substrate model, with inner radius = 60 mm, outer radius = 65 mm and axial length = 80 mm. Its boundary conditions are set on the model: the displacement constraint (the yellow) on the two side circumferential areas is set to zero, the first-kind temperature boundary condition (the blue) is T = 20 °C on the axial areas, and the heat flux (the red arrow) on the inner area is q = 10 W/m². Under the same calculating conditions as given in Section 3.1, the coating growth, the temperature field and the stress field are simulated as shown in Fig. 5.
Figure 4. A quarter part of the cylinder substrate model.
Figure 5. The coating growth, temperature field and stress field.
4 Conclusion
1. The homogenization method can be applied to simulate the temperature and stress field in the deposition process for RPST.
2. The peak value of von Mises stress with the spiral-out scanning path is the lowest, which assists in maintaining the stability of the original shape;
furthermore, the scanning path should consist of as many fold lines and curves as possible, and the line segments should be as short as possible, to reduce the peak value of stress.
3. The temperature and stress field with axisymmetric rotating substrate models can be simulated by the developed FEM software system, which suggests that simulation will be possible for arbitrary geometry models in the future.
Acknowledgements
This research was funded by the Ministry of Science and Technology and the National Natural Science Foundation of China through research grants 2001AA421150 and 50075032, respectively. The authors would like to thank Dr. Yanxiang Chen.
References
1. H. Zhang, G. Wang, T. Nakagawa, in: T. Nakagawa (Ed.), Proceedings of the 8th International Conference on Rapid Prototyping, June 12-13, Tokyo, Japan, 2000, p. 444.
2. Yanxiang Chen, Guilan Wang, Haiou Zhang, Numerical simulation of coating growth and pore formation in rapid plasma spray tooling, 5th Asia-Pacific Conference on Plasma Science & Technology / 13th Symposium on Plasma Science for Materials, September 10-13, 2000, Dalian, China.
3. Guilan Wang, Yanxiang Chen, Haiou Zhang, Homogenization theory applied to plasma sprayed coating: modeling and numerical simulation of the temperature field, in: H. Bin (Ed.), Computer Aided Production Engineering CAPE2001, Professional Engineering Publishing, IMechE, London, UK, 2001, pp. 355-358.
4. B. Hassani, E. Hinton, A review of homogenization and topology optimization I - analytical and numerical solution of homogenization equations, Computers and Structures 69 (1998) 719-738.
GEOMETRIC MODEL AND NUMERICAL SIMULATION FOR THE LAYING PROCESS OF WIRE ROPE
GUILAN WANG, JIANFANG SUN AND HAIOU ZHANG
State Key Laboratory of Plastic Forming Simulation and Die & Mold Tech., Huazhong University of Science and Technology, Wuhan, 430074, PR China
E-mail: [email protected]
ABSTRACT - A mathematical model of wire rope considering the space geometric structure and the characteristics of the laying process is proposed, with self-rotating ratio. Depending on their position in the stranded rope, the wires can be in the form of single, double and triple helices, the vector equations of which can be obtained. A program for calculating the boundary conditions of a three-dimensional finite element model is developed to create data which can be accessed by ANSYS for nonlinear analysis. Additionally, the deformation of once laying and secondary laying acting on the wires of a strand or wire rope is analyzed. The results show good agreement with previous analytical solutions.
Keywords: Geometric model, laying forming, self-rotating ratio, forming stress, FE analysis, wire rope
1
Introduction
Much research has been devoted to the behavior of stranded rope under axial loads (tension, shear, bending and torsion) [1-3], but these studies do not predict the complicated laying process of wire rope. In particular, the laying process with self-rotating ratio is rarely studied. The self-rotating ratio is defined as the ratio of the angular velocity of a wire/strand wound around its own axis, in the direction opposite to its laying direction, to the angular velocity of the wire/strand wound around the centroidal axis of the strand/rope; it is an important parameter of the laying process and is called the self-rotating ratio of the wire rope, k_r, and that of the strand, k_s, respectively. In this paper we present a geometric model of the laying process, including the classification of wires and the vector equations of helices in stranded rope. Structural discretization and boundary conditions are given for FE analysis. Moreover, we solve numerical examples of once laying and secondary laying with different self-rotating ratios. Finally, concluding remarks are given. 2
Geometric mathematical model
A typical stranded rope with an independent wire rope core (IWRC) is composed of two major structural elements. One is the strand and the other is the core. It is assumed in this work that all wires have a circular cross-section and remain circular when deformed. The centroidal axes of
a wire and a strand are selected to represent the path of the wire and the strand and are used to study the geometric characteristics that are related to deformation. The generatrices of each wire are studied as well because the self-rotating ratio is taken into account. There is at most one straight wire, located in the center of a rope. The remaining wires and their generatrices can be classified geometrically into three groups: single helices, double helices or triple helices. When a straight strand is laid, the free end of the outer wire moves and the other end is fixed. The outer wire has a single helical form because it is wound only once, around a straight axis. If a strand is helically wound into a rope, the center wire of the strand has a single helical form. All of the other wires have a double helical form because they are wound twice, once around the strand axis and again around the rope axis. Furthermore, if the self-rotating ratios of strand and rope are considered, the generatrices of a helical wire have a triple helical form in the rope because the helical wire is wound three times: once around the strand axis, once around its own axis and once around the rope axis. The centroidal axes of both the wire and the strand can be considered to be lying on right circular cylinders developed into a plane. Some basic relationships can be established by using the developed views shown in Figure 1. Without the self-rotating ratio of the rope, shown in Figure 1(a), the spread length of the rope S_r and the length of the strand S_s in the rope are:

S_r = r_r θ_r tan β_r,   S_s = r_r θ_r / cos β_r
where r_r and β_r are the helix radius and helix angle of the rope, respectively, and θ_r is the angle of rope rotation. Similarly, in the strand, S_s and S_wc are:
Figure 1. Developed view of strand and helical wire in IWRC rope.
S_s = r_s θ_s tan β_s,   S_wc = r_s θ_s / cos β_s
where r_s, β_s and S_wc are the helix radius, the helix angle of the strand and the spread length of the helical wire, respectively, and θ_s is the angle of strand rotation. Because S_s in the rope equals that in the strand, θ_s can be obtained. However, if the self-rotating ratio of the rope is taken into account, as shown in Figure 1(b), θ_s is changed into θ_s′. θ_s′ and θ_s can be expressed as:
changed into 6S . 6S and 0S can be expressed:
409
-kr)8r
-M,=(- r zosP tgP s
parallel to x axis single helix parallel to ns axis Figure 2. Helices model in IWRC Rope with self-rotating ratio
r
(1)
s
As shown in Figure 2, let A and B presented by vector R and P, be a point of centroidal axis of strand and a point of axis of helical wire, which have single helical form and double helical form, respectively. The vector equation of R for single helix in global system is expressed below. ns, b s and ts which configure the Frenet-frame are normal vector, binormal vector and tangential vector at point A. Bold face letters identify vectors or
matrix. R
rr cos Qr 1 *4,1 = 'jyA )• = •{>•, sine } f - cos Gr ]
(2)
f sin Pr sin 6r 1
ns=\ - s i n # r )• , b s =-j-siny5 r cos# r \
- cos Pr sin dr 1 •\ COSy9r COS^ f
\
(3)
\ 0 lj l[ cos fi, lj \ siny9r lj The vector Q traces the axis of a double helical wire in Frenet-frame ns-bs-ts. Because the head of R is located exactly at the tail of Q, the vector P can be obtained through vector addition. So the vector equation of P is: ^
ti
rr cos8r + rs(-cosOr cos#, + sin/?r sin#r sin#.
P = \yB }•=•{ rr sinf9r +r,(-sin^ r cos^ -siny9r sin^r sin05 ) \ \*B\
(4)
rrtgfir6r +rs cosfir sin^ s
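Equation (4) translates directly into code; the following sketch (with an assumed Vec3 type) evaluates the point B on a double-helical wire for given rotation angles, and is given only to illustrate the formula.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// Point B on the axis of a double-helical wire, per equation (4):
// r_r, beta_r are the rope helix radius and angle; r_s is the strand helix
// radius; theta_r and theta_s are the rope and strand rotation angles.
Vec3 doubleHelixPoint(double r_r, double beta_r, double r_s,
                      double theta_r, double theta_s) {
    double sr = std::sin(theta_r), cr = std::cos(theta_r);
    double ss = std::sin(theta_s), cs = std::cos(theta_s);
    double sb = std::sin(beta_r),  cb = std::cos(beta_r);
    return Vec3{
        r_r * cr + r_s * (-cr * cs + sb * sr * ss),
        r_r * sr + r_s * (-sr * cs - sb * cr * ss),
        r_r * theta_r * std::tan(beta_r) + r_s * cb * ss
    };
}
```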
In a similar way the vector equation of the triple helix can be derived. 3
FE analysis results and discussion
Three-dimensional solid brick elements were used for the structural discretization. When a wire rope is laid, the displacements of the nodes at the free end of the helical wires are calculated by a program developed in C++. After inputting the geometric data, discretization rules, and material and process parameters, the boundary condition files to be accessed by ANSYS for nonlinear FE analysis can be created.
Table 1. The geometric data and process parameters of strand/wire rope.

Strand/wire rope    Diameter (mm)   Dia. of center wire (mm)   Dia. of helical wire (mm)   Pitch length (mm)   Self-rotating ratio
IWRC 1x7            8.50            2.86                       2.86                        102.5               0.0, 0.5, 1.0, 1.2
Core (IWRC 7x7)     —               2.34                       2.66                        52.3                1.0
Strand (IWRC 7x7)   —               2.14                       2.34                        52.3                1.0
With JIS SWRS72A wire, a simple 1x7-wire strand and an IWRC 7x7-wire rope have been analyzed in this paper. Young's modulus, the yield stress and the coefficient of work-hardening of the material are 183.9 GPa, 1290 MPa and 0.033, respectively. Table 1 details the geometric data and process parameters. The pitch length and self-rotating ratio of the IWRC 7x7 are 133 mm and 1.0. The deformation of the helical wire in the 1x7-wire strand is depicted in Figure 3. For all self-rotating ratios, the interior axial stress is compressive while the exterior axial stress is tensile. When k_s equals zero, the shear stress in the cross-section is the largest; the distribution of von Mises stress consists of concentric circles, and the center of the cross-section is an elastic area while the outer part is a plastic area far larger than the elastic area, which results in smaller elastic spring-back. With increasing k_s, the tensile and compressive stresses increase while the absolute value of the shear stress decreases; the plastic area of the von Mises stress decreases and the elastic area interlacing with the plastic area increases, which results in larger spring-back. When k_s equals 1.0, the shear stress is the smallest, which helps to reduce the ratio of broken-off wires; the concentric-circle distribution of von Mises stress changes into a band shape and the plastic area is far smaller than the elastic one. When k_s equals 1.2, the distribution of von Mises stress changes into concentric circles again. Additionally, the distribution of equivalent plastic strain is similar to that of the von Mises stress, but when k_s equals 1.2, the concentric-circle distribution of plastic strain is less apparent than that of the von Mises stress. Furthermore, it can be deduced that with a nonzero self-rotating ratio the helical wire in the strand is in close contact with the center wire and the properties of the whole strand are good because of smaller shear stress, equivalent plastic strain and torsion stress. So, taking shape stability, safety and tensile strength into consideration, the reasonable value of the self-rotating ratio should be slightly more than 1.0.
The deformation of the IWRC 7x7-wire rope is shown in Figure 4. The stress and strain are non-uniformly distributed along the helical wire. With the position change of the helical wire in the stranded rope, the absolute values of stress and strain in the secondary laying are appreciably larger than those in the once laying. Again, a self-rotating ratio with the value of 1.0 is favorable.
Figure 3. Stress and strain of once laying in IWRC 1 X 7-wire strand
Figure 4. Stress and strain of secondary laying in IWRC 7 X 7-wire rope
4
Conclusions
A geometric mathematical model for the laying process of wire rope has been presented for FE analysis. Both the once laying and secondary laying examples indicate that a reasonable self-rotating ratio should be slightly more than 1.0. The results show good agreement with the results in the literature [4]. A model that considers the more complex cross-sections and contact conditions of wire rope should be adopted in future research; this is being undertaken in our current work.
Acknowledgements
The authors gratefully acknowledge the support of the Ministry of Education of China through research grant [1998] 679.
References
1. Costello, G. A., Theory of Wire Rope (Springer Verlag, New York, 1997) pp. 24-102.
2. Utting, W. S. and Jones, N., The response of wire rope strands to axial tensile loads: Part I. International Journal of Mechanical Sciences 29(9) (1987) pp. 605-619.
3. Jiang, W. G. and Henshall, J. L., A novel finite element model for helical springs. Finite Elements in Analysis and Design 35 (2000) pp. 363-377.
4. Guilan Wang and Haiou Zhang, The computational simulation by elasto-plastic FEM for the laying of metal wire rope. China Mechanical Engineering 12(5) (2001) pp. 7-10.
AGENT-BASED COMPOSABLE SIMULATION FOR VIRTUAL PROTOTYPING
W. XIANG
Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528
E-mail: [email protected]
S. C. FOK, G. THIMM
Design Research Centre, School of Mechanical & Production Engineering, Nanyang Technological University, Nanyang Ave, Singapore 639798
System performance evaluation without a real physical prototype is an attractive feature of virtual prototyping. This paper proposes an agent-based composable simulation framework to address the challenges in virtual prototyping. The concept is to use an agent to manage a component; a circuit of components is then equivalent to a configuration of agents. A domain agent is proposed to represent the various virtual components in a system. The implementation of "composable" simulation in virtual prototyping evaluation depends on the communication and collaboration of the multiple agents. A case study is done in the domain of fluid power system design. This work describes an initial effort towards the development of an intelligent distributed environment for virtual prototyping.
1
Introduction
A prototype normally has to be developed for the evaluation of a system design. System prototyping can be classified into physical prototyping and virtual prototyping. The physical prototyping process involves evaluation using real commercial components; it can be tedious, time consuming, and expensive. Virtual prototyping, on the other hand, can be regarded as a computer-aided design process which consists of modeling and simulation tools to address the broad issues of evaluation under various operating environments [1]. To fully exploit the advantages of virtual prototyping, the following challenges need to be addressed.
• Integration. Various features of the system prototype such as system dynamics, assembly, and maintenance have to be considered, since in the real physical world, slight changes in one domain often have profound implications in others [2].
• Composability. A promising idea in the future trend of simulation-based design is the concept of prebuilt models. However, composability is still a frontier subject in modeling and simulation, and current capability is limited [3].
• Distributed coordination. To integrate virtual components into a virtual system requires more than a simple conversion of each component feature model into an individual entity. It requires a mechanism that allows a group of component models to communicate, engage in cooperative tasks and adapt to changing circumstances [4].
• Interaction and reality. The interaction and reality challenges in virtual prototyping concern many implementation details like web-accessibility, interactive operation, etc. [5].
This paper proposes an agent-based composable simulation framework to address the above challenges. It regards each component as an agent and the system as a multi-agent system. First, an agent-based virtual component representation is introduced. Section 3 then addresses composable simulation based on agent communication and collaboration, and a simple case study is presented for validation. Finally, the conclusion is given.
2
Agent-based Virtual Component Representation
In a specific system, components interact with each other to give the overall system function. This fundamental concept is analogous to an agent-based framework. If a component can be represented as an agent, then a circuit of components is equivalent to a configuration of agents. The agents cooperate and interact to achieve the overall features of the system. An approach to integrate the component features through an object-based virtual model, controlled by an associated domain agent, is introduced in this paper. Figure 1 shows the representation of the domain agent. The domain agent essentially consists of three parts: the domain knowledge set, the control unit and the component model.
Figure 1. Representation of the domain agent (domain knowledge set with the list of acquaintances and domain knowledge, control unit, and component model).
The domain knowledge set contains the essential data and knowledge required by the agent to perform its activities. Unlike the component data, which remain static, some knowledge needs to be continuously updated, either by the agent itself or by a system designer. The knowledge base has an acquaintance list of the other agents with which it can directly communicate within the system. This acquaintance list reflects changes of the assembly. The control unit is responsible for the communication with other agents and the subsequent reaction. The message parser first parses an incoming message. Then the message handler analyzes the message using the agent's knowledge properties or the component model's features, and finally responds to the message if necessary. The ensemble of the control units of all domain agents constitutes a part of the coordination mechanism of the system. The concurrent execution of the control units of the agents determines the emergent behavior of the resulting system. A virtual component must provide sufficient information for consideration in the product life-cycle support activities including design, analysis, test, documentation, assembly, and administration. The virtual component model is defined as follows:
Definition 1. A virtual component model is a 4-tuple Model_component = (I, G, N, Pr), where I is the set of component interface features, i.e. the set of ports for component communication; G is the set of geometrical features representing the physical properties of the component, i.e. the 3-D graphics model with position and orientation information; N is the neural network model representing the component's behavior; and Pr is the set of product attributes associated with the component.
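Definition 1 can be rendered as a simple data structure, as sketched below; the concrete member types are assumptions, since the paper does not prescribe a representation.

```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative member types for the 4-tuple (I, G, N, Pr).
struct Port      { std::string name; double value = 0.0; };      // elements of I
struct Geometry  { /* 3-D model with position and orientation */ };   // G
struct NeuralNet { /* behaviour model: outputs = f(inputs) */ };      // N

struct VirtualComponent {
    std::vector<Port>                  interfaces;   // I: ports for communication
    Geometry                           geometry;     // G: physical properties
    NeuralNet                          behaviour;    // N: neural network model
    std::map<std::string, std::string> attributes;   // Pr: product attributes
};
```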
3
Agent-based Composable Simulation
The system performance results from the interaction among all the components in a designed system. Since agents are used to manage components and to simulate the component interactions by inter-agent communication in a distributed way, the implementation of composable simulation depends on the communication and collaboration of the multi-agent system.
3.1
Agent Communication and Collaboration
Agent communication requires a common language. All messages passed in the proposed virtual prototyping system are written in Knowledge Query and Manipulation Language (KQML) format. Agent collaboration is realized in two ways: direct collaboration and collaboration through a control agent. Direct collaboration is used to reduce the message flow within the system and to obtain a stronger harmony among the domain agents, whereas the other achieves a higher-level management.
3.2
Agent-based composable simulation
The idea of agent-based composable simulation is based on agent communication and collaboration. Domain agents register to manage the virtual models of different components. These domain agents monitor the changes of the important performance parameters of the components they control, and communicate, recalculate and propagate the state when they detect any changes. Through the agents' communication and collaboration, the dynamics of the virtual prototyping system is simulated. This agent-based composable simulation is validated by a simple case study in the domain of fluid power systems.
3.2.1
Case study: a simple validation experiment in the domain of fluid power systems
A simple experiment is configured according to the circuit in Figure 2. The experimental transient values of the pressure P and the flowrates Ql, Qr and Qp are measured by transducers located in the system. The domain agents "pump agent" and "valve agent" respectively monitor the components "pump" and "pressure relief valve" and exchange the pressure and flowrate according to the rule: the pressure at the connection port is the same, while the flowrates sum to zero. When an agent receives a KQML message about a value change in the system, it triggers the calculation of its behavior model (the neural network model) and generates a new output based on the network's inputs. This new output value is then propagated to the other communicating agents. With such communication, the dynamic behavior can be simulated. This agent-based composable simulation can be described as a combined behavior model by the following equations.
Figure 2. The circuit of the fluid power system (pump, pressure relief valve and load).

P = N_relief-valve[P(t − k), Qv(t − k)]   (2)

Figure 3. The validation result: calculated pressure P versus experimental pressure P.
The validation experiment was set up to change Ql (the flowrate through the load) and then measure the pressure accordingly. The validation result is the comparison, shown in Figure 3, between the pressure calculated through our agent-based composable simulation and the measured experimental pressure. The result shows that the proposed agent-based composable simulation can evaluate the dynamic performance of a fluid power system. Moreover, the errors are expected to be reduced by organizing a more accurate experiment and by building more accurate behavior models.
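The port rule used by the agents (equal pressure at the connection port, flowrates summing to zero) can be sketched as a simple consistency check; the types below are illustrative, not the paper's KQML message classes.

```cpp
#include <cmath>
#include <vector>

struct PortState { double pressure = 0.0; double flow = 0.0; };

// Returns true when the states exchanged by the agents at one connection
// port are consistent, i.e. message propagation for this port can stop.
bool portConsistent(const std::vector<PortState>& ends, double tol = 1e-6) {
    if (ends.empty()) return true;
    double flowSum = 0.0;
    for (const auto& e : ends) {
        if (std::abs(e.pressure - ends.front().pressure) > tol)
            return false;              // pressures at the port must be equal
        flowSum += e.flow;             // flows into the port must cancel
    }
    return std::abs(flowSum) < tol;
}
```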
4
Conclusion
A framework for agent-based composable simulation is proposed for virtual prototyping. Since integrated virtual component models are used to represent individual components, composable simulation can be realized by combining all the available virtual models to form a virtual prototype of the final designed system, so that the system evaluation is obtained easily. The idea of composable simulation is analogous to an agent-based framework. The structure of the domain agent has been introduced. Composable simulation is implemented based on the communication and collaboration among all domain agents. It is validated by the experiment on a simple fluid power system.
References
1. Shyamsundar, N. and Gadh, R., Internet-based collaborative product design with assembly features and virtual design spaces, Computer Aided Design, 33 (2001), pp. 637-651.
2. Fok, S. C., Xiang, W. and Yap, F. F., Feature-based component models for virtual prototyping of hydraulic systems, Int. J. Adv. Manu. Tech., 18(9) (2001), pp. 665-672.
3. Diaz-Calderon, A., A composable simulation environment to support the design of mechatronic systems. Ph.D. Thesis, Carnegie Mellon Univ., Pittsburgh, PA (2000).
4. Wooldridge, M. and Jennings, N. R., Intelligent agents: theory and practice. The Knowledge Engineering Review, 10(2) (1995), pp. 115-152.
5. Kruger, W. and Bohn, C. A., The responsive workbench: a virtual work environment. Computer, 28(7) (1995), pp. 42-48.
KNOWLEDGE-BASED RAPID VIRTUAL ENGINEERING SYSTEM FOR PRODUCT AND TOOLING DESIGN R. D. JIANG Institute of High Performance Computing, 1 Science Park Road, #01-01 Capricorn, Singapore Science Park II, Singapore 117528 E-mail: jiangrd@ihpc.a-star.edu.sg T. W. LIM Molex Singapore Pte Ltd, 110 International Road, Jurong, Singapore 629174 E-mail: [email protected] B. T. CHEOK Ministry of Defence, 303 Gombak Drive, #03-29 Singapore 669645 E-mail: [email protected]
This paper presents a knowledge-based rapid virtual engineering system for product and tooling design. The integrated knowledge-based design system consists of a product design system, a progressive die design system, an injection mould design system, and a process planning and NC code generation system. In order to achieve higher productivity, accuracy and design re-use, a database is built from proven past engineering designs, complemented by a series of CAE simulation studies. The engineering knowledge base covers the different requirements, design intent, performance and various know-how for the design of products, progressive dies, injection moulds, etc. The knowledge base is captured, maintained, and accessed through a master model that resides in the enterprise's knowledge repository. This makes multi-disciplinary optimization of products and processes possible, giving designers a high degree of confidence in product quality and a reduced time-to-market.
1
Introduction
Product design is a highly complex process. There are no common rules or de facto procedures to follow. Even in the same company, the products may vary widely and change with time. A product designer has to consider not only the quality of the product itself but also its influence on the tooling design, manufacturing process, assembly conditions, etc. In the connector industry, product design has an especially close relationship with the die and mould design processes. One solution is to call project meetings among product designers, die designers and mould designers, so that the different groups of designers who concentrate on different areas can arrive at a design in which the interrelationships and effects between upstream and downstream are resolved. However, this process requires the transfer of information among designers, and errors may be introduced; it also lengthens the product design cycle. This paper proposes a Knowledge-Based Rapid Virtual Engineering System (KB-RVES) for the product and tooling design of connectors. The integrated software environment can perform product design, die design and mould design functions that make use of the company's proven design rules. All the engineering standardization and heuristic rules can be reused and expanded. With the company's knowledge base, the KB-RVES system has the capability to oversee the product design, die design and mould design processes and to reduce lead-time and errors.
2
System Structure
The system is made up of four subsystems, as shown in Figure 1: the product design system, the progressive die design system, the injection mould design system, and the process planning and NC code generation system. A knowledge-base engine and an engineering knowledge base support the integrated intelligent product and tooling design system.
Figure 1. System Structure of KB-RVES
2.1
Product Design
Developing a product from an idea to manufactured parts ready for assembly involves quite a number of processes, and requires industrial designers and engineers to work very closely together to design and produce the products. A multidisciplinary team comprising computer science, electrical design, mechanical design and manufacturing skills must be assembled. For innovative product design and short time-to-market requirements, it is necessary to incorporate a database of good engineering knowledge based on past successful cases and a series of computer-based CAE simulation studies. The schematic structure of the product design system is shown in Figure 2. The engineering knowledge base for product design is composed of several types of design knowledge [1,3,5]: heuristic rules, engineering tables and CAE simulation. The heuristic rules are proven successful designs, represented as conditional statements of the form IF <condition> THEN <action>.
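As a minimal illustration of how such IF-THEN heuristic rules might be stored and fired, the sketch below pairs condition predicates with recommendation actions. The specific rule, threshold and function names are hypothetical and are not taken from the KB-RVES implementation.

```python
# Minimal sketch: heuristic design rules as IF <condition> THEN <action> pairs.
# The rule content below is illustrative only.

def thin_wall(part):
    return part["wall_thickness"] < 0.4  # condition, mm (example threshold)

def recommend_thicker_wall(part):
    return f"Increase wall thickness of {part['name']} to at least 0.4 mm"

RULES = [
    (thin_wall, recommend_thicker_wall),
]

def fire_rules(part):
    """Return the recommendation of every rule whose condition holds."""
    return [action(part) for condition, action in RULES if condition(part)]

print(fire_rules({"name": "housing", "wall_thickness": 0.3}))
```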
Figure 2. Schematic structure of the product design system (3D CAD environment with a feature-based modeler, an interface for design violations and recommendations, heuristic rules, engineering tables, CAE simulation, and the company engineering knowledge base)
2.2
Knowledge-based Progressive Die Design (KB-Die)
Progressive die design is an important component of tooling engineering; it is a skill-intensive task and an experience-driven process. Incorporating the design know-how into the system significantly reduces the design lead-time. The use of KB and solid modeling technology also allows the system to be tightly coupled with FEA analysis programs, such that "on-line" metal forming simulation can be conducted during the design stage. This sub-system accepts the output of the product design system. The major design processes are [2]:
1. Feature Modeling: This is the input module in which all sheet metal features are input and identified. These features are hole, burring, lance, emboss, half-cut, coining, marking, bending, etc. The information entered is stored as a Features-Tree that is used for down-stream design activities. A solid model of the part is produced automatically for verification.
2. Unfold: This module unfolds the 3D stampings modeled in the Feature Modeler.
3. Template Manager: This module manages the various die templates used by a company to design the "non-working areas" and ancillary tooling arrangements of the die. It can be used to build up and manage a library of die templates.
4. Process Planning: This is the key stage in which domain expert knowledge is applied. Operation arrangement, the design of working and non-working areas, etc. are all completed in this process.
5. Die Configuration: After process planning, the rules are fired to complete the design of the components and plates and to establish their relationships.
6. ToolViewer: The output from the system can be viewed using the ToolViewer, as a single component, a sub-assembly or the entire die assembly.
7. Auto-dimensioning: This module extracts the design information from the upstream design phase and automatically produces the detail drawings of plates and components, assembly drawings, BOM, etc.

2.3
Knowledge-based Mould Design (KB-Mould)
There are a number of theoretical research efforts and practical systems for injection mould design [4]. However, the productivity improvement is limited for a generic design system of injection
mould. A system with built-in company rules and customized company standards has a great advantage in terms of shortening the design lead-time. The knowledge-based mould design subsystem incorporates many design rules and several standard mould bases. It can be divided into initialization and detailed mould design:
1. Initialization: In this step, shrinkage analysis, determination of the number of cavities, cavity layout and mould base selection, etc. are carried out.
2. Detailed mould design: The ejection direction, parting line/surface, core/cavity blocks and inserts, feeding system, cooling system and venting system are completed with the support of various libraries and the knowledge base. Finally, a complete assembly tree is set up, and 2D drawings and the BOM are generated.
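The shrinkage compensation carried out during initialization can be illustrated with a small sketch. The linear scaling below is a common first-order approach assuming an isotropic shrinkage rate; it is not taken from the KB-Mould implementation, and the example material value is illustrative.

```python
# First-order shrinkage compensation: the cavity is scaled up so that the
# moulded part reaches nominal size after the material shrinks.

def cavity_dimension(part_dim_mm: float, shrinkage_rate: float) -> float:
    """Scale a nominal part dimension by the material shrinkage rate."""
    return part_dim_mm * (1.0 + shrinkage_rate)

# Example: a 50 mm feature in a material with 0.5% linear shrinkage.
print(cavity_dimension(50.0, 0.005))  # -> 50.25 mm cavity dimension
```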
2.4

Process Planning and NC Code Generation
This subsystem is integrated with the KB-Die and KB-Mould systems and takes the outputs from these two subsystems in either 3D or 2D format. It offers users rich knowledge-based engineering capabilities for interactively or automatically recognizing manufacturing features from the die and mould designs, planning manufacturing processes and generating NC codes.

3
Conclusion
With the increasing complexity of new products and the stringent requirements of customers, more and more time is spent translating a concept between designers, engineers, model-makers, and tool builders. This paper proposes a knowledge-based rapid virtual engineering system to bridge this multidisciplinary team so that concept refinement and design optimization can be achieved. The system is supported by proven enterprise design rules, which are managed and expanded by a knowledge-based shell. Benchmarks on several subsystems have shown that the design lead-time is reduced significantly.

4
Acknowledgements
The authors wish to thank our colleagues at IHPC and Molex Singapore Pte Ltd for their support and collaboration throughout all the projects.

References
1. A. J. Riel, Object-Oriented Design Heuristics, Addison-Wesley, 1996.
2. B.T. Cheok and A.Y.C. Nee, Configuration of Progressive Dies, Artificial Intelligence for Engineering Design, Analysis & Manufacturing, Vol. 12, No. 5, 1998, pp 405-418.
3. G. Chryssolouris and K. Wright, Knowledge-Based Systems in Manufacturing, Annals of the CIRP, Vol. 35/2/1986, pp 437-439.
4. Altan, T., Lilly, B. W., Kruth, J. P., Konig, W., Tonshoff, H. K., Van Luttervelt, C. A. and Khairy, A. B., Advanced Techniques for Die and Mold Manufacturing, Annals of the CIRP, 42(2), 707-716, 1993.
5. Mroczkowski, Robert S., Electronic Connector Handbook: Theory and Applications, New York: McGraw-Hill, c1998.
VIRTUAL AESTHETIC DESIGN: ARCHITECTURE AND SOME RESULTS Weishi Li, Shuhong Xu, Gang Zhao, Yinglin Ke† Institute of High Performance Computing, Singapore; †Zhejiang University, P.R. China 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528 †Dept. of Mechanical Engineering & Automation, Zhejiang University, P.R.C. 310027 [liws, xush, zhaog]@ihpc.a-star.edu.sg A novel CAVE-based approach for seamless digital aesthetic design is presented in this paper. In this approach, the virtual prototype is initially constructed using virtual sculpturing, and then refined through virtual measurement. The obtained 3D model can be visually evaluated and/or accurately analysed using the finite element method. Finally, the accepted model is reconstructed using standard surface representations. Several of the key technologies, such as virtual measurement and B-Spline curve fairing, are also discussed.
1
Introduction
The market success of consumer products strongly depends on their aesthetic character, i.e. the emotional reaction the product is able to evoke, since the product functionality and quality of different companies have more and more adjusted to one another. Aesthetic design consists of two phases: conceptual design/styling and elaborated design. The application of computers in conceptual design is not very prevalent, in contrast to the great success of computer applications in elaborated design. Physical models are still indispensable in most cases, and reverse engineering is unavoidable. This results in a digital gap between conceptual design and elaborated design. The gap causes a great loss of data quality when the surfaces are reconstructed [1]. Consequently, closing this gap will greatly improve the design quality and also reduce the time consumed in product design. The main obstacles for digital aesthetic design are how to provide stylists with an intuitive design environment and how to evaluate the digital models without fabricating physical models [1]. In this paper we present a novel architecture for seamless digital aesthetic design in a CAVE. In this architecture, the whole product design process can be implemented entirely without the aid of physical models.

2
Aesthetic Design in CAVE
The conceptual design phase of aesthetic design is a creative design process. In contrast to commercial CAD (Computer Aided Design) systems, which lack support for creative design, Virtual Sculpturing systems give stylists the opportunity to work out these kinds of creative designs with their familiar tools in VR (Virtual Reality) environments. Commercial Virtual Sculpturing systems, like FreeForm™ of SensAble Technologies, are available now. As shown by Riedel et al. [2], modeling in a CAVE has several advantages, such as "intuitive working", "real time interaction" and "full-scale modeling", and the CAVE is more suitable for product design than other VR technologies. Virtual Sculpturing in a CAVE will therefore give stylists a new environment for aesthetic design. The digital models derived from virtual sculpturing are bounded by triangular faceted closed surfaces that are suitable for display in a VR environment. But the surfaces should be reconstructed using B-Spline surfaces or another standard surface representation in
order to refine the designs in CAD systems. Designers experienced in RE (Reverse Engineering) often complain that it is very difficult to understand and recognize the design intent of the stylist on a limited-size 2D screen, whereas in a CAVE this is no longer a problem. As the designers can view the virtual prototypes in 3D in the CAVE and draft 3D curve frameworks intuitively on the full-scale model, they have more freedom to grasp the design intent of the stylist. Based on the CAVE, a virtual aesthetic design workflow is shown in Figure 1. Here 2D sketching is one phase of Virtual Sculpturing and is not shown in the workflow. The virtual prototypes attained from Virtual Sculpturing systems are evaluated aesthetically and/or using CAE (Computer Aided Engineering) systems. Fraunhofer IAO [2] has had considerable success in CAD data evaluation in CAVE-like VR environments, so we will not discuss this problem in this paper. CAE is optional and applicable for products whose shapes are important for their function, like ships and sports cars. The evaluation outcome is fed back to give the designer some clues for modifying the initial design. The method called Virtual Measurement, used in refining the prototypes, is discussed in the next section. The approved prototype is reconstructed using a standard surface representation. The succeeding process is the same as the current design process. For assurance, the refined surface model can be transformed into triangular faceted surfaces to be evaluated in the CAVE again.

Figure 1. Virtual Aesthetic Design Workflow (conceptual design in the CAVE: virtual sculpturing, visual evaluation and virtual measurement of the virtual prototypes, then surface reconstruction; the surface model is refined in CAD for engineering design)

Surface reconstruction is a main topic in RE. What we are interested in is how to obtain a 3D curve framework of good quality for complex surface reconstruction. Firstly, the curve framework should be defined according to the designer's design intent; this is an inherent advantage of modeling in the CAVE. Secondly, the curves should have good quality; in other words, the curves should be fair enough. Our experience in the local fairing of B-Spline curves is given in Section 4. 2D sketching and Virtual Measurement also benefit from this technique. The main advantage of this workflow is that the design is implemented wholly in a digital environment, so the quality loss caused by transformation between physical and digital models is eliminated. Secondly, aesthetic design is not pure art; it should meet a wide range of constraints and goals [3], where a digital model has an apparent advantage over a physical model. Thirdly, aesthetic design in a digital environment also helps the stylist turn their thoughts perfectly into reality. Some characteristics, like symmetry and rotation, which are very difficult to model on a physical model, are very easy to achieve in a digital environment. Fourthly, it can help the stylists and designers achieve a higher level of elaboration in the early development phases, for better decision support. Lastly, the time consumed in product development can be decreased appreciably to meet the requirements of the rapidly changing market.
3
Virtual Measurement
The quality of the virtual prototypes should be improved, just as a physical model is polished. A method, the so-called virtual measurement, is applied to amend the flaws of the virtual prototypes. Firstly, four B-Spline curves that form a loop and surround a surface region of bad quality are created, and the two non-adjacent curves are defined over the same set of knots, with full multiple knots at the beginning and end of the knot sequence. A Coon's surface interpolating the four curves, as shown in Figure 2, is created. For a vertex of the triangle mesh, if a projected point can be found on the Coon's surface and its distance to the projected point is less than the user-defined distance, the vertex is inside the surface region defined by the four curves, and its parameter is set according to its projected point. Then the boundary of the surface region is extracted, and the surface region is separated by trimming the surface along its boundary. Finally, a B-Spline surface is fitted to the vertices of the surface region.

Figure 2. Coon's surface created from 4 boundary curves

The fitted B-Spline surface is evaluated by visual analysis and used as a new local virtual model to finish the local sampling and meshing. In the re-sampling process, the boundary points of the surface region are preserved to make sure they correspond to the initial surface, and virtual measuring points outside the boundary are deleted. The boundary points and the interior points are triangulated. At last, the triangular mesh is merged into the initial surface. An example is illustrated in Figure 3 to demonstrate the practicability of this method. All surface meshes are rendered with Gouraud-shaded triangular facets.

Figure 3. Example of virtual measurement: (a) initial local surface; (b) after local measurement
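A bilinearly blended Coons patch can be written down compactly; the sketch below evaluates one from four boundary curves. It is a generic construction given for illustration, with simple analytic boundaries standing in for the B-Spline boundary curves used in the paper.

```python
import numpy as np

def coons_patch(c_bottom, c_top, c_left, c_right, u, v):
    """Evaluate a bilinearly blended Coons patch at (u, v) in [0,1]^2.

    The four arguments are boundary-curve functions returning 3D points;
    they must share corners, e.g. c_bottom(0) == c_left(0).
    """
    p00, p10 = c_bottom(0.0), c_bottom(1.0)
    p01, p11 = c_top(0.0), c_top(1.0)
    ruled_u = (1 - v) * c_bottom(u) + v * c_top(u)
    ruled_v = (1 - u) * c_left(v) + u * c_right(v)
    bilinear = ((1 - u) * (1 - v) * p00 + u * (1 - v) * p10
                + (1 - u) * v * p01 + u * v * p11)
    return ruled_u + ruled_v - bilinear

# Illustrative boundaries: edges of a unit square, with a curved top edge.
b = lambda u: np.array([u, 0.0, 0.0])
t = lambda u: np.array([u, 1.0, np.sin(np.pi * u)])
l = lambda v: np.array([0.0, v, 0.0])
r = lambda v: np.array([1.0, v, 0.0])

print(coons_patch(b, t, l, r, 0.5, 0.5))  # -> [0.5 0.5 0.5]
```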
4

Local Fairing of Cubic B-Spline Curves
Fairness is an important indicator for industrial aesthetics, and curve fairing is one of the most fundamental functions for aesthetic design. For local fairing algorithms, two questions should be answered. The first is how to fair a point. The second is which point should be faired. Farin et al. [4] proposed a simple algorithm for "knot removal" and applied it to fair planar cubic B-Spline curves. Unfortunately, Farin's knot removal algorithm may cause the fairness of the curve to become worse than that of the initial curve in some cases, and as a consequence the curves cannot be faired reasonably without prerequisites. As shown in Figure 4, a curve is defined by the control polygon c = {d_i, i = 0, 1, 2, 3, 4} and the knot vector T = {t_0 = t_1 = t_2 < t_3 < t_4 = t_5 = t_6}. After removing the knot t_3, the new control polygon constructed by Farin's algorithm is C = {D_j, j = 0, 1, 2, 3}.

Figure 4. A counterexample of Farin's knot removal algorithm
But the fairness of the new curve is worse than that of the initial one. The least squares approximation (LSA) method performs better than Farin's algorithm. Sapidis et al. [5] considered the "worst point" as the point to be faired in each iteration, but in our tests the fairing iterations often stop prematurely. Inspired by curve lofting with a physical spline, we propose that if the worst point cannot be faired, the next-worst point should be tried. An improved fairing algorithm is obtained according to the above analysis:
1. Evaluate the internal knots.
2. Remove the worst knot using the LSA method. If the new curve is fairer than the old one, go to the first step; else undo the last iteration.
3. Remove the next-worst knot using the LSA method.
4. If the new curve is fairer than the old one, go to the first step; else undo the last iteration.
5. If there is no bad knot, exit; else go to the third step.
Figure 5 shows a curve faired with Sapidis' algorithm, with UGII V15.0 and with our algorithm. The curvature of the curve is drawn as porcupines to make all wiggles and unfair regions visible. The curve shown in Figure 5(d) is fairer than the other ones.
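The accept/undo control flow of this algorithm can be sketched independently of the B-Spline machinery. In the sketch below, the fairness measure (sum of squared third differences of the control points) and the local smoothing move that stands in for LSA knot removal are simplifications chosen for illustration; they are not the paper's actual LSA operator.

```python
import numpy as np

def fairness(pts):
    """Smaller is fairer: sum of squared third differences of control points."""
    return float(np.sum(np.diff(pts, n=3, axis=0) ** 2))

def smooth_point(pts, i):
    """Stand-in for LSA knot removal: pull point i toward its neighbours."""
    out = pts.copy()
    out[i] = 0.5 * out[i] + 0.25 * (out[i - 1] + out[i + 1])
    return out

def fair_curve(pts, tol=1e-12, max_iter=100):
    pts = np.asarray(pts, dtype=float)
    for _ in range(max_iter):
        interior = range(1, len(pts) - 1)
        # rank interior points from "worst" (largest local kink) downwards
        ranked = sorted(interior, key=lambda i: -np.sum(
            (pts[i - 1] - 2 * pts[i] + pts[i + 1]) ** 2))
        improved = False
        for i in ranked:                  # try worst first, then next-worst...
            candidate = smooth_point(pts, i)
            if fairness(candidate) < fairness(pts) - tol:
                pts, improved = candidate, True
                break                     # accepted: restart from the worst point
        if not improved:                  # no point can be faired: exit
            return pts
    return pts

noisy = np.array([[0, 0], [1, 1.2], [2, 1.9], [3, 3.4], [4, 4]], float)
print(fair_curve(noisy))
```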
Figure 5. Cubic B-Spline curve faired with different algorithms: (a) the initial curve; (b) faired with Sapidis' algorithm; (c) faired with UGII V15.0; (d) faired with our algorithm

5

Conclusion
A virtual aesthetic design workflow is presented in this paper. Our goal is to provide product designers with a seamless digital product design technique to improve design quality and reduce design time. Two key technologies, B-Spline curve fairing and virtual measurement, are also discussed. Examples demonstrate their efficiency in improving the quality of curves and triangular faceted surfaces respectively.

References
1. C. Werner Dankwort and Gerd Podehl, A new aesthetic design workflow - results from the European project FIORES. In P. Brunet, C. Hoffmann & D. Roller (eds.), CAD Tools and Algorithms for Product Design, Springer, 2000.
2. Riedel, Oliver, Ralf Breining, Ulrich Hafner and Roland Blach, Use of immersive projection environment for engineering tasks. SIGGRAPH 1998 Course 14.
3. Westin, S.H., Computer-Aided Industrial Design. Computer Graphics (1998) pp. 49-52.
4. Farin, G., G. Rein, N. Sapidis and A.J. Worsey, Fairing cubic B-Spline curves. Computer Aided Geometric Design 4 (1987) pp. 91-103.
5. Sapidis, N. and G. Farin, Automatic fairing algorithm for B-Spline curves. Computer Aided Design 2 (1990) pp. 121-129.
NUMERICAL SIMULATION AND EXPERIMENT ON PREDICTION FOR RETENTION FORCE DONG HONGZHI Institute of High Performance Computing, 1 Science Park Road #01-01 The Capricorn Singapore 117528 E-mail: donghz@ihpc.a-star.edu.sg LIM TONGWAH, LOW BOONHENG Molex Singapore Pte. Ltd., No. 110, International Road Jurong Town Singapore 629174 FEM is an effective tool for predicting the retention force in pin withdrawal from the housing in the design of connector assemblies. The research in this paper focuses on generating a finite element contact model for predicting the retention force. Under consideration of the actual assembly, the contact and mechanics models are analyzed, as well as the FEM model. The calculated results are then given, followed by a comparison with experimental results. Lastly, the errors in the calculation are analyzed so that the models can be improved in further applications.
1 Introduction A connector provides a separable interface between two subsystems of an electronic system [1]. The assembly and disassembly of pin and housing are very common in the design of connector systems. An insertion force exists during the assembly of pin and housing. Conversely, the retention force is what keeps the pin in the housing during the disassembly of connectors. However, the two are different in magnitude, which is why the research in this paper was carried out. Calculation of the retention force is one of the important processes in the design of electronic connector assemblies. It is very difficult for designers to predict its magnitude; in practice, the prediction often depends on designers' experience, and laboratory experiments are very tedious. One effective method to calculate and predict the retention force is finite element analysis based on numerical simulation technology, which saves time and cost and gives designers reference parameters [2]. 2 Setup of finite element analysis model 2.1 Analysis of contact model The analysis of retention force is a typical contact problem. For contact between a deformable body and a rigid body, the constraint associated with no penetration is implemented by transforming the degrees of
freedom of the contact node and applying a boundary condition to the normal displacement. This can be considered as solving the problem [2]:

[K_aa  K_ab; K_ba  K_bb] {u_a; u_b} = {f_a; f_b}        (1)
where K is the system stiffness matrix, u is the nodal displacement, and f is the force vector; the subscript a represents the nodes in contact, which have a local transformation, and b represents the nodes not in contact and, hence, not transformed. Of the transformed nodes, the displacement in the normal direction is then constrained such that Δu_n is equal to the incremental normal displacement of the rigid body at the contact point. 2.2 Modeling of retention force The retention force is the friction force between pin and housing. The actual physics of friction continues to be a topic of research; hence, the numerical modeling of friction has been simplified to two idealistic models. The most popular friction model is the Adhesive Friction or Coulomb Friction model. This model is used for most applications, with the exception of bulk forming such as forging. The Coulomb model is [2]:

σ_fr ≤ -μ σ_n t        (2)

where σ_n is the normal stress, σ_fr is the tangential (friction) stress, μ is the friction coefficient, and t is the tangential unit vector in the direction of the relative velocity. 2.3 Setup of FEM model The geometric model and the positions of housing and pin are described in Figure 1. The analysis model is considered as a symmetric one with the boundary conditions shown in Figure 1, in which the displacement on the boundary ab is fixed along the X and Y directions, and the boundaries bc and ad are fixed along the Y direction. The retention force arises in the course of withdrawal of the pin from the housing. The FEM model after meshing is shown in Figure 2.
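The Coulomb limit in Eq. (2) amounts to a simple stick/slip test at each contact node. The sketch below is a generic illustration of that test; it is not code from the solver used in the paper (MSC.Marc [3]), and the sample stresses are invented for the example.

```python
def coulomb_state(sigma_n: float, sigma_fr: float, mu: float) -> str:
    """Classify a contact node: |tangential stress| against mu * |normal stress|.

    sigma_n is the (compressive, negative) normal stress; the friction
    stress magnitude may not exceed mu * |sigma_n|.
    """
    limit = mu * abs(sigma_n)
    return "slip" if abs(sigma_fr) >= limit else "stick"

# Examples with the paper's friction coefficient of 0.12:
print(coulomb_state(sigma_n=-100.0, sigma_fr=8.0, mu=0.12))   # stick
print(coulomb_state(sigma_n=-100.0, sigma_fr=15.0, mu=0.12))  # slip
```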
Figure 1. Geometric model and boundary conditions
Figure 2. FEA model
Figure 3. End of pin insertion
Figure 4. The first step in pin withdrawal
Figure 5. The second Lockbarb out of housing in withdrawal
Figure 6. History curve of friction force at the contact node in the withdrawal process (friction force versus increment)
3 Computational results and comparison The basic input parameters are as follows:
Table 1. Basic Input Parameters

Parameter              Value
Material               Isotropic Zenite 6130
Poisson ratio          0.25
Young's modulus        21,000 MPa
Friction coefficient   0.12
When the pin had been inserted completely into the housing, and after the first step of pin withdrawal, the friction force status is as shown in Figure 3 and Figure 4. When the second Lockbarb was out of the housing during withdrawal, the distribution of friction force is as represented in Figure 5. What we are concerned with is the maximum friction force on every contact node of the housing, so Figure 6 describes the history curve of the friction force at the contact nodes over the calculated increments in the withdrawal process. For the sake of verifying the accuracy of the calculation for further application, experiments were done in a company laboratory. During the withdrawal of the pin from the housing, the measured maximum retention force was 8.319 N and the calculated one was 11.010 N; the latter is 32.3% higher. It is thus seen that a difference still exists between the simulation results and the experiments.

4 Conclusions As we know, FEM is an effective tool for solving problems in engineering applications. The accuracy of the results depends significantly on the accuracy of the FEM model built in the preprocessing. From the above calculation, it can be seen that the application of FEM to the analysis of retention force has been validated, and we can use it to predict the retention force as well as the insertion force. However, the accuracy of the material properties, the geometric model and the FE model is vital. In future research, more accurate results will be achieved by refining the mesh, especially in the contact area, and by setting up more exact material properties.

References
1. Robert S. Mroczkowski, Electronic Connector Handbook: Theory and Applications, McGraw-Hill (1998) pp. 1.1.
2. Stephane Kugener, Simulation of the Crimping Process by Implicit and Explicit Finite Element Methods. AMP Journal of Technology 4 (1995) pp. 8-15.
3. MSC.Marc Manual Volume A: Theory and User Information. MSC Software (2001).
FAILURE PROBABILITY OF WIRE BONDING PACKAGES F. WANG, Y. Y. WANG, C. LU
Division of Computational Mechanics, Institute of High Performance Computing, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528. Email: [email protected]
This paper presents a methodology for elastic-plastic analysis of wire bonding packages. The methodology/model is based on the finite element method for rate-sensitive materials and is applicable to very large deformation processes. Nonlinear spring elements are adopted to simulate the gradual formation of the intermetallic bonding layer between the ball and the electric pad. The effect of temperature increase on the material properties is taken into account. The ultrasonic power is simulated as a displacement boundary condition. The mechanical behavior of the silicon chip during wire bonding with ultrasonic irradiation is thus revealed. The failure probability with reference to the stress distribution is discussed. The developed FE methodology is simple but effective for evaluating the failure probability of wire-bonding packages.
1
Introduction
In the wire bonding process, there are many parameters that influence the quality of the chips. It is difficult or impossible to determine the effect of all the active factors by experiment or by an analytical model. On the other hand, research in the area of FE analysis has produced more and more computationally efficient algorithms, enabling simulations with greater detail and accuracy for electronic packages. Therefore, there is an increasing tendency to apply FEA to electronic packages for design purposes or to explore detailed information on the performance of the electronic packages [1-3]. In the present study, advanced FEA software is used to simulate the gold wire bonding process. Since the ball undergoes a very large deformation, a general meshing technique is not very accurate even with a very fine mesh for the ball and the bond pad. An adaptive meshing technique is implemented to perform this elastic-plastic large deformation analysis. Special nonlinear spring elements are adopted to model the gradual formation of the intermetallic bonding between the gold ball and the pad. Detailed information on the overall stress distribution is then derived. Since cratering at the ball-pad interface is the most common failure pattern of a wire bonding package observed in experiments and industrial practice, the failure probability regarding cratering at the ball-pad interface is evaluated according to the derived stress level.
2
Finite Element Model
In the present study, the target gold wire bonding package is taken from the published paper [1] to serve as a benchmark test of the present FEA methodology. The material properties and the loading conditions are consistent with the data provided in [1]. The schematic diagram of the geometry of the wire bonding package is shown in Figure 1. The substrate has a length and width of 310 μm and a thickness of 300 μm. The radius of the circular aluminium terminal is 75 μm, and its thickness is 1 μm. The radius of the gold wire ball is 35 μm. The thickness of the oxide film is 1 μm. The
properties of the different materials are listed in Table 1. The unit system is defined self-consistently as length - μm, time - μs, mass - μg, force - mN and stress - GPa.

Table 1. Material properties adopted in the present study.

Material      Young's Modulus (GPa)   Poisson's Ratio   Rate of Strain Hardening (MPa)   Yield Stress (MPa)
Gold          68.6                    0.44              1459                             Eq. (1)
Capillary     313.6                   0.23              -                                -
Al Terminal   70.3                    0.345             -                                -
Oxide Film    100.0                   0.224             -                                -
Si            166.4                   0.26              -                                -
Figure 1. Schematic diagram of wire bonding process.
Taking advantage of the symmetric geometry, loading and boundary conditions, only half of the whole structure is modelled, with appropriate boundary and loading definitions. The FE mesh is shown in Figure 2. The compression bonding load is varied proportionally from 0 N to 0.98 N during 1 millisecond. The 110 kHz ultrasonic wave is simulated as a displacement of the tool that reaches its peak value during the first half period of a cycle and moves back to the original position during the other half. The adaptive meshing technique is adopted since a very large deformation is expected for the gold ball. Nonlinear spring elements are used between adjacent nodes of the gold ball and the pad after the ball is pressed onto the electric pad. The spring elements are applied only to those areas where the intermetallic bonding occurs. The spring elements are assigned a near-zero stiffness initially and an extremely large value to simulate a rigid connection after the intermetallic bonding is assumed to have formed. The effect of high temperature on the material properties is reflected in the strain rate dependency of the yield stress of the gold ball as follows:

σ_y = 32.7 + 0.057 ε̇  (MPa)        (1)
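Equation (1) is a simple linear rate dependence; a one-line helper makes the intent concrete. The strain rate is assumed here to be expressed in the same self-consistent unit system as the rest of the model.

```python
def gold_yield_stress(strain_rate: float) -> float:
    """Rate-dependent yield stress of the gold ball, Eq. (1), in MPa."""
    return 32.7 + 0.057 * strain_rate

print(gold_yield_stress(0.0))    # quasi-static: 32.7 MPa
print(gold_yield_stress(100.0))  # elevated rate: 38.4 MPa
```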
Figure 2. Finite Element Mesh.
3
Results and Discussion
Figure 3 gives the calculated Von Mises stress distribution in the pad, the dielectric and the substrate when the gold ball is fully compressed onto the pad. For comparison, the result given in [1] is also shown in Figure 3. It is seen from Figure 3 that the maximum Von Mises stress derived in the present study is 593.6 MPa. This is comparable with the result given in [1], which is 547.2 MPa; the percentage difference is 7.82% of the current result. The effect of the ultrasonic power is seen in the increase of the shear stress. Figure 4 shows the transverse shear stress distribution before and after the ultrasonic power is applied. It is agreed by many researchers that the formation of the intermetallic layer between the ball and the pad depends on the transverse shear stress, with a larger transverse shear stress improving the intermetallic bonding. In view of this, the application of the ultrasonic power improves the intermetallic bonding, since it increases the shear stress from 177.8 MPa to 180.5 MPa.
The application of the ultrasonic power also increases the maximum principal stress from 227.4 MPa to 327.7 MPa (see Figure 5). The maximum principal stress is usually considered the cause of cratering at the ball-pad interface. The failure probability due to ball-pad interface cratering has the following form in terms of the principal
Figure 3. Von Mises stress distribution in the pad, the dielectric and the substrate (present study, max. 593.6 MPa; result from [1], max. 547.2 MPa).

Figure 4. Shear stress distribution in the pad, the dielectric and the substrate, before and after the ultrasonic power is applied.
stress:

P = 1 - exp[-p (σ/σ_m)^m]        (2)
where p and m are the two Weibull parameters. The values of p, m, and σ_m for the ball-pad interface are not very clear. However, it is known that the product p/(σ_m)^m is proportional to the area of the sample. It has been suggested by some researchers that σ_m is equal to 360 MPa, with the assumption that the mean fracture stress is independent of the area of
Figure 5. Maximum principal stress distribution in the pad, the dielectric and the substrate, before and after the ultrasonic power is applied.
the structure; p and m may be given the values of 0.6 and 11 respectively for an area of 70 mm². In the current study, the sample area is the pad-ball interface (D = 35 μm). Thus, the product p/(σ_m)^m can be reduced by a factor of 1.21×10^-5. The reduction can be implemented by adopting the values of 7.26×10^-6 and 360 MPa for p and σ_m respectively. Different combinations of values can be adopted to give the same product p/(σ_m)^m, and there is no difference in the predicted value of the cratering rate P. In the present study, the derived maximum principal stresses at the pad-ball interface before and after the application of the ultrasonic power are 227.4 MPa and 327.7 MPa respectively. Substituting the parameters p, σ_m and m and the maximum principal stresses into Eq. (2) gives the failure probabilities before and after the application of the ultrasonic power as P₁ = 1.06×10^-7 and P₂ = 5.9×10^-6 respectively. So the failure probability of the wire bonding package is increased by a factor of about 56 by the application of the ultrasonic power.
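A sketch of Eq. (2) with the stated parameter values follows. Note that the predicted probability is very sensitive to the Weibull modulus m and to the exact value of p, so the absolute numbers move considerably with small changes in the assumed parameters; the ratio between the two load cases is far less sensitive, and with these values it comes out near 56, consistent with the probabilities quoted above.

```python
import math

def cratering_probability(sigma_mpa: float, p: float = 7.26e-6,
                          sigma_m: float = 360.0, m: float = 11.0) -> float:
    """Weibull failure probability, Eq. (2): P = 1 - exp(-p (sigma/sigma_m)^m)."""
    return 1.0 - math.exp(-p * (sigma_mpa / sigma_m) ** m)

p_before = cratering_probability(227.4)  # before ultrasonic power
p_after = cratering_probability(327.7)   # after ultrasonic power
print(p_before, p_after, p_after / p_before)
```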
4

Conclusions
Through the above analysis of the derived results, it can be concluded that an FE methodology for the overall analysis of the wire bonding package has been successfully established. The derived stress distributions in the pad, the dielectric and the substrate are comparable with the results reported by other researchers. The role of the ultrasonic power in improving the ball-pad intermetallic bonding while reducing the reliability of the packages has been well predicted with the present methodology.

References
1. T. Ikeda, N. Miyazaki, K. Kudo, K. Arita, H. Yakiyama, "Failure Estimation of Semiconductor Chip during Wire Bonding Process", Journal of Electronic Packages, Vol. 121, pp. 85-91, June 1999.
2. B. Chylak, S. Kumar and G. Perlberg, "Optimizing Wire Bonding Process for 35 μm Ultra-Fine-Pitch Packages", SEMICON, Singapore, 2001.
3. Y. Takahashi, M. Inoue, "Numerical Study of Wire Bonding - Analysis of Interfacial Deformation Between Wire and Pad", Journal of Electronic Packages, Vol. 124, No. 1, pp. 27-36, March 2002.
SHAPE CONTROL OF SMART COMPOSITE PLATE STRUCTURES BASED ON ACTUATOR SHAPE OPTIMISATION QUAN NGUYEN AND LIYONG TONG School of Aerospace, Mechanical & Mechatronic Engineering, Bldg. J07, The University of Sydney, NSW, 2006, Australia. E-mail: quan@aeromech.usyd.edu.au
This paper presents a coupled alternating loop optimization system (CALOS) for shape control of smart plate structures, in which the loci and sizes of the piezoelectric actuators as well as the applied voltages are treated as design variables. CALOS is a two-stage process that consists of the linear least square (LLS) method, employed to search for the voltage distribution in a given actuator configuration, and the sequential linear programming (SLP) method, used to optimize the shapes and loci of the actuator patches for given voltages. An illustrative example is given to validate CALOS.
1
INTRODUCTION
Shape control of a smart structure can be obtained by optimising the applied electric fields, loci and sizes of the piezoelectric actuators attached to the host structure. In this field, many researchers have focused on shape control via finding the optimal actuator electric field for matching the desired shape. In this instance, the electric field in an actuator is the design variable, and the optimal value is that which minimises the difference between the actuated and desired shapes. Koconis et al. [1] developed analytical methods for determining the optimal values of the voltages applied to fixed-shape actuators for achieving specified shapes of sandwich plates and shells. Chee et al. [2] employed the 3rd order plate theory for the mechanical deformation and the layer-wise theory for modelling the electric field in the finite element formulation; there, the actuator shapes are fixed and the optimal electric fields are determined using LLS. In the majority of published research on shape control, the shape and location of an actuator are not treated as design variables. This might lead to high energy consumption, because high voltages may be required. To achieve improved shape control with minimum energy consumption, we propose to investigate piezoelectric actuator design optimisation (PADO). In PADO, the location, shape and size of an actuator are optimised in addition to the value of the applied electric field. In this paper, a coupled alternating loop optimisation system is developed to implement PADO for shape control of smart structures.
2
FINITE ELEMENT FORMULATION
In the finite element formulation, the mechanical deformations and the electric fields are modeled using the 3rd order plate theory and the layer-wise theory. For the quasi-static shape control of a smart structure, Hamilton's variational principle can be employed to develop the final system of equations, in terms of the nodal variables including both mechanical and electric quantities, as given by

[K_uu  K_uφ; K_φu  K_φφ] {u_s; φ} = {F; -Q}

where u_s and φ are the nodal mechanical displacements and electric potentials, respectively.

3
OPTIMISATION METHODS
Shape control can be achieved by minimising the least squared error function

Lnm = Σ_{i=1}^{N} (w_d^i - w_c^i)²

where w_d and w_c are the desired and calculated displacements
respectively and N is the total number of selected matching nodes. CALOS is a two-stage process that uses the LLS and SLP methods in an alternating and coupled fashion. 3.1
Linear Least Square (LLS) Technique [3]
This method minimises the least square error function between the actuated and the desired shapes. It relies on the system being linear between the voltage and displacement variables. The solution of the LLS method can be written as
{φ} = ([C]^T [C])^(-1) [C]^T {w_d}

where {φ}, {w_d} and [C] are the voltage vector, the desired displacement vector and the influence coefficient matrix, respectively.
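In practice this normal-equation solution is usually computed with a least-squares routine rather than an explicit inverse. A minimal numpy sketch follows; the 3-node, 2-actuator influence matrix and displacement values are invented for illustration.

```python
import numpy as np

# Influence coefficient matrix C: displacement at 3 matching nodes per
# unit voltage on each of 2 actuators (illustrative values only).
C = np.array([[1.0e-8, 2.0e-9],
              [5.0e-9, 4.0e-9],
              [1.0e-9, 6.0e-9]])
w_d = np.array([1.2e-5, 0.9e-5, 0.4e-5])  # desired displacements

# Solve min ||C @ phi - w_d||^2; equivalent to phi = (C^T C)^-1 C^T w_d.
phi, *_ = np.linalg.lstsq(C, w_d, rcond=None)
print("optimal voltages:", phi)
```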
3.2

Sequential Linear Programming (SLP) Technique [4]
The shape control problem can be expressed in a standard form by linearising the objective function and the constraints as follows:

Minimise  Lnm(x_{k-1}) + Σ_{j=1}^{M} (∂Lnm/∂x_j)(x_k - x_{k-1})_j        (k = iteration number)

Subject to  (1 - factor) w_d^i ≤ w_c^i + Σ_{j=1}^{M} (∂w_c^i/∂x_j)(x_k - x_{k-1})_j ≤ (1 + factor) w_d^i,   i = 1, ..., N
            x_{k-1,j} - x_ML ≤ x_{k,j} ≤ x_{k-1,j} + x_ML,   j = 1, ..., M

where x are the design variables, x_ML is the move limit that ensures the validity of the linear approximation, and M is the number of design variables. The convergence of the optimisation process is checked by the criterion |(Lnm_k - Lnm_{k-1})/Lnm_{k-1}| < ε₁, or x_ML < ε₂, where ε₁ and ε₂ are convergence tolerances.

3.3
Coupled Alternating Loop Optimization System (CALOS)
In CALOS, the first stage uses the LLS method to determine the optimal voltage distribution that produces the desired shape with a fixed actuator geometry. The second stage seeks to optimise the geometrical configuration by minimising Lnm with the applied voltages fixed, using SLP. Stages one and two are repeated until the convergence condition is achieved. The flowchart of this process is shown in Figure 1.
Figure 1. Flowchart of the CALOS process: the LLS solver and the SLP solver are called alternately, returning the voltages V and the new geometry in turn, until the convergence check on Lnm or x_ML is passed.
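A compact sketch of this alternating loop follows. The LLS stage is a real least-squares solve; the SLP stage is replaced here by a crude finite-difference, move-limited descent step on the geometry variables as a stand-in for a proper linear-programming subproblem, and all problem data (the model function and numbers) are illustrative.

```python
import numpy as np

def model(x):
    """Illustrative influence matrix depending on geometry variables x."""
    return np.array([[1.0 + x[0], 0.2],
                     [0.5, 1.0 + x[1]],
                     [0.1, 0.6]]) * 1e-8

def lls_voltages(C, w_d):
    """Stage 1: optimal voltages for a fixed geometry (linear least squares)."""
    phi, *_ = np.linalg.lstsq(C, w_d, rcond=None)
    return phi

def lnm(x, phi, w_d):
    """Least squared error between calculated and desired displacements."""
    return float(np.sum((model(x) @ phi - w_d) ** 2))

def geometry_step(x, phi, w_d, move_limit=0.01, h=1e-4):
    """Stage 2 stand-in: move each geometry variable downhill within a move limit."""
    base = lnm(x, phi, w_d)
    grad = np.array([(lnm(x + h * e, phi, w_d) - base) / h for e in np.eye(len(x))])
    return x - move_limit * np.sign(grad)

w_d = np.array([1.2e-5, 0.9e-5, 0.4e-5])
x = np.zeros(2)
prev = None
for k in range(50):                      # coupled alternating loop
    phi = lls_voltages(model(x), w_d)    # stage 1: LLS voltages
    x = geometry_step(x, phi, w_d)       # stage 2: geometry update
    cur = lnm(x, phi, w_d)
    if prev is not None and abs(prev - cur) / prev < 1e-6:
        break                            # convergence criterion
    prev = cur
print("voltages:", phi, "geometry:", x, "Lnm:", cur)
```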
4
NUMERICAL EXAMPLE
Consider a cantilever plate with three piezoelectric patches, clamped at its left edge. The plate has a length of 0.15 m and a width of 0.06 m, and consists of 2 layers of thickness 0.01 m. The desired shape is defined as w_d(x,y) = 10^-5 cos(10x - 1). The plate material has the following stiffness constants: c11 = c22 = c33 = 82.68 GPa, c12 = c13 = c23 = 27.56 GPa, c44 = c55 = c66 = 26.5 GPa. The FE model of this plate is composed of 10 elements. The three active patches bonded to the plate are modeled using six elements: patches #1 and #3 are bonded at the two ends and patch #2 is bonded in the middle of the plate, as shown in Figure 2. The patch properties are: c11 = c33 = 84.8 GPa, c22 = 29.68 GPa, c12 = c13 = c23 = 36.35 GPa, c44 = c55 = c66 = 24.2 GPa; χ11 = χ22 = 15.3×10^-9 F/m, χ33 = 15.0×10^-9 F/m; and d31 = 254 pm/V, d32 = -204 pm/V, d33 = 374 pm/V, d24 = 484 pm/V, d15 = 584 pm/V. In this example, the movement of the selected design points is assumed to be only in the x direction, as shown in Figure 3.
Figure 2

Figure 3
The optimisation process converged at the 4th iteration. Table 1 gives the voltages and Lnm at each iteration. In the 1st iteration of the 1st stage, the fixed geometry given in Figure 2 was used to calculate the voltage distribution V_i; the results are shown in column 1. The 2nd stage takes V_i as known and optimises the actuator shape. After the solution for the 1st iteration has converged, the process returns to the 1st stage of the 2nd iteration to re-calculate V_i, as given in column 2. A converged solution is reached once the difference between Lnm_new and Lnm_old is within a relative error (< 10^-6). It is seen that initially the total applied voltage and the shape error function are 1006.563 V and 2.12×10^-5 respectively, and the final optimum values are 892.768 V and 8.25×10^-6. Thus CALOS
can reduce the total applied voltage by approximately 11.31% and the error function Lnm by 61.08%. The final geometrical shape of the structure is shown in Figure 4.

Table 1. Iteration history of CALOS

Iteration   V1        V2        V3        Total      Lnm
1           393.22    477.663   135.68    1006.563   2.12E-05
2           377.641   445.996   121.464   945.101    1.38E-05
3           363.607   433.006   138.257   934.87     1.07E-05
4           355.259   394.238   143.271   892.768    8.25E-06
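The headline reductions follow directly from the first and last rows of Table 1, as this quick check shows:

```python
v0, v4 = 1006.563, 892.768   # total voltage, iterations 1 and 4
l0, l4 = 2.12e-5, 8.25e-6    # error function Lnm, iterations 1 and 4
print(f"voltage reduction: {100 * (v0 - v4) / v0:.2f}%")  # ~11.31%
print(f"Lnm reduction:     {100 * (l0 - l4) / l0:.2f}%")  # ~61.08%
```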
5

CONCLUSIONS
This paper considers a broader formulation of the quasi-static shape control problem for smart plate structures, in which both the actuator configurations and the applied voltages are optimised for shape matching. As an implementation of piezoelectric actuator design optimisation (PADO), a coupled alternating loop system is proposed and then validated using an illustrative example.

6
ACKNOWLEDGEMENTS
The authors are grateful for the support of the Australian Research Council under the Discovery Projects grant scheme (DP0210716).

REFERENCES
1. Koconis, D.B., Kollar, L.P. and Springer, G.S., "Shape Control of Composite Plates and Shells with Embedded Actuators. II: Desired Shape Specified", J. Composite Materials, 28(3) (1994) pp. 262-285.
2. Chee, C., Tong, L. and Steven, G., "A Buildup Voltage Distribution (BVD) algorithm for shape control of smart plate structures", Computational Mechanics, Vol. 26, 2000, pp 115-128.
3. Rorres, C. and Anton, H., Applications of Linear Algebra, New York, Wiley, 1979.
4. Gallagher, R.H. and Zienkiewicz, O.C., Optimum Structural Design: Theory and Applications, Chapter 7, London/New York, Wiley, 1973.
5. Nguyen, Q. and Tong, L., "Shape Control of Smart Composite Plate Structures With Non-Rectangular Shape PZT Actuators", Proceedings of the Third Australasian Congress on Applied Mechanics, 2002, pp 421-426.
NUMERICAL INVESTIGATION OF MICRO-SCALE SHEET METAL BENDING USING LASER BEAM SCANNING Z.Q. ZHANG AND G. R. LIU Department of Mechanical Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260 E-mail: engp0581@nus.edu.sg X. M. TAN Institute of High Performance Computing, 1 Science Park Road #01-01 The Capricorn, Singapore Science Park II, Singapore 117528 E-mail: [email protected]
Numerical techniques such as the finite element method (FEM) are revolutionizing the conventional trial-and-error methods in industry today. These methods have proved to be very useful in product development and process design. A parametric numerical simulation of a laser beam scanning on a micro-scale suspension flexure has been carried out, comprising a non-linear transient indirectly coupled thermal-mechanical analysis that accounts for the temperature dependency of the thermal and mechanical properties of the materials. From the 3D FE simulation, relationships between the permanent bending angle of the flexure and the control parameters of the scanning are established, which will help to predict the bending angle of the micro sheet metal under any other scanning condition.
1
Introduction
Laser techniques have been used widely in a range of engineering processes, such as the cutting of complex shapes, drilling on curved surfaces, surface treatment, and the welding of dissimilar metals. The rapid, flexible, low-cost and precise laser forming technology has also attracted considerable attention in sheet metal forming in recent years [1, 2, 3, 4]. The laser forming process is a thermal-mechanical coupling process. It utilizes the thermal stress induced by laser irradiation to form structural elements into different shapes. High-powered laser irradiation yields high temperature gradients between the irradiated surface and the neighboring material. Non-uniform thermal stresses occur due to the temperature distribution. The material deforms plastically once the thermal stresses exceed the yield point of the material. In a precision device, even a small angular distortion of the micro-scale sheet metal inside is unacceptable. For this kind of problem, the laser bending technology has more advantages than traditional mechanical adjustment methods. It can achieve the desired accuracy within a short
time, which leads to low cost. Also, laser bending is a non-contact technology, which satisfies some special technological requirements. Instead of trial-and-error experiments, the numerical simulation method using FEM is applied to find solutions such as heating patterns and control parameters for the process. The three-dimensional FE model built for the micro-scale stainless steel plate is illustrated in Figure 1. The width of the model is 2.5 mm, its length is 11 mm, and its thickness is 0.025 mm. The objective of the laser heating operation is to adjust the bending angle within 0°~1°. With enough simulation cases performed, the relationships between the bending angle θ and the heating parameters are established, which will be very helpful for practical operations.
Figure 1. Illustration of the micro device for adjustment: (a) isometric view; (b) top view (the scanning area is indicated)
2. Solution equations Laser bending is a thermal-mechanical coupling process. To find the time-dependent temperature distribution, three assumptions have been used to simplify the general equation of the three-dimensional transient temperature: the laser bending is performed below the melting temperature of the material; the convection term is much smaller than the heat-transfer terms, so it can be neglected; and the material is isotropic. Using the Galerkin method, the controlling equation for the calculation of the temperatures and their dependency on time can be described as follows:

[C(T)]{Ṫ(t)} + [K(T)]{T(t)} + {Q(t)} = 0
where C(T) denotes the temperature-dependent specific-heat matrix, K(T) denotes the temperature-dependent conductivity matrix, Q(t) is the heat flux vector, and T(t) and Ṫ(t) are the time-dependent nodal temperature vector and its time derivative, respectively. This
equation can be solved using the Newton-Raphson procedure and the Newmark integration method. On the other hand, the equations for the thermal stresses and strains can be described by

[M(T)]{ü(t)} + [C(T)]{u̇(t)} + [K(T)]{u(t)} + {F(t)} + {F^th(t)} = 0
where M(T), C(T) and K(T) are the temperature-dependent mass, damping and stiffness matrices, respectively, F(t) is the external load vector, F^th(t) is the temperature load vector, and {u(t)}, {u̇(t)} and {ü(t)} are the displacement, velocity and acceleration vectors, respectively. 3. Finite-element Simulation of the Laser-bending Process The laser beam heat input is modeled as a moving heat source over the surface of the micro-scale sheet metal. The distribution of the heat input is given in the form of a thermal flux density that obeys a normal distribution, as follows [4]:

I = (a P / (π r_b²)) exp(-r² / r_b²)
where I is the thermal flux density of the laser beam, a is the absorption coefficient of the sheet metal surface, P is the laser beam power, r_b is the laser beam radius and r is the distance from the center of the laser beam. So the mean thermal flux density within the area scanned by the laser beam on the sheet metal surface is

I_m = (1/(π r_b²)) ∫₀^{r_b} I (2πr) dr ≈ a P / (π r_b²)
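The flux distribution is easy to evaluate numerically. The sketch below computes the local flux for a few radii; the laser parameters used are placeholders, and the Gaussian form follows the reconstruction above.

```python
import math

def laser_flux(r_mm: float, power_w: float, r_b_mm: float, absorp: float) -> float:
    """Thermal flux density I(r) of the Gaussian beam model above, in W/mm^2."""
    peak = absorp * power_w / (math.pi * r_b_mm ** 2)
    return peak * math.exp(-(r_mm / r_b_mm) ** 2)

# Placeholder parameters: 0.5 W beam, 0.1 mm radius, absorption coefficient 0.3.
for r in (0.0, 0.05, 0.1):
    print(f"I({r:.2f} mm) = {laser_flux(r, 0.5, 0.1, 0.3):.2f} W/mm^2")
```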
AISI type 304 stainless steel is used for the micro-scale suspension flexure. Since temperature-dependent material properties are important for the accurate calculation of a temperature distribution, the material properties of the specified stainless steel, with their temperature dependence, are taken from references [5, 6, 7]. When the temperature varies from 20°C to 1000°C, the density of the stainless steel changes from 7.9 g/cm³ to 7.49 g/cm³, while the thermal conductivity increases from 14.6 W/(m·K) to 25.2 W/(m·K) and the specific heat shifts from 470 J/(kg·K) to 675 J/(kg·K). On the other hand, the temperature-dependent mechanical properties seriously affect the mechanical simulation results. The temperature effects on the thermal expansion coefficient, Young's modulus, yield stress, etc. of the stainless steel are also taken into consideration. When the temperature increases from 20°C to 1000°C, Young's modulus decreases from 193 GPa to
128 GPa, the thermal expansion coefficient changes from 17 μm/(m·°C) to 20 μm/(m·°C) and the yield strength decreases from 410 MPa to 66 MPa. In this simulation, the Newton-Raphson method is used to avoid possible divergence. 4. Parametric Study of the Laser Bending Process by FEM The effects of parameters including the laser input power (P) and the laser scanning time (t) on the angular deformation caused by a laser heating process are investigated in this FEM simulation. When a laser beam irradiates the model surface, the temperature of the irradiated area increases rapidly within a very short time. A sharp temperature gradient is established through the thickness of the sheet metal. The thermal-mechanical response of the sheet metal is depicted qualitatively in Figure 2 and is explained as follows.
Figure 2. Simulation results from both thermal and structural analysis: (a) temperature distribution; (b) displacement distribution
During the heating period, compressive stresses arise because of the thermal expansion of the heated zone and the bulk constraint of the
Figure 3. Bending angles due to different scanning conditions: (a) bending angle changing with power (W); (b) bending angle changing with power and time
materials surrounding the heated area. Plastic deformation occurs in some high-temperature zones as the yield strength decreases and the thermal expansion coefficient increases with increasing temperature. As a result, the thermal expansion of the materials causes the target to bend away from the laser beam after cooling down. With many simulation results collected, the relationship between the heating parameters and the bending angle is established. Figure 3 shows the effect of the laser power and scanning time on the final bending angle, in both curve and surface formats. 5. Conclusions This study presents numerical simulation methods for micro-scale sheet metal bending using a scanning laser beam. With consideration of the temperature-dependent material properties, simulations of a specified stainless steel suspension flexure inside a micro device have been performed to find the bending angle due to various scanning conditions of the laser beam. With the relationships between bending angle and scanning conditions, the angle due to any scanning condition can easily be predicted from the curves or surfaces.

References
1. Vollertsen, F., Geiger, M. and Li, W.M., FDM- and FEM-simulation of laser forming: a comparative study. In: Advanced Technology of Plasticity III, ed. by Z.R. Wang and Y. He (1993) pp. 1793-1798.
2. Kermanidis, Th. B., Kyrsanidi, An. K. and Pantelakis, Sp. G., Numerical simulation of the laser forming process in metallic plates. Proc. 3rd Int. Conf. on Surface Treatment '97 (Oxford, UK, 1997) pp. 307-316.
3. Li, W., Geiger, M. and Vollertsen, F., Study on laser bending of metal sheets, Journal of Lasers 25(9) (1998) pp. 859-864.
4. Hu, Z., Labudovic, M., Wang, H. and Kovacevic, R., Computer Simulation and Experimental Investigation of Sheet Metal Bending Using Laser Beam Scanning. Int. J. Machine Tools & Manufacture 41 (2001) pp. 589-607.
5. Harvey, P.D. (Ed.), Engineering Properties of Steel, American Society for Metals (Metals Park, OHIO 44073, 1982).
6. Brandes, E.A. and Brook, G.B. (Eds.), Smithells Metals Reference Book, seventh ed. (Reed Educational and Professional Publishing Ltd, 1998).
7. Davis, J.R. (Ed.), Stainless Steels, ASM Specialty Handbook (Ohio: ASM International, 1994).
THREE-DIMENSIONAL FINITE ELEMENT STUDY OF THE ELASTIC FIELDS IN QUANTUM DOT STRUCTURES Q.X. PEI AND C. LU Institute of High Performance Computing, 1 Science Park Road, Singapore 117528 E-mail: [email protected]
The elastic fields in self-organized quantum dot (QD) structures are investigated in detail by three-dimensional finite element analysis for an array of lens-shaped QDs. Emphasis is placed on the effect of the elastic anisotropy of the materials. It is found that the elastic anisotropy strongly influences the distributions of strain, stress, and strain energy density in the QD structures. By changing the elastic anisotropy ratio and the cap layer thickness, substantially different distributions of the strain energy minima on the cap layer surface are obtained, which may result in various QD ordering phenomena such as vertical alignment, partial alignment or complete misalignment.
1 Introduction
Quantum dots (QDs) have drawn great attention due to their potential applications in the fabrication of a wide variety of novel optoelectronic and microelectronic devices, such as light emitting diodes, photovoltaic cells, and quantum semiconductor lasers [1]. Self-assembled QDs can be grown layer by layer to form ordered nanostructures via the Stranski-Krastanow growth mode, which consists of three-dimensional (3D) island growth on a two-dimensional (2D) wetting layer. It is well understood that such self-alignment is due to the long-range elastic fields induced by the misfit strain between the QDs and the substrate [1,2]. It is also well known that the elastic fields produced by the QDs substantially modify the electronic band structure and thus strongly affect the performance of the electronic devices [3]. Hence, the elastic fields in and around the QDs have to be studied in order to obtain a well-ordered QD structure and improve the performance of the electronic devices. The elastic fields in and around the QDs can be analyzed with the atomistic approach, the analytical continuum approach, and the finite element (FE) approach [4]. Compared with the other two approaches, the FE technique is more powerful and can be used for structures of any geometric shape. In this paper, we report a three-dimensional FE calculation of the elastic fields induced by an array of lens-shaped QDs with a wetting layer, submerged in a semi-infinite half space. We present a detailed study of the effects of material anisotropy on the elastic fields in the multiple-QD system and on the vertical ordering of the QDs.

2 Analysis Method and Conditions
A schematic of the lens-shaped QD array is shown in Fig. 1, assuming the QDs are distributed uniformly. Due to the symmetry of the structure, we only analyze the central square area surrounded by the heavy dotted lines, which covers one complete QD and four quarter QDs. In the model, the distance (D) between the two side QDs is taken to be 45 nm, while the thickness of the wetting layer (WL) is taken to be 1 nm. The base diameter (d) of the lens-shaped QD is taken to be 24 nm, with its height (h) being 6 nm. The 3D finite element model of the QD structure is constructed and analyzed with MSC/MARC,
Figure 1. Schematic of the array of lens-shaped quantum dots (Cap: cap layer; WL: wetting layer; Sub: substrate).
a commercial FE code. The QD lattice constant ad is taken to be bigger than the matrix lattice constant a, with the lattice misfit ε0 = (ad - a)/a = 0.04. This lattice mismatch is modeled by employing a pseudo thermal expansion of the QDs. For materials of cubic crystals, the degree of elastic anisotropy is usually characterized by the anisotropy ratio A = 2C44/(C11 - C12), with A = 1.0 corresponding to isotropic elasticity. It is roughly equal to the ratio between the values of Young's modulus along the <111> and <100> directions [5]. The A values for some semiconductor materials are: PbTe 0.27, PbSe 0.29, PbS 0.51, Si 1.56, Ge 1.64, GaAs 1.83, InAs 2.08, ZnTe 2.04, and ZnS 2.53. To cover most semiconductor materials, A values in the range of 0.25 to 4.0 are taken in our investigation. The elastic constants C11 = 150, C12 = 50, C44 = 50 GPa are used for the elastic isotropy case A = 1.0, while for each of the four elastic anisotropy cases A = 0.25, 0.5, 2.0, and 4.0, C11 and C12 are kept unchanged with C44 adjusted to make A equal to the corresponding value. Calculations are also carried out for different cap layer thicknesses, with the ratio of cap layer thickness to dot height, H/h, being 2.0, 3.0, 4.0, 5.0, and 6.0 respectively.
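Since C11 and C12 are held fixed, the adjusted C44 follows directly from the definition of the anisotropy ratio. A minimal sketch in Python (our own illustration; the function name is hypothetical):

C11, C12 = 150.0, 50.0  # GPa, the values used above

def c44_for(A):
    # A = 2*C44/(C11 - C12)  =>  C44 = A*(C11 - C12)/2
    return 0.5 * A * (C11 - C12)

for A in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(f"A = {A:4.2f} -> C44 = {c44_for(A):5.1f} GPa")

For A = 1.0 this reproduces the isotropic set C44 = 50 GPa quoted above.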
3 Results and Discussion
We first analyze the calculation results with the cap layer thickness H/h = 2.0. Fig. 2(c) shows the contour plot of the strain εxx distribution in the Y = 0 plane for the case of elastic isotropy A = 1. Because the lattice parameter of the QD and the wetting layer is larger than that of the matrix, negative εxx, i.e. compressive strain, occurs in the QD and the wetting layer. In the matrix, positive εxx, i.e. tensile strain, occurs in the regions directly below and above the QD, while small compressive strain exists in the regions near the QD corners. The influence of elastic anisotropy on the εxx distributions in the Y = 0 plane is shown in Figs. 2(a), 2(b), 2(d), and 2(e) for the cases with elastic anisotropy ratio A = 0.25, 0.5, 2.0, and 4.0 respectively. Significant differences in the εxx distributions can be observed in these maps.

Figure 2. The strain εxx distributions in the Y = 0 plane for different values of anisotropy ratio A.

In the QD, the magnitude of the εxx contours increases with A changing from 1.0 to 4.0, while it decreases with A changing from 1.0 to 0.25. In the matrix around the QD corners, the εxx contour shapes become more horizontally narrowed with A increasing from 1.0 to 4.0, while they become more horizontally elongated with A decreasing from 1.0 to 0.25. This clearly shows that when A > 1, the [100] and [-100] directions are the elastically soft directions and thus the strain εxx decays rapidly in these directions. When A < 1, the [100] and [-100] directions are elastically hard directions and thus the strained region extends further away along these directions.

Figure 3(c) shows the distribution of strain energy density on the cap layer surface for the case of elastic isotropy A = 1 and the cap layer thickness H/h = 2.0. It can be seen that there are pronounced energy minima at positions directly above the buried QDs. These energy minima may cause the QDs of the next layer to nucleate there preferentially, which results in vertical alignment of newly formed QDs with the buried QDs. It can also be seen in Fig. 3(c) that small satellite energy minima exist midway between the buried QDs. Figs. 3(a), 3(b), 3(d), and 3(e) show the distributions of strain energy density on the cap layer surface for the anisotropy cases A = 0.25, 0.5, 2.0, and 4.0 respectively. As A increases from 1.0 to 2.0 and further to 4.0, these satellite minima develop into local minima, as seen in Figs. 3(d) and 3(e), which may lead to additional QD formation in the next layer, and thus some newly formed QDs may be misaligned vertically with the buried QDs. However, as A reduces from 1.0 to 0.5 and further to 0.25, the satellite minima gradually disappear and only the pronounced local minima at the top of the QDs remain, as seen in Figs. 3(b) and 3(a), which may result in a fully vertically aligned QD structure.

When the cap layer thickness is increased to H/h = 3.0, the strain energy distributions on the cap layer surface are as shown in Figs. 4(a)-4(e). For the elastic isotropy case A = 1.0, it can be seen in Fig. 4(c) that besides the pronounced local energy minima at positions directly above the QDs, there are some satellite minima at positions between the QDs. As A increases from 1.0 to 2.0, these satellite minima develop into pronounced local minima, as seen in Fig. 4(d), which may result in a partially misaligned QD structure. As A increases further to 4.0, the original pronounced local minima above the QDs disappear and only the local minima between the QDs remain, as seen in Fig. 4(e). In this situation, a totally misaligned structure may be formed. As A reduces from 1.0 to 0.5 and further to 0.25, it can be seen from Figs. 4(b) and 4(a) that the satellite minima disappear and only the pronounced local minima at the top of the QDs remain, which may result in a vertically aligned QD structure. The strain energy distributions on the cap layer surface for the other cap layer thicknesses are also obtained in this study. The calculation results show that the elastic anisotropy and the cap layer thickness greatly influence the energy distribution on the cap layer surface, which may result in different QD ordering structures.
Figure 3. The influence of elastic anisotropy on the distribution of strain energy density (in GPa) on the cap layer surface for the cap layer thickness H/h = 2.0.
Figure 4. The influence of elastic anisotropy on the distribution of strain energy density (in GPa) on the cap layer surface for the cap layer thickness H/h = 3.0.
4 Conclusions
The three-dimensional finite element approach is used to calculate the elastic fields induced by an array of lens-shaped QDs. The effects of the elastic anisotropy of the materials on the elastic fields are investigated in detail. It is found that the elastic anisotropy has a significant influence on the elastic fields. Therefore, in calculating the elastic fields of QD structures, the isotropy approximation should not be used, especially when the material exhibits strong anisotropy. It is also found that the elastic anisotropy and the cap layer thickness have a strong influence on the distribution of the energy minima on the cap layer surface. Thus, various QD ordering structures such as vertical alignment, partial alignment or complete misalignment may be obtained by changing the material anisotropy and the cap layer thickness.
References
1. Shchukin V.A. and Bimberg D., Spontaneous ordering of nanostructures on crystal surfaces, Review of Modern Phys. 71 (1999) pp. 1125-1171.
2. Liu P., Zhang Y.W., and Lu C., Self-organized growth of three-dimensional quantum-dot superlattices, Appl. Phys. Lett. 80 (2002) pp. 3910-3912.
3. Schmidt O.G., Eberl K., and Rau Y., Strain and band-edge alignment in single and multiple layers of self-assembled Ge/Si and GeSi/Si islands, Phys. Rev. B 62 (2000) pp. 16715-16720.
4. Liu G.R. and Jerry Q.S., A finite element study of the stress and strain fields of InAs quantum dots embedded in GaAs, Semicond. Sci. Technol. 17 (2002) pp. 630-642.
5. Holy V., Springholz G., Pinczolits M., and Bauer G., Strain Induced Vertical and Lateral Correlations in Quantum Dot Superlattices, Phys. Rev. Lett. 83 (1999) pp. 356-359.
DIRECTIONAL DEPENDENCE OF SURFACE MORPHOLOGICAL EVOLUTION OF HETEROEPITAXIAL FILMS

P. LIU 1, Y.W. ZHANG 2, C. LU 1

1 Institute of High Performance Computing, Singapore
E-mail: liuping@ihpc.a-star.edu.sg, luchun@ihpc.a-star.edu.sg

2 Department of Materials Science and Institute of Materials Research and Engineering, National University of Singapore, Singapore
E-mail: zhangyw@nus.edu.sg
A three-dimensional continuum method is used to simulate the surface morphological evolution of a heteroepitaxially strained film. In the formulation, the film surface evolves through surface diffusion driven by the gradient of the surface chemical potential, which includes the elastic strain energy, elastic anisotropy and surface energy. Our simulations reveal that the elastic anisotropy strength markedly affects the self-assembly of quantum dots. In addition, it is shown that the island alignment, the island spacing and the island size are related to the elastic anisotropy strength.
1 Introduction
During heteroepitaxial growth, a film may undergo a growth mode transition, that is, from a layer-by-layer growth mode to a three-dimensional growth mode [1]. Through such a transition, the film forms a rippled structure, which eventually breaks up into islands. Since these islands are normally dislocation-free, the self-assembly process may be used to fabricate quantum dot arrays, which have many potential applications in microelectronic and optoelectronic devices. The performance of these devices requires a uniform and regular arrangement of the quantum dot arrays. Although many attempts have been made to grow a uniform and regular array of quantum dots through self-assembly [2-6], so far there are no reliable procedures to do so. The surface roughening and subsequent island formation are caused by the competition between the strain energy and the surface energy of the system. During the surface evolution, the total strain energy decreases while the total surface energy increases. First-order perturbation analyses have been carried out to obtain the critical condition of strain-induced surface roughening for both elastically isotropic films [7-10] and elastically anisotropic films [11,12]. These analyses have shown that for a perturbation wavelength λ, if λ > λc, where λc is the critical wavelength, the strain energy will dominate the process, and therefore island formation becomes
energetically favourable; while if λ < λc, the surface energy will dominate the process, and therefore the surface will remain flat. Of particular interest is a film having elastic anisotropy. In this scenario, the critical wavelength depends not only on the surface orientation [12], but also on the elastic anisotropy strength. Therefore, tuning the elastic anisotropy of the film may change the dot spacing and alignment, providing another degree of freedom to manipulate the self-assembly of quantum dot growth. In this paper, the effect of the elastic anisotropy on the surface evolution is examined. Our attention is focused on the dependence of island self-assembly, i.e., the island alignment and island spacing, on the elastic anisotropy strength.

2 Formulation
Consider an elastically anisotropic thin film with initial thickness hf and lattice spacing af heteroepitaxially grown on a thick elastically anisotropic substrate with lattice spacing as. The mismatch strain is defined as ε0 = (af - as)/as. Furthermore, it is assumed that the film and substrate have the same elastic properties. The surface chemical potential can be written as

χ = χ0 + Ω(ω - κγ)    (1)

where χ0 is the chemical potential of the bulk material, Ω is the atomic volume of the diffusing atom, ω = σij εij / 2 is the strain energy density, κ is the mean curvature, and γ is the film surface energy, which is assumed to be isotropic. A linear elastic relation between stress and strain is assumed, i.e., σij = Cijkl εkl, where Cijkl is the component of the elastic modulus tensor, σij is the component of the stress tensor, and εij is the component of the strain tensor. In the present simulation, an annealing process is assumed. Based on the conservation of mass, the surface evolution equation can be written as

vn = D ∇s²χ    (2)

where vn is the normal velocity, D = Ds δs / (kB T), Ds is the surface diffusion coefficient, δs is the thickness of the diffusive layer, kB is the Boltzmann constant, and T is the absolute temperature. Eq. (2) can be written in the following weak form

∫S vn δvn dA = ∫S D ∇s²χ δvn dA    (3)

where the integration is over the film surface. By assuming a symmetry condition and applying the surface divergence theorem, Eq. (3) can be rewritten as

∫S vn δvn dA = ∫S D χ ∇s²(δvn) dA    (4)

Since the above weak form is very stiff due to the curvature term κ in the chemical potential χ, a semi-implicit Euler scheme is introduced to integrate the above equation [13]. For an elastically isotropic crystal, i.e., A = 1, there are only two independent elastic constants: the elastic modulus E and the Poisson's ratio ν. For FCC or BCC crystals, there are three independent elastic constants in the reference coordinate system, namely C11, C12 and C44. The elastic properties can also be expressed through the elastic modulus E, the Poisson's ratio ν and the elastic anisotropy strength A. The two sets of elastic constants have the following relationships: E = (C11² + C11C12 - 2C12²)/(C11 + C12), ν = C12/(C11 + C12), and A = 2C44/(C11 - C12). The strain energy density in the initially flat film is ω0 = (C11² + C11C12 - 2C12²) ε0² / C11.

The calculation procedures are as follows: suppose the shape of the stress-free reference at time t is known; the deformation and diffusion occurring over a subsequent infinitesimal time interval Δt are to be determined. Firstly, a finite element method is used to calculate the strain and stress along the film surface, from which the strain energy density is obtained. Secondly, according to the surface geometry, the surface curvatures are obtained. Thirdly, a finite element method is used to determine the velocity of the surface in the reference configuration at time t. Finally, the change in the shape of the surface during the time interval Δt is deduced; the procedure is repeated and the shape of the surface is calculated as a function of time. In the calculations, the following normalization scheme is used: ω* = ω/ω0, l* = lω0/γ, and t* = tγD(ω0/γ)⁴, where l is the length scale and t is the time scale.
3 Results and Discussion
We examine the effect of elastic anisotropy on surface roughening and island formation by varying the elastic anisotropy strength A while keeping the elastic modulus E and the Poisson's ratio ν fixed. All the simulations start from the same random surface. The unperturbed film surface is the {100} surface.

3.1 A = 1

The simulation results for A = 1.0, i.e., the isotropic case, have been extensively reported [14]. It was shown that at the initial stage, the surface evolves into random ripples. Subsequently the ripples break up into islands, which are randomly distributed. Due to the elastic interaction, the islands are able to self-organize to a certain extent. Thereafter, the islands undergo ripening. As the ripening process proceeds, the larger islands grow at the expense of the smaller ones.

3.2 A > 1

For A = 2.0, our simulation showed that the ripples form markedly along the <100> directions. Subsequently the ripples break up into islands, which are also aligned along the <100> directions. The island array is more uniform and regular than in the isotropic case. The island spacing and island size increase with increasing A. Similarly, the island array also undergoes ripening. For the case with a stronger elastic anisotropy strength, A = 4.0, the island formation is shown in Figs. 1(a)-(d). At the initial stage, the ripples are predominantly along the <100> directions, as shown in Fig. 1(a). The ripples break up into islands, as shown in Figs. 1(b) and (c), similarly to the previous cases; in this case, however, the island array is remarkably uniform and regular, as shown in Fig. 1(d). The island spacing and size are further increased. Unfortunately, the island array also undergoes ripening.
Figure 1 The surface evolution with A = 4: (a) the surface develops into ripples, which are predominantly along the <100> directions; (b) and (c) the ripples break up into islands, which are predominantly aligned along the <100> directions; and (d) the islands self-organize into a fairly uniform and regular array.
Figure 2 The surface evolution with A = 0.5: (a) the surface develops into ripples, which are predominantly along the <110> directions; (b) the ripples break up into islands, which are predominantly aligned along the <110> directions; (c) the islands self-organize into a fairly uniform and regular array, in which a dislocation-like defect can be seen; and (d) the islands undergo ripening.
3.3 A < 1

For A = 0.75, the simulation results showed that the ripples are strongly aligned along the <110> directions. Subsequently, the ripples break up into islands, which adopt a fairly uniform and regular array. The island array is also strongly aligned along the <110> directions. The island spacing and size are decreased compared to the isotropic case. This island array still undergoes ripening. A surface evolution process for A = 0.5 is shown in Figs. 2(a)-(d). These results show a similarity with A = 0.75, but the alignment along the <110> directions is stronger, as shown in Figs. 2(a) and (b). The islands adopt an almost uniform and regular array except for some regions which exhibit dislocation-like defects, as indicated in Fig. 2(c). The island spacing and size become even smaller. The island array still undergoes ripening, as shown in Fig. 2(d).

In summary, it is clearly shown that the elastic anisotropy strength markedly affects the surface roughening and island morphology. When A > 1, with increasing elastic anisotropy, the surface ripples and the formed islands become increasingly aligned along the <100> directions. The island arrays become increasingly uniform and regular. The island size and the averaged island spacing increase. For the cases of A < 1, with decreasing elastic anisotropy strength, the surface ripples and the formed islands become increasingly aligned along the <110> directions. The island arrays become increasingly uniform and regular. The island size and the averaged island spacing, however, gradually decrease. In all of these cases, the island arrays undergo ripening.

4 Conclusions
We have used a three-dimensional finite element method to investigate the effect of elastic anisotropy on island formation. It is shown that when the elastic anisotropy strength is greater than one, with increasing elastic anisotropy the ripples and islands become increasingly aligned along the <100> directions, the ability of these islands to self-assemble increases, and the island size and island spacing increase; whereas when the elastic anisotropy strength is smaller than one, with decreasing anisotropy strength the ripples and islands become increasingly aligned along the <110> directions, the ability of these islands to self-assemble also increases, but the island size and island spacing decrease.
References
1. J.Y. Tsao, Materials Fundamentals of Molecular Beam Epitaxy, Academic Press, Boston, MA (1993).
2. J.A. Floro, G.A. Lucadamo, E. Chason, L.B. Freund, M. Sinclair, R.D. Twesten, and R.Q. Hwang, Phys. Rev. Lett. 80, 4717 (1998).
3. P. Sutter and M.G. Lagally, Phys. Rev. Lett. 84, 4637 (2000).
4. R.M. Tromp, F.M. Ross and M.C. Reuter, Phys. Rev. Lett. 84, 4641 (2000).
5. G. Springholz, V. Holy, M. Pinczolits, and G. Bauer, Science 282, 734 (1998).
6. X. Deng, J.D. Weil, and M. Krishnamurthy, Phys. Rev. Lett. 80, 4721 (1998).
7. R.J. Asaro and W.A. Tiller, Metall. Trans. 3, 1789 (1972).
8. M.A. Grinfeld, Sov. Phys. Dokl. 31, 831 (1986).
9. D.J. Srolovitz, Acta Metall. 37, 621 (1989).
10. L.B. Freund and F. Jonsdottir, J. Mech. Phys. Solids 41, 1245 (1993).
11. H. Gao, Modern theory of anisotropic elasticity and applications, edited by J.J. Wu, T.C.T. Ting and D.M. Barnett, SIAM, Philadelphia, 139 (1991).
12. Y. Obayashi and K. Shintani, J. Appl. Phys. 84, 3141 (1998).
13. Y.W. Zhang, A.F. Bower, L. Xia, and C.F. Shih, J. Mech. Phys. Solids 47, 173 (1999).
14. A.G. Cullis, D.J. Robbins, A.J. Pidduck, and P.W. Smith, J. Cryst. Growth 123, 333 (1992).
15. C.S. Ozkan, W.D. Nix, and H. Gao, Appl. Phys. Lett. 70, 2247 (1997).
16. Y.-W. Mo, D.E. Savage, B.S. Swartzentruber, and M.G. Lagally, Phys. Rev. Lett. 65, 1020 (1990).
FORMING OF NANOSTRUCTURED MATERIALS: NUMERICAL ANALYSIS IN EQUAL CHANNEL ANGULAR EXTRUSION OF MAGNESIUM, ALUMINIUM AND TITANIUM ALLOYS
B.H. HU AND J.V. KREIJ

Singapore Institute of Manufacturing Technology (SIMTech), 71 Nanyang Drive, Singapore 638075
E-mail: bhhu@SIMTech.a-star.edu.sg

Equal channel angular extrusion (ECAE) is a promising technique for producing ultra-fine grained (UFG) or nanostructured materials based on the principle of simple shearing. Through analysis, it is shown that only the geometrical factor Φ, namely the half-angle of the two intersecting channels, and the number of ECAE passes, N, affect the effective strain. The equivalent linear reduction ratio, r0/r1, is derived to describe the size reduction effect of an object such as a grain. The most effective intersecting angle (2Φ) is 90°. Compared to traditional area-reduction extrusion, the deformation effect after 12 passes of ECAE is equivalent to an area reduction ratio of about one million or a linear reduction ratio of about 1022. Magnesium AZ31B, aluminium 6061 and pure titanium were used for the study. Three types of die designs for ECAE of each alloy were proposed and numerically analysed. The effective strain, von Mises stress, equivalent area reduction ratio and equivalent linear reduction ratio were compared for the three die designs based on simulation results obtained with ANSYS/LS-DYNA. The parameter Nμ→nm, namely the number of ECAE passes required to reduce a 100 μm structure to a 100 nm structure, was calculated for each design. A grain size of 100 μm can be deformed into a nanostructure through as few as 12-17 passes of ECAE.
1 Introduction
Research and development in nanotechnology has attracted tremendous attention worldwide [1]. One of the important aspects of nanotechnology is the development and application of nanostructured bulk materials, as this type of material is visible, practically usable and engineerable. Three-dimensional nanostructures are important to manufacturing industries when high or superior mechanical properties are required. This is especially so where light-weight structural metals such as magnesium, aluminium and titanium are used. Equal channel angular extrusion (ECAE), also known as equal channel angular pressing (ECAP), is one of the most promising techniques for producing ultra-fine grained (UFG) or nanostructured materials. It is claimed that grain sizes of 100 nm or even smaller can be produced. Some achievements have been reported by researchers from the USA, Korea, Russia, etc. [2-6]. However, information or in-depth analysis on die design and the related specific flow and deformation behaviour has not been reported. As part of an on-going project in casting and forming nanostructured light alloys, ECAE is being evaluated and used at SIMTech for forming nanostructured light alloys such as magnesium, aluminium and titanium.
2 The effective strain in an ECAE process

An ECAE die consists of two channels of equal square cross-section meeting at a sharp bend, intersecting at an angle of 2Φ, as shown in Fig. 1. A billet, for example a metal ingot, is placed in the top channel or the bottom channel. The billet is then forced by a press into the other channel and undergoes a simple shear process in a thin layer at the
cross plane of the channels. Heavy uniform deformation can be imposed throughout massive billets without a change in cross section [3-5]. Let p be the punch pressure, σ0 the yield stress, γxy the shear strain and e the effective strain; to conduct a complete simple shearing process, p should be at least the value given by Eq. (1) [3].
Fig. 1 Sketch of ECAE [3]
The simple shear process can be repeated many times by putting the previously extruded material back into the equal channels for the next extrusion pass. The total shear strain (γxy) after N passes is given by Eq. (2) [4-6]. Based on the von Mises yield criterion [6-7], the effective strain, e, after N passes of ECAE is given by Eq. (3). When Φ is 45°, the largest e is achieved.

p = (2/√3) σ0 ctgΦ    (1)

γxy = 2N ctgΦ    (2)

e = (2N/√3) ctgΦ    (3)
Comparing the effectiveness in mechanical deformation of ECAE to conventional forward extrusion, the equivalent area reduction ratio (A0/A1) deduced from the total effective strain is given by Eq. (4), where A0 and A1 represent the inlet and outlet cross-section areas in conventional forward extrusion. If r0 and r1 are the respective edge lengths (for a square cross section) or diameters (for a circular cross section) of the inlet and outlet of the conventional extrusion die channels, Eq. (4) can be converted into an "equivalent linear reduction ratio", namely r0/r1, shown in Eq. (5). It can be used to describe the linear reduction effect of an object, such as the size of a grain.

A0/A1 = e^((2N/√3) ctgΦ)    (4)

r0/r1 = √(A0/A1) = e^((N/√3) ctgΦ)    (5)
It can be seen that only the geometrical factor Φ, namely the half-angle of the two intersecting channels, and the number of ECAE passes, N, affect the effective strain. The punch pressure is determined only by Φ and the yield stress of the material (σ0). The most effective intersecting angle (2Φ) is 90°, namely Φ = 45°.
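As a quick numerical check of Eqs. (1)-(5), the sketch below (our own illustration; the names are not from the paper) evaluates the punch pressure and the accumulated reduction ratios for an ideal die:

import math

def ecae(phi_deg, N, sigma0):
    cot = 1.0 / math.tan(math.radians(phi_deg))
    p = 2.0 * sigma0 * cot / math.sqrt(3.0)  # Eq (1): minimum punch pressure
    e = 2.0 * N * cot / math.sqrt(3.0)       # Eq (3): effective strain, N passes
    A_ratio = math.exp(e)                    # Eq (4): equivalent area reduction
    r_ratio = math.sqrt(A_ratio)             # Eq (5): equivalent linear reduction
    return p, e, A_ratio, r_ratio

p, e, A, r = ecae(45.0, 12, 229e6)  # 2*Phi = 90 deg, 12 passes, AZ31B yield stress

For 12 ideal passes this gives A of about 1e6 and r of about 1.02e3, matching the area reduction ratio of one million and the linear reduction ratio of about 1022 quoted in the abstract.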
3 Conceptual design of ECAE dies and numerical analysis
The design of ECAE dies has so far been treated as classified information and is not well known or understood by the public. This suggests that the technology, and the mechanisms of die design and plastic deformation in ECAE, have not yet been fully developed. In order to understand the behaviour of the plastic deformation
before cutting a physical metal die, some conceptual die designs were proposed and numerically analysed. This minimises development time and cost. The commercial FEM software ANSYS/LS-DYNA was used for the numerical simulation. The alloys used for the calculations were magnesium AZ31B, aluminium 6061 and commercially pure (CP) titanium. The yield stress used for the numerical simulation is 229 MPa for AZ31B, 55 MPa for 6061 and 280 MPa for CP-Ti. The elastic modulus used is 45 GPa for AZ31B, 63 GPa for 6061 and 118 GPa for CP-Ti. The Poisson's ratio is 0.30 for AZ31B, 0.35 for 6061 and 0.36 for CP-Ti. Three types of ECAE die designs based on the most effective intersecting angle (2Φ = 90°) were numerically analysed.

Design 1 has 90° sharp corners at the intersection of the two channels. The inner width of the channels for this study is 15 mm. The blank billet used for the numerical analysis is 15 mm × 15 mm in cross section and 50 mm in length. Fig. 2 shows the simulated effective strain of the AZ31B specimen during the first pass of ECAE at a stage of 60% extruded. It indicates that the largest strain exists in the corner region, where the simple shear occurs. The effective strain along the shear plane is about 1.10, which converts to an area reduction ratio of 3 or a linear reduction ratio of 1.73. This value is very close to the theoretical value for the simple shear process. The simulation also shows that the stress at the sharp corner is as high as 390 MPa, which may lead to potential problems such as die wear, cracks and/or entrapment of "dead" material at the intersecting corners.

Fig. 2 Effective strain of ECAE of AZ31B (Design 1)

To overcome these potential problems, fillets were added to the inner and outer corners of the ECAE die to form a new die design, namely Design 2. The blank billet dimensions are unchanged. The radii of the fillets for the inner and outer corners are 7.5 mm and 22.5 mm respectively. Fig. 3 shows the simulated effective strain of the AZ31B specimen during the first pass of ECAE at a stage of 60% extruded based on Design 2. The largest strain still occurs in the corner region, where the simple shear occurs. The effective strain along the shear plane is about 0.36, which converts to an area reduction ratio of 1.43 or a linear reduction ratio of 1.2. The stress at the intersecting corner is reduced to about 290 MPa. This indicates that Design 2 will reduce the tendency toward die wear, cracks and/or entrapment of "dead" material at the intersecting corners faced by Design 1. However, compared to Design 1, the effective strain is reduced from 1.10 to 0.36, a large reduction in the mechanical deformation effect.

Fig. 3 Effective strain of ECAE of AZ31B (Design 2)

To balance the advantages and side effects of Design 1 and Design 2, another design, namely Design 3, was proposed. It has no fillet at the inner corner but a fillet with a radius of 15 mm at the outer corner. The blank billet dimensions are again unchanged. Fig. 4 shows the simulated effective strain of the AZ31B specimen during the first pass of ECAE at a stage of 60% extruded based on Design 3. It indicates that the
largest strain still exists in the corner region, where the simple shear occurs. The effective strain along the shear plane is about 0.85, which converts to an area reduction ratio of 2.34 or a linear reduction ratio of 1.53. The stress at the intersecting corner is about 310 MPa. The advantages of Design 2 are retained, while the effectiveness of mechanical deformation in terms of effective strain is greatly improved from 0.36 to 0.85, i.e., the area reduction ratio from 1.43 to 2.34, or the linear reduction ratio from 1.20 to 1.53.
Fig. 4 Effective strain of ECAE of AZ31B (Design 3)
The simulation analysis results are summarised in Table 1, where e1 represents the effective strain after the first pass of ECAE. The total effective strain e after N passes will be N·e1. The parameter Nμ→nm represents the number of ECAE passes (rounded to the nearest integer) required to reduce a 100 μm structure to a 100 nm structure, based on Eqs. (4)-(5). Namely, Nμ→nm is given by Eq. (6), where the ratio r0/r1 = 100,000/100 = 1000.

Nμ→nm = [ln(A0/A1)]/e1 = [ln((r0/r1)²)]/e1 = 13.8/e1    (6)
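Checking Eq. (6) against the per-pass strains of the three designs (the helper name is ours):

import math

def passes_needed(e1, ratio=1000.0):
    # Eq (6): N = ln((r0/r1)^2)/e1 = 13.8/e1, rounded to the nearest integer
    return round(math.log(ratio ** 2) / e1)

print([passes_needed(e1) for e1 in (1.10, 0.36, 0.85)])  # -> [13, 38, 16]

These are exactly the N values listed for Designs 1, 2 and 3 in Table 1 below.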
It can be seen that, to deform a microstructure (100 μm) into a nanostructure (100 nm), 13 passes are needed for Design 1, 16 for Design 3 and 38 for Design 2. Considering that Design 1 may cause many potential problems due to its high localised stress concentration, and considering the Nμ→nm values, Design 3 is recommended for the next stage of practical ECAE experiments.

Table 1 Comparison of three designs for ECAE

Design | e1 | A0/A1 (times) | r0/r1 (times) | Nμ→nm | σmax (MPa)
1 | 1.10 | 3.00 | 1.73 | 13 | 390
2 | 0.36 | 1.43 | 1.20 | 38 | 290
3 | 0.85 | 2.34 | 1.53 | 16 | 310
Similar numerical analyses were also conducted on ECAE of aluminium and titanium for the three die designs. Fig. 5 shows the simulation results for 6061 (Fig. 5(a)) and CP titanium (Fig. 5(b)) during the first pass of ECAE at a stage of 60% extruded based on Design 3. The largest strain again occurs in the corner region, where the simple shear occurs. The average effective strains along the shear plane are about 0.90 for 6061 and 0.81 for CP titanium, which convert to area reduction ratios of 2.46 and 2.25, or linear reduction ratios of 1.57 and 1.50, respectively. According to Eq. (6), to deform a microstructure (100 μm) into a nanostructure (100 nm), 15 and 17 passes of ECAE are needed for 6061 and CP titanium respectively.
Fig. 5 Effective strain of ECAE of (a) Al-6061 and (b) CP-Ti
4 Conclusions
Only the geometrical factor Φ, namely the half-angle of the two intersecting channels, and the number of ECAE passes, N, affect the effective strain.
Acknowledgements

This study is part of SIMTech's NanoAlloys project (C02-P-008AR) in casting and forming nanostructured light alloys, funded by the Agency for Science, Technology and Research (A*STAR) of Singapore.

References
1. M.C. Roco, J. of Nanoparticle Research, Kluwer Academic Publ., Vol. 3, No. 5-6, pp. 353-360, 2001.
2. I. Kim, W.S. Jeong, J. Kim, K.T. Park and D.H. Shin, Scripta Materialia 45 (2001), pp. 575-581.
3. L.R. Corwell, K.T. Hartwig, R.E. Goforth and S.L. Semiatin, Materials Characterisation 37 (1996), pp. 295-300.
4. H.G. Selam, Proceedings of ICCE/9, San Diego, July 1-6 (2002), pp. 689-690.
5. V.M. Segal, Mat. Sci. & Eng. A197 (1995), pp. 157-164.
6. S.C. Chen, Z.Q. Wu, B.H. Hu, J. Liang and B.J. Wu, Hot-working Technology, Tsinghua University (1992), pp. 150-199.
7. B.H. Hu and J.v. Kreij, Forming of Nanostructured Materials (I) - Numerical Analysis of Plastic Deformation in Equal Channel Angular Extrusion (ECAE) of Magnesium AZ31B Alloy, submitted to Journal of Materials Processing Technology.
THE DEVELOPMENT OF STANDARD PART DATABASE FOR PROGRESSIVE DIE DESIGN

ZHONGHUI WANG

Institute of High Performance Computing, 1 Science Park Road, Singapore 117528

Progressive dies are widely used to mass-produce metal stampings for electrical, electronic and mechanical applications. Of all the components within a progressive die set, standard parts account for a large portion of the design work, requiring many interactions from the designer for part selection and parameter specification. Thus, the performance of data retrieval from the standard part database is an important factor in shortening the tooling design lifecycle. Excel-based worksheets are now very popular for storing standard part parameters. However, the existing tool for retrieving Excel-based data is insufficient, due to its low speed for interactive operation in a die design system. Besides, the content of a table cell still needs to be evaluated before it can be used for design. This paper reports a method to improve data retrieval efficiency through a set of predefined keyword-format files that can be easily accessed. A converting tool has been developed to convert the original Excel files into keyword-format files. Retrieval functions such as searching and matching are available for these files. The proposed database is made up of the original Excel files, the intermediate keyword files, and tools for file conversion and data searching. Practical examples have proven the efficiency of the proposed database in our knowledge-based die design system.

Keywords: Database, Standard Part, Progressive Die Design, Computer Aided Design, Data Retrieval
1. Introduction

Progressive die design is one of the important design activities in the tooling industry. However, progressive die design is still a complex, skill-intensive and experience-driven process [1][2]. A typical progressive die set includes metal plates such as die shoes, punch plate, backing plate, stripper plate and die plate, and various inserts such as punches. During the design of each plate, standard parts like fasteners, dowel pins, spacers, springs, guiding pins and bushings are used for locating the plate or for transmitting force or mechanical movement. These components can sum up to hundreds or even thousands in quantity. Thus, standard part design takes a large portion of the overall design task. Designers need to visit the standard part database for part specifications, parameter selections or material properties. This imposes a requirement for efficient data retrieval from the database, especially for today's progressive die design systems. Due to its flexible format, the Excel worksheet has become popular in industry for storing engineering data. For example, the Japanese standard part provider MISUMI [3] already distributes its catalog of standard components in Excel worksheets. The "catalog of standard components for press dies" is one of the most frequently used standards for tooling design. A common way to interact with Excel data runs through the Open Database Connectivity protocol (ODBC). ODBC provides a call-level API that different database vendors implement via ODBC drivers specific to a particular database management system (DBMS). Applications can use this API to call the ODBC Driver Manager, which passes the calls to the appropriate driver. The driver, in turn, interacts with the DBMS using Structured Query Language (SQL). This process hierarchy is quite heavy. An ODBC driver for Excel worksheets is provided by Microsoft; nevertheless, in our experience the efficiency of the driver is inadequate for design interaction, as detailed in the comparison section. Another aspect is that the industrial data stored in an Excel sheet is only "raw" data, which requires a proper parsing tool to analyze and identify before use. Take the Block Lifter Set as an example: its block thickness "T" has a series of values like "16, 20, 25, 30, 35, 40" within one cell,
which is referred to as the enumeration format. During data retrieval, the string for "T" must be obtained first and then parsed into an array of real values before it can be used for matching or selection. In other tables, for example the Lifter Pin Set, the spring length "FL" takes the form "20-5-70" in one cell, which is called the incremental format; it means that FL may take values from 20 to 70 with an increment of 5. For this format, the start value, increment and end value have to be identified before data searching. In order to address these two drawbacks of Excel worksheets, we propose a method to improve data retrieval by reorganizing file storage and access. A set of keyword-formatted files is used as intermediate files to store the Excel data, on which data fetching is implemented. The proposed database is made up of the original Excel files, the intermediate keyword files, and tools for file conversion and data searching. Practical examples have demonstrated the improved efficiency of the proposed database in our knowledge-based die design system.

2. Proposed method

The use of a database is a necessity for a progressive die design CAD software system. However, a major factor in a user's satisfaction or lack thereof with a database system is its performance. If the response time for a request is too long, the value of the system is diminished. The performance of a system depends on the efficiency of the data structures used to represent the data in the database and on how efficiently the system can operate on these data structures [4,5]. Although ODBC provides a common way to interact with Excel worksheets, the process hierarchy is quite heavy, as explained earlier. Such speed may not be acceptable for the interface of a die design system, where data retrieval and parameter selection are enormous. As text-format files are easy to access through common stream libraries, we use predefined keywords to define the format of the tables originating from Excel files, so that each table, its field titles and its data columns can be easily identified. Our proposed database model uses the Excel data as the "raw" source, and uses the keyword-formatted files as the source for data retrieval and matching. A set of tools is available, including a tool to convert the Excel data source into the keyword-formatted source, and tools for data retrieval and matching. The overall structure of the database model is illustrated in Fig. 1.
Fig 1. The overall structure of the database (StdBase), serving the domain applications
2.1 Data storage
To represent a specific table in the data source, a set of keywords is used to define the table format. Each table is delimited by the "TABLE_BEGIN" and "TABLE_END" pair, in which field titles are preceded by "COLUMN_HEAD", and column data are delimited by the "COLUMN_BODY" and "COLUMN_END" pair. TABLENAME indicates the name of the corresponding table. All the keywords, data titles and data values are prefixed by the symbol '#', so that each data segment can be separated and identified during data querying. An example for a title block table named Molex (Table 1) is illustrated below:
Table 1 A table named Molex in the Excel worksheet

TBNo* | TBName | Width | Height | OriginX | OriginY | TextHeight
1 | MOLEX_A4P | 178 | 200 | 0 | 0 | 2.5
2 | MOLEX_A3L | 377 | 205 | 0 | 0 | 2.5
3 | MOLEX_A2L | 553 | 314 | 0 | 0 | 2.5

The corresponding keyword-formatted table is as follows:

#TABLE_BEGIN
#TABLENAME=MOLEX
#COLUMN_HEAD #TBNo* #TBName #Width #Height #OriginX #OriginY #TextHeight
#COLUMN_BODY
#1 #MOLEX_A4P #178.0 #200.0 #0.0 #0.0 #2.5
#2 #MOLEX_A3L #377.0 #205.0 #0.0 #0.0 #2.5
#3 #MOLEX_A2L #553.0 #314.0 #0.0 #0.0 #2.5
#COLUMN_END
#TABLE_END
Based on the keyword-formatted text, a set of tools has been developed to parse the text blocks and retrieve data items. It will be shown in the comparison section that data retrieval from the keyword-formatted file is much faster than from the Excel file.
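As an illustration of how such a file can be consumed (this is our own sketch, not the authors' tool; it assumes the exact keyword layout shown above):

def parse_tables(text):
    # Parse keyword-formatted text into {table_name: {'head': [...], 'rows': [...]}}.
    tables, name, head, rows, in_body = {}, None, [], [], False
    for line in text.splitlines():
        toks = [t.strip() for t in line.split('#') if t.strip()]
        if not toks:
            continue
        tag = toks[0]
        if tag == 'TABLE_BEGIN':
            name, head, rows, in_body = None, [], [], False
        elif tag.startswith('TABLENAME='):
            name = tag.split('=', 1)[1]
        elif tag == 'COLUMN_HEAD':
            head = toks[1:]
        elif tag == 'COLUMN_BODY':
            in_body = True
        elif tag == 'COLUMN_END':
            in_body = False
        elif tag == 'TABLE_END':
            tables[name] = {'head': head, 'rows': rows}
        elif in_body:
            rows.append(toks)
    return tables

Because every token is '#'-prefixed, a plain string split suffices; no driver stack or SQL layer is involved, which is where the speedup reported in the comparison section comes from.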
2.2 Data Matching
In order to meet the requirements of progressive die design, a standard part database should provide functions for the designer to match against an initial value. For example, a typical matching interface should provide the following functions: 1) get the maximum (or minimum) value of a parameter in the table; 2) get the allowable value of a parameter in the table closest to an initial search value; 3) get the allowable value that is greater than but closest to a search value; and so on. Implementation of these functions is simple if each cell in a table holds a single value, but is complicated if multiple values are held in a cell, as in the enumeration and incremental formats mentioned in the introduction. Here we address the general case of a mixed combination of enumeration and incremental formats, such as "20, 30, 40, 45-5-60, 70, 80, 90". We must identify how the cell data is organized so as to separate each value. As the incremental format is characterised by three values connected by two '-'s, the whole string in the cell can be parsed into two groups of fragments corresponding to the enumeration format and the incremental format. The enumeration group can be stored in a data array, say Enum_Array, while each incremental group is stored in a list structure with three double members, which we name Inc_List. By sorting the Enum_Array and the Inc_List, we are able to implement the basic functions mentioned above. The matching algorithms for these functions have been realised both for the Excel-format-based matching method and for the keyword-format-based matching method; the details will be described in a later paper.
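A compact sketch of the expansion and two of the matching functions (our own illustration; it assumes non-negative catalogue values, since a leading '-' would be ambiguous):

def expand(cell):
    # "20,30,40,45-5-60,70" -> sorted list of all admissible values
    vals = []
    for frag in cell.split(','):
        parts = frag.split('-')
        if len(parts) == 3:                       # incremental: start-step-end
            start, step, end = (float(p) for p in parts)
            v = start
            while v <= end + 1e-9:
                vals.append(v)
                v += step
        else:                                     # enumeration: single value
            vals.append(float(frag))
    return sorted(vals)

def closest(cell, x):
    return min(expand(cell), key=lambda v: abs(v - x))

def at_least(cell, x):
    return min((v for v in expand(cell) if v >= x), default=None)

closest("20,30,40,45-5-60,70,80,90", 52)  # -> 50.0
at_least("20-5-70", 52)                   # -> 55.0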
3. Comparison

In this section, we compare the previous ODBC-based Excel data retrieval with the proposed keyword-format-based retrieval method. Two testing programs, DBFetch and DBContain, are employed for this purpose. DBFetch uses the Excel driver provided by Microsoft to retrieve Excel-format data, while DBContain is our program to retrieve keyword-format data. As the data source, a standard part catalogue, the Guide Post Set, is chosen for the comparison. The Guide Post Set has 10 diameter profile tables (each with 9 rows and 23 columns), one equivalent table, and 81 length profile tables (each with 6 rows and 6 columns). DBFetch and DBContain each perform two operations: (1) select all the tables from the Guide Post Set catalogue; (2) fetch all the column data from the table named "RS" in the Guide Post Set. The result of the comparison is shown in Table 2.

Table 2. The efficiency comparison between DBFetch and DBContain
Operation | Time used by DBFetch, T1 (s) | Time used by DBContain, T2 (s) | Speedup of DBContain over DBFetch (T1/T2)
Operation 1 | 36.12 | 0.08 | 451.5
Operation 2 | 2.285 | 0.01 | 228.5
From Table 2, it is clear that keyword-format data retrieval is far faster than ODBC-based Excel data retrieval.

4. Conclusion

The Excel worksheet makes it easy for users to input and store design parameters and standard libraries. However, it is not convenient to use with the traditional ODBC method in a real CAD system, especially for progressive die design. By introducing the intermediate keyword-formatted data storage and a set of tools, our proposed database retains the ease of use of Excel worksheets while improving the efficiency of data retrieval. This database has been incorporated into our knowledge-based die design system.

References
1. Cheok, B.T. and Nee, A.Y.C., "Configuration of progressive dies", Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 1998, Vol. 12, pp. 405-418.
2. Cheok, B.T., Zhang, Y.F. and Leow, L.F., "A skeleton-retrieving approach for the recognition of punch shapes", Computers in Industry, 1997, Vol. 32, pp. 249-259.
3. MISUMI Corporation, "Face, Standard Components for Press Dies", April, 1998, http://www.musumi.co.jp
4. Korth, H.F., "Database System Concepts", Second edition, McGraw-Hill, Inc, 1991.
5. Karpovich, J.F. and French, J.C., "High Performance Access to Radio Astronomy Data: A Case Study", Proceedings of the Seventh International Working Conference on Scientific and Statistical Database Management, Charlottesville, Virginia, Sept. 1994, pp. 240-249.
THE APPLICATION OF SENSITIVITY ANALYSIS TO MODIFYING CAR BODY CONFIGURATION

Xuerong Zhang, Maotao Zhu

School of Automobiles and Transportation, Jiangsu University, Dantu Road 301, Zhenjiang City, Jiangsu Province, China
E-mail: zhmt@ujs.edu.cn

In this paper, a three-dimensional CAD model of a car body is built with the UG software. The model is imported into the ANSYS program through a data port. After the model is suitably modified, it can be meshed. For modal analysis it is necessary to define the material, real constants and constraints, etc. The ANSYS solver is then launched to analyse the model, and the modal parameters (modal frequencies and mode shapes) are obtained. Their validity is verified by a modal test. On this basis, structural modification and optimization are carried out. According to the theory of inner-product correlation, the mode shape and frequency correlations are calculated. The degrees of correlation determine whether the finite element model can substitute for the prototype. According to the requirements of the operating environment, the first-order frequency must be less than 23 Hz, but the actual value is 24 Hz, so an adjustment must be made. A sensitivity method is used to modify the body structure. The sensitivities to the sheet metal thicknesses are calculated, and the thickness parameters with larger sensitivity values are taken as design variables, so that the anticipated target is achieved at minimum expense.
1 Introduction
There are two cases in structural dynamic modification. First, when the structure is modified slightly in detail due to design or manufacturing considerations, we can work out the resulting change in the dynamic properties. Second, we can change structural parameters so that the structural dynamic properties (for example, natural frequencies and mode shapes) meet the requirements we expect. For a complex structure, there are many modification schemes and design variables, so it is necessary to decide which is most effective. We can find the parameters or variables that are highly sensitive to the dynamic properties and use them as design variables during optimization. The method based on sensitivity analysis can avoid blind trial-and-error, improve efficiency and cut design cost [1].

2 Building the Body Finite Element Model
In this paper, the car body surface model is built with the UG software. During the process, the model is simplified mainly by omitting unloaded and non-structural components, simplifying section shapes appropriately, and omitting features that influence the performance only slightly, such as small holes, bosses, grooves and flanges [2]. Finally, the CAD model of the car body is built. It is imported into the ANSYS program through an IGES port. After the model is suitably modified and the element type, real constants (sheet metal thicknesses) and material are set up, it can be meshed. The whole finite element model is composed of 14,848 nodes and 15,544 shell elements.

3 Modal Analysis with the Finite Element Method
No constraints or forces are applied to the FE model; that is, the model is free of restraints. A modal analysis and a mode extraction method are then specified. Four methods (Block Lanczos, subspace, PowerDynamics, and reduced) are the most commonly used in the
ANSYS program. The Block Lanczos method is suited to finding many modes (about 40) of large models and is recommended when the model consists of poorly shaped solid and shell elements. This solver performs well when the model consists of shells or a combination of shells and solids; it works faster but requires about 50% more memory than the subspace method. The first six modes (mode shapes and frequencies) are shown in Table 1 and Figs. 1 to 4.
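In essence the solver extracts the lowest eigenpairs of the generalized problem K·φ = λ·M·φ. A minimal dense-matrix sketch of this step (our own illustration, not the ANSYS implementation):

import numpy as np
from scipy.linalg import eigh

def modal_frequencies(K, M, n=6):
    # K, M: assembled stiffness and mass matrices of the shell model
    lam, phi = eigh(K, M)                    # eigenvalues in ascending order
    lam = np.clip(lam, 0.0, None)            # free-free rigid-body modes: lam ~ 0
    freqs = np.sqrt(lam) / (2.0 * np.pi)     # natural frequencies in Hz
    return freqs[6:6 + n], phi[:, 6:6 + n]   # skip the six rigid-body modes

For the unrestrained body the first six eigenvalues are numerically zero rigid-body modes, which is why they are skipped before reporting the elastic modes.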
Fig. 1 First mode shape of whole torsion
Fig. 2 Second mode shape of whole torsion
Fig. 3 First mode shape of whole bend
Fig. 4 Local vibration of coping

4 Validate Finite Element Model by Experiment
According to the structural characteristics, 161 testing positions forming a grid are laid out over the whole car body. The experimental system is composed of a charge amplifier, a three-dimensional acceleration sensor, a force hammer (including a force sensor), modal analysis software and an SD380 dynamic analyzer. During the test, a curve of the transfer function and the coherence function is obtained, as shown in Fig. 5. According to the experimental results, the coherence function is close to unity near the frequencies of interest; in Fig. 5 the coherence function is 0.998 at 61.125 Hz, meaning that only 0.2% of the response signal is produced by noise, the rest being produced by the excitation signal. The results are therefore accurate. The modes (mode shapes and frequencies) from the experiment are shown in Table 1 and Figs. 7 to 10. In Table 1, the number of frequencies below 50 Hz is 6 in the calculation and 4 in the test. The degrees of correlation determine whether the finite element model can substitute for the prototype. According to the theory of inner-product correlation, the mode shape and frequency correlations are calculated.
Fig. 5 Curve of the transfer and coherence functions
Fig. 6 Sensitivity of the first-order frequency to component thickness (x-axis: component number; y-axis: sensitivity)

Table 1 Comparison of frequencies between calculation and experiment

Modal Order | Calculated (Hz) | Tested (Hz)
1 | 24.04 | 23.90
2 | 28.03 | 30.87
3 | 28.14 | 39.20
4 | 35.70 | 44.74
5 | 38.75 | 51.04
6 | 45.73 | 54.24
Fig. 7 First mode shape of whole torsion (testing)
Fig. 8 Second mode shape of whole torsion (testing)
Fig. 9 First mode shape of whole bend (testing)
Fig. 10 Local vibration of coping (testing)
Assume X ∈ Rⁿ is a real mode shape from the modal test and Y ∈ Rⁿ is a mode shape from the calculated modal analysis. The inner-product correlation of X and Y in Hilbert space is defined as [3]

ρc(X, Y) = (X, Y) / √((X, X)(Y, Y))    (1.1)
We programmed this calculation in MATLAB according to Eq. (1.1). The result is shown in Table 2.
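For reference, an equivalent computation in Python (a sketch of Eq. (1.1); the absolute value makes the measure independent of the arbitrary sign of a mode shape):

import numpy as np

def mode_correlation(X, Y):
    # X: tested mode shape, Y: calculated mode shape, sampled at the same points
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    return abs(X @ Y) / np.sqrt((X @ X) * (Y @ Y))

def pair_modes(tested, calculated):
    # index of the best-correlated calculated mode for each tested mode
    return [max(range(len(calculated)),
                key=lambda j: mode_correlation(x, calculated[j]))
            for x in tested]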
Table 2 The inner-product correlations of the first six mode shapes (rows: calculated modal order; columns: tested modal order)

Calc\Test | 1 | 2 | 3 | 4 | 5 | 6
1 | 0.9242 | 0.3541 | 0.1021 | 0.2593 | 0.3856 | 0.2235
2 | 0.2432 | 0.1847 | 0.1645 | 0.1811 | 0.2403 | 0.2920
3 | 0.0084 | 0.3035 | 0.8794 | 0.3864 | 0.4299 | 0.3106
4 | 0.3124 | 0.1252 | 0.2123 | 0.2163 | 0.1475 | 0.4011
5 | 0.1256 | 0.3127 | 0.2384 | 0.2134 | 0.1643 | 0.8142
6 | 0.2736 | 0.1679 | 0.1377 | 0.2070 | 0.3293 | 0.8627

The correlation ranges from 0 to 1; the larger the value, the better the correlation. From the correlation analysis, the corresponding modes are found, and modes possibly missed in the test are also identified (single-point excitation was used in the test; to avoid missing modes, multi-point excitation can be applied to distinguish dense and repeated-frequency modes). The relation of corresponding modes between calculation and experiment is shown in Table 3. The finite element model can therefore substitute for the prototype, and can be used as the basis for body modification and optimization design.
Table 3 The relation of corresponding modes

Calculating Order | Calculating Frequency (Hz) | Testing Frequency (Hz) | Correlation
1 | 24.039 | 23.90 | 0.9242
2 | 28.027 | / | /
3 | 28.135 | 30.87 | 0.8794
4 | 35.702 | / | /
5 | 38.749 | 39.20 | 0.8142
6 | 45.726 | 44.74 | 0.8627

Note: the symbol '/' denotes that this mode was missed in the test.

5 The Application of Sensitivity Analysis to Modifying the Car-body Configuration
According to the requirements of the operating environment, the first-order frequency must be less than 23 Hz, but the actual value is about 24 Hz, so it needs to be adjusted. This paper applies the sensitivity method to modifying the body structure. The sensitivities to the sheet metal thicknesses are calculated, and the thickness parameters with larger sensitivity values are taken as design variables, so that the anticipated target is achieved at minimum expense. Here we assume that the eigenvalue λr is a function of mij, kij and cij (mass, stiffness, damping):

λr = f(mij, kij, cij)    (1.2)

We can change the structural parameters so that the structural dynamic properties (natural frequencies and mode shapes) meet the expected requirements. When the scope of the adjustment is narrow, the second-order correction term can be omitted. According to literature [4], the eigenvalue change is approximately
"- = X X < ^ 7 » - • X X <$£*«. * X X <£j««. <'-3> 1 = 1 j' = l
v
;=i
j= \
'J
,=i
y=i
u
467
On this basis, we have

Δfr = Σi (∂fr/∂hi) Δhi    (1.4)

where Δhi stands for the change of the sheet metal thickness of component i and ∂fr/∂hi stands for the sensitivity of the natural frequency to that thickness. Once the sensitivities are known, we can find the change of thickness required for a given change of frequency; it is therefore essential to be able to compute the sensitivity values. Sensitivity is a broad concept; mathematically, it is classified as differential or difference sensitivity [4]. According to the literature, the calculation of differential sensitivities is highly complicated and the data size is enormous, so we compute difference sensitivities. The whole body is composed of 17 components joined by welds or bolts, and the thickness of every component is regarded as an independent variable. The thickness of each component, from 1 to 17, is increased by 10% in turn, and the resulting change of natural frequency is computed. The change of frequency divided by the change of the component thickness is the sensitivity to the thickness of that component. Through repeated calculation, the sensitivities of the first-order natural frequency shown in Fig. 6 are obtained. It is found that the components with larger sensitivities are 15 and 17, corresponding to the front standing pillar and the doorsill. The target value of the first-order frequency is less than 23 Hz, so

Δf1 = f1* - f1 = 23.0 - 24.013 = -1.013 (Hz)

According to Eq. (1.4), it is assumed that Δhi is proportional to its sensitivity, with scale coefficient k:

Δf1 = (∂f1/∂h15) Δh15 + (∂f1/∂h17) Δh17 = 3787 × 0.38k + (-2675) × (-0.27k) = 2161.3k = -1.013

so the scale coefficient is -4.69e-4. Therefore the thicknesses of the above two components are reduced by 0.2 mm and increased by 0.1 mm respectively. In the ANSYS program, we modify the corresponding real constants and solve again. The first-order frequency is then found to be 22.92 Hz, which satisfies the target. The sensitivity analysis is therefore an effective method for modifying the body structure, achieving the anticipated target at minimum expense.

References
1. Zhifang Fu and Hongxin Hua, Theory and Application of Modal Analysis (Shanghai Jiao Tong University Press, 2000).
2. Tianming He, Xiangnong Xu et al., Intensity Analysis of Unitized Body With Stub Front Frame, Wuhan Automobile University Transaction, Vol. 18, No. 1, 1996.
3. Qiyin Cao, Relativity Analysis of Modal Based on Interior Cumulation, Application Mechanics Transaction (15)4, 1996.
4. Vanhonacker P., Differential and Difference Sensitivities of Natural Frequencies Via Sensitivity Analysis, Proc. of the 3rd IMAC, 1985.
THE SIMULATION OF THE Π-TYPE CONSTRAINT BENDING PROCESS

Xu Hongzhi
Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528
Email: [email protected]

In forming a complex bent sheet part, the calculation of the unfolded length is crucial to the process design; it is one of the numerous factors affecting the final shape and size of a product. This paper presents an analysis of the Π-type constraint bending deformation process by means of FEM. A mechanical model is set up according to the deformation characteristics, and gap elements are employed to handle the contact between the punch, die and blank. To study the deformation regularities and the calculation of the unfolded length, the effects of die parameters and material properties on the deformation are discussed in detail, and the distributions of stress, strain and displacement in the blank are examined under Π-type constraint bending for different conditions. The deformation features and regularities of Π-type constraint bending are identified, and a formula for calculating the unfolded length of Π-type constraint bent parts is proposed.
1. Introduction
The bending process is very important in the stamping industry. For bent parts, the calculation of the unfolded length and the forecast of springback are crucial for the stamping die designer. However, both are related to many factors, such as tooling geometry, material properties and friction, so it is hard to calculate the unfolded length of bent parts accurately. Usually, empirical formulas are used to calculate the unfolded length of bent parts [1]. Hill's mathematical theory of plasticity [2] provides theoretical help for calculations of the bending process. In recent years, many researchers have studied the bending process using FEM simulation [3, 4] and achieved good results that help engineers design stamping dies; they have studied the V-bend and free bending processes by FEM simulation and obtained results that compare well with practical tests. For complicated parts such as the Π-type constraint bent part, many factors affect the simulation of the bending process, and it is also difficult to calculate the unfolded length accurately with an empirical formula. The Π-type part is a blank for the strengthening frame, as shown in Fig. 1, and its calculated unfolded length is very important for the precision of the strengthening frame.
Fig 1. The Π-type part and the strengthening frame
This paper presents an analysis of the Π-type constrained bending deformation process by means of FEM. A mechanical model is set up according to the deformation characteristics, and gap elements are employed to handle the contact between the punch, die and blank. A formula for calculating the unfolded length of Π-type constraint bent parts is given.
2. Simulation of the Π-type constraint bending process
To analyse the bending of the Π-type constraint bent part, the constrained bending process is simulated with the ADINA software using non-linear gap elements. The discretization of the mechanical model is shown in Fig. 2. In the simulation, very fine meshes are required on the sheet between the bending punch and the die, in order to analyse the deformation features and regularities and to calculate the length change of the part. The updated formulation adopted in this simulation is as follows:
{o}w = {of-" + [D] {Ae}(i)1 where the {ay''' is the stress in the gap elements. [Lp-20 j Punch
tens
ii Die
Fig 2. Discretization of the mechanical model in Π-type constrained bending

The sheet materials used for the simulation are steel 20, brass H62 and LY12 Al, with a sheet thickness of 1.0 mm and a length of 60.0 mm. The material property parameters are listed in Table 1. Due to symmetry, only half of the blank is considered; it is divided into 750 quadratic elements with six layers through the thickness.

Table 1. Material property parameters

Material   Yield stress σs (MPa)   Young's modulus E0 (MPa)   Poisson's ratio ν   Tangent modulus after yielding ET (MPa)
20         240                     21000                      0.37                1000
H62        188.72                  10000                      0.34                1364
LY12       277.03                  7100                       0.31                1558
The simulation results are shown in Fig. 3 for the same forming conditions, with the radius of the bending punch and die equal to 1.0 mm. It is seen that the material properties affect the tangential strain, but not by much. The strain between the bending punch and die is negative,
Fig 3. Variation of the tangential strain εt when Rp = Rd = 1.0 mm and the depth of bending h = 4.0 mm for different materials: (a) 20, (b) LY12, (c) H62
which means that the blank thins during the constrained bending process; the thinning differs according to the material and the tooling radius. A smaller radius increases the bending deformation, and the thinning is greater.
3. The formula for calculating the unfolded length in the Π-type constrained bending process
To calculate the unfolded length of a bent part, the part is usually divided into deformation areas and non-deformation areas, with a different calculation method used in each, as shown in Fig. 4. The calculation formula is as follows:

L = l1 + l2 + hab + l3 + l4
Normally the empirical formula applies to a single-bend part. The Π-type constraint bent part is a multi-bend part and is formed differently from a single-bend part; it can be divided into three areas: (1) the non-deformation area (l1 and l2), (2) the deformation area (l3 and l4) and (3) the thinning area (hab). For this constraint bent part a new calculation
Fig 4. Division of the blank for the unfolding calculation
formula is needed. For the non-deformation area, the unfolded length is just the length of that area. In the deformation area, the curvature radius of the strain neutral layer is the key factor in calculating the unfolded length. From the simulation of the Π-type constraint bent part we know that the deformation at the bending punch and die radii, and between them, is greater than in the other parts. From the simulation results, the relations between the curvature radius of the strain neutral layer and the relative bending radius were found; they are as follows. LY12 Al:
ρEp = 1.001840 r/t + 0.151910   (at the punch radius)
ρEd = 0.993623 r/t + 0.343599   (at the die radius)

Brass H62:

ρEp = 0.985617 r/t + 0.236465   (at the punch radius)
ρEd = 1.002220 r/t + 0.300318   (at the die radius)

Steel 20:

ρEp = 0.996763 r/t + 0.143295   (at the punch radius)
ρEd = 0.979220 r/t + 0.309408   (at the die radius)

where ρE is the curvature radius of the strain neutral layer, r is the radius of the tooling and t is the blank thickness. For the thinning area, the simulation results give the thinning percentage as a function of r/t.
From these analyses, the formula for calculating the unfolded length of the Π-type constraint bent part is as follows:

L = l1 + l2 + ρEp·α + ρEd·α + hab(1 − η)

where L is the unfolded length of the blank, l1 and l2 are the lengths of the non-deformation areas, ρEp and ρEd are the curvature radii of the strain neutral layer at the punch and die radii, α is the bending angle, hab is the length shown in Fig. 4, and η is the thinning percentage of the blank over hab. To verify this formula, the three materials used in the simulation were tested; the measured and simulated results are shown in Table 2. It can be seen from Table 2 that the simulated unfolded length is slightly bigger than the experimental one; the difference comes from the curvature radii of the strain neutral layer at the punch and die radii, because these radii are obtained by linear regression of the simulation data.

Table 2. Experimentally measured results of Π-type constrained bending (Rp = Rd = 2.0 mm) (mm)

Material   l1      l2      Length of bottom   h1(1−η)   h2(1−η)   πρEp   πρEd   Measured thinning percentage   Simulated length of blank   Measured length   Difference
LY12       33.34   31.84   19.18              6.32      6.29      6.77   7.32   5.26%                          111.06                      111.00            +0.06
H62        44.48   45.10   19.16              8.42      7.76      6.94   7.24   5.37%                          139.10                      139.00            +0.10
Steel 20   49.94   42.10   19.18              7.36      7.57      6.71   7.12   5.41%                          139.98                      139.94            +0.04
Although there are differences between simulation and experiment, the simulated and experimental results for the Π-type constraint bent part coincide well and the differences are very small, so this formula can be used in manufacture.
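As a worked check of the formula against Table 2, here is a short Python sketch. The helper names are ours; the total swept angle of π per tooling radius is inferred from the πρEp and πρEd columns, and the regressions are read as giving ρE in multiples of the thickness t (t = 1 mm in the tests, so the distinction is immaterial there).

```python
import math

# Regression coefficients (slope a, intercept b) for the strain-neutral-layer
# radius rho_E/t = a*(r/t) + b, from the fits above: (punch), (die).
COEFFS = {
    "LY12":    ((1.001840, 0.151910), (0.993623, 0.343599)),
    "H62":     ((0.985617, 0.236465), (1.002220, 0.300318)),
    "steel20": ((0.996763, 0.143295), (0.979220, 0.309408)),
}

def neutral_radii(material, r, t):
    """Strain-neutral-layer radii at the punch and die, in mm."""
    (ap, bp), (ad, bd) = COEFFS[material]
    return (ap * r / t + bp) * t, (ad * r / t + bd) * t

def unfold_length(material, l1, l2, l_bottom, h1, h2, r, t, eta):
    """Unfolded length of the Pi-type part: straight areas, thinning areas
    h1 and h2 reduced by (1 - eta), and the arcs at the punch and die radii,
    each radius being swept through a total angle of pi (two 90-degree
    bends), as in Table 2."""
    rho_p, rho_d = neutral_radii(material, r, t)
    arcs = math.pi * (rho_p + rho_d)
    return l1 + l2 + l_bottom + (h1 + h2) * (1.0 - eta) + arcs

# Steel 20 row of Table 2 (r = 2.0 mm, t = 1.0 mm); h1 and h2 are quoted
# there already reduced by thinning, so eta = 0 reproduces the tabulated
# 139.98 mm to within rounding.
print(round(unfold_length("steel20", 49.94, 42.10, 19.18,
                          7.36, 7.57, r=2.0, t=1.0, eta=0.0), 2))
```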
4. Conclusions
1. The deformation features and regularities of the constraint bent part have been found by FEM simulation of the Π-type bent part; this is useful for understanding the multi-bend process. 2. According to the simulation results for the Π-type bent part, a formula for calculating its unfolded length is proposed for the first time:
L = l1 + l2 + ρEp·α + ρEd·α + hab(1 − η)
5. References
1. Li Shuoben, The Technology of Stamping, 13-17, Jixie Gongye Press, Beijing, 1982.
2. R. Hill, The Mathematical Theory of Plasticity, 142-177, Oxford, London, 1962.
3. M. Kawka and A. Makinouchi, "Shell-Element Formulation in the Static Explicit FEM Code for the Simulation of Sheet Stamping", Journal of Materials Processing Technology, 50 (1-4), 105-115, 1995.
4. Eiji Nakamachi, "Sheet-forming Process Characterization by Static-Explicit Anisotropic Elastic-Plastic Finite-Element Simulation", Journal of Materials Processing Technology, 50 (1-4), 116-132, 1995.
OPTIMISING THE DIMENSIONS OF A CYLINDRICAL ULTRASONIC MOTOR

YANG QUANGANG
Data Storage Institute, DSI Building, 5 Engineering Drive 1, NUS. E-mail: YANG [email protected]

LIM SIAK PIANG
Department of Mechanical Engineering, National University of Singapore. E-mail: [email protected]
Despite the structural simplicity of ultrasonic motors, the current difficulty in designing high-performance motors lies in the lack of complete and accurate models and well-understood design rules, because the motor's parameters are time-varying and load-dependent. In this paper, an expression describing the relation between the length and radius of the ceramic transducer of a cylindrical ultrasonic motor is presented, based on a theoretical analysis of its longitudinal and flexural vibrations. It can be used as a basis for choosing the dimensions of the ceramic transducer at the design stage.
Introduction

Recently, much R&D effort has been directed to the understanding and optimisation of ultrasonic motors because of their advantages over conventional electromagnetic motors. However, despite the simplicity of the structures, the current difficulty in designing high-performance ultrasonic motors lies in the lack of accurate models and well-understood design rules, because the motor's parameters are time-varying and load-dependent. A special feature of ultrasonic motors is their two-stage energy-conversion mechanism: electrical energy is first converted into high-frequency mechanical oscillations, which in turn are rectified into macroscopic unidirectional motion of the rotor at the second stage. The dimensional optimisation of the piezo-ceramic transducer plays an important role in increasing the conversion efficiency of the first stage. Cylindrical ultrasonic motors have been increasingly studied recently [1-3], but these studies did not deal with the choice of design parameters. In this paper, the vibrations of the ceramic tube are investigated. By equating the flexural frequency to the longitudinal frequency, so as to obtain the maximum vibration amplitude during excitation, the optimal dimensional relationship between the radius and length is obtained. It can be used as a basis for choosing the dimensions of the transducer at the design stage.

Simplified Piezoelectric Equations

A piezo-ceramic tube is shown in Figure 1. The poling direction is aligned along the r-direction. Cylindrical coordinates are used, with the origin chosen at the centre of the tube and z coinciding with the tube axis.
Utilizing the hypotheses described in [4], the d-type piezoelectric strain equations can be simplified and reduced to

Sr = s13(Tθ + Tz) + d33Er   (1)
Sθ = s11Tθ + s12Tz + d31Er   (2)
Sz = s12Tθ + s11Tz + d31Er   (3)

From equations (2) and (3), we have

Tz = Y0ᴱ[Sz + μSθ − d31(1 + μ)Er]/(1 − μ²)   (4)
Tθ = Y0ᴱ[Sθ + μSz − d31(1 + μ)Er]/(1 − μ²)   (5)

where Y0ᴱ = 1/s11ᴱ is the elastic modulus and μ = −s12ᴱ/s11ᴱ is the Poisson's ratio.
Figure 1. Piezo-ceramic tube.
Figure 2. The bending of piezo-ceramic tube.
Longitudinal and Extensional Vibration of the Piezo-Ceramic Tube

If we isolate an infinitesimal arc element of the piezo-ceramic cylinder with angle dθ and height dz, the differential equations of motion in the radial and axial directions are

ρ(∂²ξr/∂t²) = −Tθ/R   (6)
ρ(∂²ξz/∂t²) = ∂Tz/∂z   (7)

where ρ is the mass density and ξ is the displacement component. Under harmonic vibration, equations (6) and (7) simplify to

ρω²ξr = Tθ/R   (8)
−ρω²ξz = ∂Tz/∂z   (9)

in which ω is the angular frequency. Differentiating equation (8) with respect to z, and considering Sθ = ξr/R and Sz = ∂ξz/∂z, we obtain

d²ξz/dz² + k²ξz = 0   (10)

where

k = (ω/c)·√{[(1 − μ²)(ω/ω1)² − 1]/[(ω/ω1)² − 1]},   c² = Y0ᴱ/ρ

c is the wave propagation velocity, and ω1 stands for the angular frequency of the radial vibration without considering the piezoelectric effects.
As μ is about 0.3, (1 − μ²)^(−1/2) ≈ 1.05. Therefore, under the conditions ω/ω1 < 1 or ω/ω1 > (1 − μ²)^(−1/2), the solution of (10) can be written in harmonic form, and applying the boundary conditions of the tube leads to a coupled frequency equation of the longitudinal and radial vibrations, in which ω2 = πc/l is the angular frequency of the longitudinal vibration without considering the piezoelectric effects. In our discussion, only the fundamental vibration (n = 1) is considered. Moreover, as described in [5], if the length of the cylindrical tube is greater than half of its perimeter, the lower eigenfrequency corresponds to the longitudinal vibration. Hence the fundamental longitudinal frequency is obtained:

ωl1² = [(ω1² + ω2²) − √((ω1² − ω2²)² + 4μ²ω1²ω2²)]/[2(1 − μ²)]   (14)
Flexural Vibration Analysis of the Ceramic Tube

For the cylindrical ultrasonic motor, the ceramic tube is uniformly segmented into four quadrants. With two opposite electrical sources applied to one pair of facing quadrants, one quadrant contracts while the other expands, which bends the tube, as shown in Figure 2. If the applied voltages are RF signals, a bending vibration is excited. For simplicity, the bending vibration can be described by the Euler beam model, and its fundamental flexural frequency can be written as

ωf1² = 500.5·Y0ᴱ(Ro² + Ri²)/(4ρl⁴)   (15)

in which Ro and Ri are the outer and inner radii of the tube.

Dimensional Relation of the Ceramic Tube

To obtain the maximum vibration amplitude during excitation, it is important to make the longitudinal and flexural frequencies equal. Let ωl1 = ωf1, and note that c² = Y0ᴱ/ρ and R = (Ro + Ri)/2; then
250.3(Ro² + Ri²)(1 − μ²)c²/l⁴ = (ω1² + ω2²) − √((ω1² − ω2²)² + 4μ²ω1²ω2²)   (16)

where ω1 = 2c/(Ro + Ri) and ω2 = πc/l. This expression can be used to choose suitable tube dimensions during the design of a new cylindrical USM: we can fix either the radius or the length and then determine the other. As an example, the transducer in [1] has a 2.4 mm outer diameter and a 1.4 mm inner diameter. Using equation (16) with a Poisson's ratio of 0.36 for the PZT, the calculated tube length for the same diameters is 5.1 mm, about half of their chosen length of 10 mm.

Summary

In this paper, the longitudinal and extensional vibrations of a piezo-ceramic cylindrical tube have been analysed based on the simplified piezoelectric equations and the differential equations of equilibrium. Their frequencies are obtained after decoupling the frequency equation initially involving both longitudinal and extensional vibration. The flexural vibration of the tube is described by the Euler beam bending model. By equating the flexural frequency to the longitudinal frequency, the optimal dimensional relation between the radius and length of the tube has been obtained. This provides a basis for choosing the dimensions of the ceramic transducer at the design stage.

References

1. T. Morita, M. Kurosawa and T. Higuchi, Cylindrical Micro Ultrasonic Motor Utilizing Bulk Lead Zirconate Titanate (PZT), Jpn. J. Appl. Phys., 38 (1999), Part 1, No. 5B, pp. 3347-3350.
2. T. Morita, M. Kurosawa and T. Higuchi, A Cylindrical Microultrasonic Motor, Ultrasonics, 38 (2000), 33-36.
3. Lu Pin, Kwok Hong Lee, Siak Piang Lim and Wu Zhong Lin, A Kinematic Analysis of Cylindrical Ultrasonic Micromotors, Sensors and Actuators, A87 (2001), 194-197.
4. J. F. Haskin and J. L. Walsh, Vibrations of Ferroelectric Cylindrical Shells with Transverse Isotropy. I. Radially Polarized Case, The Journal of the Acoustical Society of America, 29 (1957), 729-734.
5. S. Markus, The Mechanics of Vibrations of Cylindrical Shells, Elsevier, 1988.
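At the design stage, equation (16) can be solved numerically for the length once the radii are fixed. A minimal Python sketch, assuming the reconstruction of (16) given above (the wave speed c cancels out); it reproduces the 5.1 mm length quoted for the transducer of [1]:

```python
import math

def freq_balance(l, Ro, Ri, mu):
    """Residual of equation (16); c cancels, so it is set to 1."""
    w1_sq = (2.0 / (Ro + Ri)) ** 2       # radial frequency squared / c^2
    w2_sq = (math.pi / l) ** 2           # longitudinal frequency squared / c^2
    lhs = 250.3 * (Ro**2 + Ri**2) * (1 - mu**2) / l**4
    rhs = (w1_sq + w2_sq) - math.sqrt((w1_sq - w2_sq) ** 2
                                      + 4 * mu**2 * w1_sq * w2_sq)
    return lhs - rhs

def solve_length(Ro, Ri, mu, lo=1e-3, hi=20e-3, tol=1e-9):
    """Bisection for the tube length satisfying equation (16)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if freq_balance(lo, Ro, Ri, mu) * freq_balance(mid, Ro, Ri, mu) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Transducer of [1]: 2.4 mm outer and 1.4 mm inner diameter, mu = 0.36
print(solve_length(Ro=1.2e-3, Ri=0.7e-3, mu=0.36))   # about 5.1e-3 m
```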
DESIGN AND ANALYSIS OF A HIGH-EFFICIENCY MR VALVE

W. H. LI, H. DU AND N. Q. GUO
School of Mechanical & Production Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
E-mail: [email protected]
This paper presents an optimized design of a high-efficiency magnetorheological (MR) valve using finite element analysis. The MR valve comprises a core, a wound coil and a cylindrical flux return. The core and the flux return form the annulus through which the MR fluid flows. The effects of the magnetic-field formation mechanism and the MR-effect formation mechanism on the valve performance are investigated. The analysis of the magnetic flux density in the valve indicates that magnetic saturation may occur in the core, in the flux return, or along the valve length. To prevent saturation and to minimise the valve weight, the dimensions of the valve are determined optimally with the finite element analysis. In addition, this analysis is coupled with the typical Bingham plastic analysis to predict the MR valve performance.
1 Introduction
Recently, a very attractive and effective approach to developing controllable hydraulic devices makes use of MR fluids. Devices using these materials have many advantages: the valves have no moving parts, which eliminates the complexity and durability issues of conventional mechanical valves and provides a direct transduction from an electrical control signal to a change in mechanical properties [1]. Among semi-active MR devices, the MR valve is a key component that plays a significant role in the device performance. Despite many investigations of semi-active MR devices, work on the systematic or optimal design of MR valves is relatively rare; in particular, the coupling between the magnetic-field formation mechanism and the MR-effect formation mechanism has received little attention. Conventionally, the design and manufacture of MR valves is based on trial and error, and the performance of such an empirical approach depends strongly on the designer's practical experience; it is time-consuming and costly for industry to implement in mass production. It is therefore crucial to develop systematic design algorithms for manufacturing high-efficiency, low-cost MR devices. The objective of this work is to design a small, high-efficiency MR valve. For this purpose, an efficient magnetic circuit was designed using ANSYS for the magnetic field analysis [2]. Once the MR fluid characteristics and the core material properties are incorporated into the analysis, and the response of the magnetic circuit as a function of applied current has been determined, a highly efficient design for the magnetic
circuit can be achieved. In addition, this finite element analysis is coupled with the typical Bingham plastic model to predict the valve performance.

2 FEM Modeling and Analysis of the MR Valve
2.1 New Modeling

The schematic of the proposed MR valve is shown in Figure 1. The valve consists of a core, a flux return and an annulus through which the MR fluid flows. The bobbin shaft is wound with insulated wire. A current applied through the coil around the bobbin creates a magnetic field in the gap between the core and the flux return; this magnetic field increases the yield stress of the MR fluid in the gap. At the design stage, many parameters should be considered, including the fluid gap, the bobbin shaft diameter, the flux length, the thickness of the flux return and the number of wire turns. To achieve an efficient MR valve, the flux density in the fluid gap should be kept constant. The relative permeability of the MR fluid is far smaller than that of the low-carbon-steel bobbin and flux return; consequently, a smaller fluid gap is better. Practical gaps typically range from 0.25 to 2 mm for ease of manufacture and assembly; in this study the gap is set to 0.5 mm. The other optimum parameters are determined using the finite element method with the help of the ANSYS package. Due to structural symmetry, the MR valve is analysed as a 2-D axisymmetric model, as shown in Figure 2. The main dimensions of the valve comprise Dcore, Lcore, Din, Dout, Lactive, Lreturn and g.
Figure 1. Schematic of the proposed MR valve
Figure 2. Axisymmetrical model of the MR valve
2.2 Analytical Results

By means of ANSYS simulation, the effects of the saturation phenomenon both in the steel path and in the fluid gap are studied, and the effects of the design parameters are evaluated; through this analysis, an optimal model of the MR valve is obtained. One basic requirement for the optimal valve is that the flux density across the fluid gap can reach its maximum value at the operating point. The bobbin shaft diameter is the design parameter that most limits the magnetic performance. In Figure 3, the maximum magnetic flux density, Bmax, in the gap is plotted against the core diameter, Dcore. It can be seen that when the core diameter is smaller than 14 mm it is impossible for the MR fluid to reach its operating point; in other words, the maximum flux density cannot reach 0.8 Tesla no matter how large the coil current is, because of saturation of the core. Figure 4 shows the trend of the maximum magnetic flux density, Bmax, in the gap as a function of the active core length, Lactive. At small active core lengths the flux density in the gap reaches and maintains its maximum value of 0.8 Tesla up to 3 mm, before decreasing steadily with increasing active length. Decreasing the core length results in a higher magnetic flux density in the fluid gap by reducing magnetic saturation in
Figure 3. Maximum magnetic flux density, Bmax, in the gap versus the bobbin shaft diameter, Dcore
Figure 4. The maximum flux density, Bmax, as a function of the active core length, Lactive
the bobbin shaft. Considering all the influencing factors, the optimum MR valve is proposed; its dimensions are listed in Table 1.

Table 1. Optimal dimensions of the MR valve

Item        Dcore   Lcore   Din     Dout    Lactive   Lreturn   g
Dimension   14 mm   16 mm   22 mm   28 mm   3 mm      2.5 mm    0.5 mm
3 Valve Evaluation

It is assumed that the MR fluid is incompressible and that fluid inertia is negligible; the governing equation for the laminar flow in the valve is then given by

ΔP = 24ηQLactive/(g³w) + cτyLactive/g   (1)

where η is the plastic viscosity of the MR fluid, Q is the flow rate, w is the mean circumference of the annular flow path, τy is the field-dependent yield stress and c is a coefficient determined by the flow profile. The computed pressure difference versus flow rate for the MRF-132LD fluid at different coil currents is shown in Figure 5.
Figure 5. Flow characteristics of the MR valve: pressure difference, AP, versus flow rate, Q
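To indicate how equation (1) generates curves like those of Figure 5, here is a minimal sketch assuming the Bingham-plastic form reconstructed above and the Table 1 dimensions; the viscosity and yield-stress values are placeholders, not the MRF-132LD data sheet.

```python
import math

# Optimal valve dimensions from Table 1 (metres)
g        = 0.5e-3                 # fluid gap
L_active = 3.0e-3                 # active pole length
D_mean   = 0.5 * (22e-3 + 28e-3)  # mean of Din and Dout
w        = math.pi * D_mean       # mean circumference of the annulus

def pressure_drop(Q, tau_y, eta=0.09, c=3.0):
    """Bingham-plastic valve pressure drop, equation (1): a viscous term
    plus a field-dependent yield term."""
    dP_viscous = 24.0 * eta * Q * L_active / (g**3 * w)
    dP_yield   = c * tau_y * L_active / g
    return dP_viscous + dP_yield

# Pressure difference vs flow rate for two field levels (yield stresses)
for tau_y in (10e3, 30e3):                 # Pa, placeholder yield stresses
    for Q in (0.5e-5, 1.0e-5, 2.0e-5):     # m^3/s
        print(tau_y, Q, round(pressure_drop(Q, tau_y)))
```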
4 Conclusions
Considering the MR fluid properties and the magnetisation curve of the steel path, a maximum magnetic flux density in the fluid gap was achieved with the optimised design and verified by simulation.

References

1. Gavin, H. P., Annular Poiseuille Flow of Electrorheological and Magnetorheological Materials, Journal of Rheology, 45 (2001), pp. 983-994.
2. http://www.ansys.com/webdocs/University/gettingstarted/tutorials/print_emag.htm
COMPUTER DESIGN AND VISUALIZATION OF THE NEW LOOP WORM TRANSMISSION

QINGSHENG LUO & BAOLING HAN
Mechanical & Electrical Department, Shantou University, Shantou City, Guangdong Province, P.R. China
E-mail: [email protected]
This paper presents a methodology for the parametric design of the New Loop Worm Transmission (NLWT) and for the visualization of the design result, based on computer virtual reality technology. Considering the NLWT's complicated structure and precise surfaces, the authors analyse the NLWT by numerical methods and carry out a parametric design for it. The NLWT design result is modelled and visualized for the designer's inspection and modification. The parameters for analysing the NLWT and a design procedure are also proposed for further research.
1 Introduction
In contrast to the Common Column Worm Transmission (CCWT), the Loop Worm Transmission (LWT) possesses some virtues: its transmission capability is good and it is fast, and it is therefore here called the New Loop Worm Transmission (NLWT). In engineering, the NLWT has many strong points. For instance, in the Single Bundle Loop Worm Transmission (SBLWT), the tool used to machine the worm can be greatly simplified and its interchangeability is very good, so it is applicable to single-piece and small-batch production. In the Double Bundle Loop Worm Transmission (DBLWT), with proper design and manufacture, multi-tooth meshing and double-contact-line contact can be realised, and the angle between the contact line and the relative sliding velocity direction can reach 60° - 90°. Because the induced curvature radius between the tooth surfaces is large, the NLWT can greatly improve the carrying capacity of the worm transmission (WT). Practice proves that DBLWTs with various kinds of tooth forms possess very impressive carrying capacity, about 2 - 3 times that of the CCWT; it is one of the WTs with excellent capability. The NLWT is therefore finding more and more extensive application in the related engineering fields [1].

2 Parameter design & entity modeling

From the knowledge of descriptive geometry, it is known that the outline of a loop worm is formed by a concave arc generating line revolving around the worm axis, and its tooth surface is a swept surface formed by taking a straight line or a curve as the generating line, or an enveloped (bundle) surface formed by taking a plane or a curved surface as the generating element. Fig. 1 is the principle sketch in which a straight line is taken as the generating line to form a loop worm. In Fig. 1, the plane P passes through the axis O1O1 and revolves about it at a constant angular velocity. At the same time, in plane P there is a straight generating line u-u, tangent to the circle of radius rb, and the generating line u-u revolves about the point O2 at the same angular velocity. Therefore, the straight generating line u-u
Fig.1 a Straight Loop Worm forming sketch map
Fig.2 a Plane Bundle Worm forming sketch map
forms a traced curved surface in space; this is the screw tooth surface of the Straight Loop Worm, a ruled surface which cannot be developed [1]. If the straight generating line is replaced by the plane Σ, as shown in Fig. 2, the plane Σ sweeps out a tooth surface of the loop worm when it turns by the above rule. Such a worm is called the Plane Bundle Worm; its tooth surface is the envelope of a family of planes, and it is a ruled surface which can be developed. If the straight generating line is replaced by an involute helicoid, the Involute Bundle Worm is obtained, and its tooth surface is the envelope of a family of involute helicoids. Therefore, the LWT can be classified into the Straight Loop Worm Transmission (SLWT), the Plane Bundle Loop Worm Transmission (PBLWT) and the Involute Bundle Loop Worm Transmission (IBLWT), according to the generating line or surface which forms the worm. The LWT can also be classified into SBLWT and DBLWT according to the forming principle of the tooth surfaces of the worm and worm wheel. In SBLWT the worm wheel is a column gear, and the worm tooth surface is the surface enveloped by the column gear. When the worm wheel tooth surface is a plane, the Plane Single Bundle Loop Worm Transmission is obtained, and when it is an involute helicoid, the Involute Single Bundle Loop Worm Transmission is obtained. In DBLWT, the worm wheel tooth surface is in turn formed as the envelope of the worm tooth surface; hence the name double bundle. In practice, PBLWT generally refers to the Plane Once Bundle Loop Worm Transmission (POBLWT) and the Plane Bis Bundle Loop Worm Transmission (PBBLWT); within POBLWT there are straight-tooth and tilted-tooth variants. The Straight Loop Worm Transmission and PBBLWT give multi-tooth contact and double-contact-line contact, which enlarges the contact area of the tooth surfaces, improves the forming conditions of the oil film and increases the relative curvature radius between the tooth surfaces; this is why the drive efficiency of the NLWT is rather high and its carrying capacity very strong. Although POBLWT has single-contact-line contact, it retains the virtue of multi-tooth contact, so its drive efficiency and carrying capacity greatly exceed those of the CCWT. At the same time, PBLWT is easier to machine accurately in full accordance with its meshing principle, and computer entity-modeling techniques can be used to carry out a virtual design for it; these create the conditions for the wider adoption of the NLWT.
From the analysis of the forming principle and the structural specialty of the New Loop Worm, it is found that the tooth shape and the lead angle of the loop worm are the key factors in the computer entity modeling, so they deserve particular attention, and the analysis should start from the particular parameters. From the transmission conditions it is known that the worm transmission power is P = 7.2 kW, the worm rotation speed is n = 1452 r/min, the transmission ratio is i = 37, and the gearing works and starts very frequently. According to the handbook [2], the related design parameters are confirmed as follows: 1. The worm material is 40Cr with quenching-and-tempering treatment and a surface hardness of HB = 250-300; the worm wheel material is ZQSn10-1, sand cast, with precision class 8. 2. From Pc = P/(K1 x K2 x K3 x K4) <= [P], checking the tables for the various coefficients gives Pc = 7.2/(1 x 1.06 x 0.8 x 1) = 8.5 kW; checking the tables again gives the centre distance a = 150 mm. 3. The worm thread number is Z1 = 1; according to the required transmission ratio, the worm wheel tooth number is Z2 = 37. To accomplish the three-dimensional entity modeling of the NLWT, we used the aided-design and aided-modeling software packages 3DS MAX 4.0 and AutoCAD 2000. Relatively speaking, the solidity, texture and realism of entities modeled with 3DS MAX 4.0 are very good, while the accuracy, precision and coordination of entities modeled with AutoCAD 2000 are very good; we therefore use 3DS MAX 4.0 and AutoCAD 2000 alternately, bringing the functions of both packages fully into play. The entity modeling of the NLWT may be divided into two parts: the modeling of the loop worm, and the modeling of the loop worm wheel matched with it. In the modeling of the loop worm, the key step is the precise modeling of the loop worm axial section; we therefore adopt AutoCAD 2000 to draw the loop worm axial section and to construct the loop worm screw lines. The particular process is as follows: 1. Choose the screw-line function of the creation panel and draw a screw line with the following parameters: height 85 mm, number of turns 4.58, radius 39.89 mm, clockwise direction. 2. Adjust the configuration shape of the screw as shown in Fig. 3; in particular, the lengths of the upper and lower lines are 89.6 mm and the radii of the left and right arcs are 121.47 mm. 3. Using the free-deformation function for the cylinder in the modify panel, set the number of reference points to 6 x 12 x 7 and then carry out the screw modeling according to the thread frame shown in Fig. 3; the tool used in this process is geometric-proportion scaling. 4. The shape-sampling method must be adopted for modeling the loop worm tooth, and this operation exerts a serious influence on the modeling result. Because the screw created in the above process is constructed along the pitch-circle track of the loop worm tooth, some preparation is needed before the tooth shape is sampled. One of these tasks is to put the centre of the
Fig.3 creating process sketch of loop worm axes section
Fig.4 sampling sketch map of loop worm screw
Fig.5 tooth shape sampling sketch map of loop worm
tooth cross-section on the tooth pitch circle, to locate it on the centre axis of the tooth, and to ensure that one coordinate of the tooth coincides with the centre axis. The other task is to use the get-shape function to sample the tooth shape along the screw that has been modeled (shown in Fig. 4). 5. Here we may find that the actual entity-modeling effect obtained by the above steps differs from the expected effect, so the shape-sampled entity must be modified: the entity is rotated and the control line gradually adjusted until the needed effect is obtained. The essential point of the operation is to make the adjusted angle between two successive control points an integral multiple of 45°; the adjusted result is shown in Fig. 5. 6. After the modeling of the loop worm tooth, the other parts of the loop worm can be modeled according to the practical needs of the gearing; since their modeling is very easy, the particular steps are not narrated further. In 3DS MAX 4.0 the Boolean operations are used to compose the various parts of the loop worm into a whole; the result is shown in Fig. 6. 7. Finally, the material editor is applied to the modeled loop worm in 3DS MAX 4.0, giving the modeling effect shown in Fig. 7, which can be used for the later transformation of the numerical-control tool track and for testing the entity-modeling effect. Generally speaking, only when the loop worm and the worm wheel cooperate exactly can the transmission be ensured to run smoothly. As with the loop worm, the modeling of the worm wheel has its own particularities: the tooth of the worm wheel is arc-shaped (see Fig. 8), so that the worm wheel can realise an exact meshing transmission with the loop worm. The modeling process of the worm wheel is as follows:
Fig.6 loop worm entity modeling sketch map
Fig.7 loop worm entity modeling effect map
Fig.8 worm wheel tooth sketch map
1. We use AutoCAD 2000 to draw accurately the radial section plane of the worm wheel tooth according to the design data. 2. We draw the sampling path of the worm wheel tooth: first a circle of radius 22.45 mm is drawn; then a rectangle of width 40 mm is used to cut an arc from it as the sampling path; then the tooth of the worm wheel radial section plane is sampled once again, and so we obtain a tooth of the worm
Fig.9 the worm wheel tooth sampling sketch map
Fig. 10 the worm wheel tooth ring sampling sketch map
Fig. 11 the worm wheel entity whole sampling sketch map
wheel in this way. But it is not yet the final tooth, because the worm wheel has a helix angle (6°51'54"). We therefore use the rotation knob to rotate the tooth about its axis by 6°51'54"; the effect is like the white tooth shown in Fig. 9. 3. We move the local plotting axis of the white tooth to the centre position of Fig. 9: using the axis function in the hierarchy panel, the adjustment is made effective only on the axis, and the axis is moved to the appropriate position; at this time, holding the Shift key, the tooth is rotated 9.729° about its axis. 4. We copy the modeled worm wheel tooth 36 times and arrange the copies by the rule shown in Fig. 9. In particular, the copy function in the menu is chosen with a copy number of 36 (together with the original tooth, the worm wheel has 37 teeth in all). Since each tooth of the worm wheel is independent, they can be grouped together for convenient copying. 5. With the above steps, the modeling of the worm wheel tooth ring is fulfilled; the subsequent entity modeling and material editing are relatively simple and are not expatiated further. The modeling sketches of the worm wheel tooth ring and the worm wheel entity are shown in Fig. 10 and Fig. 11 respectively.

3 Epilogue

The NLWT has great value for wider use in the modern industrial field, but its part structure is complex and its machining technology troublesome, which restricts its popular application to a certain extent. Through this computer-graphics research on the NLWT, we complete its parameter design and entity modeling, lay a foundation for the subsequent numerical-control machining and tool-track translation, and explore some means and skills for using computer virtual reality technology in the entity modeling of complex transmission parts.

REFERENCES

1. Fu Shaoze, New Worm Transmission, Shanxi Science and Technology Publishing Company, 1990.
2. Mechanism Design Handbook, Chemistry Industry Publishing Company, 1992.
A COMPUTER-AIDED OPTIMISATION APPROACH FOR THE DESIGN OF COOLING CHANNELS AND SELECTION OF PROCESS PARAMETERS IN PLASTIC INJECTION MOULDING

L.Y. ZHAI, Y.C. LAM, K. TAI AND S.C. FOK
School of Mechanical and Production Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
E-mail: [email protected]
The cooling process in plastic injection moulding has a direct impact on both productivity and product quality. This paper integrates an evolutionary algorithm with CAE (Computer-Aided Engineering) technology to build a computerised system that can guide the design of cooling channels and the selection of process parameters. A genetic algorithm (GA) based optimisation system has been developed as an external routine to the MOLDFLOW® suite, a commercial CAE software package for injection moulding simulation. The objective of the optimisation is to achieve a uniform cavity surface temperature so as to ensure product quality.
1 Introduction
The cooling process in plastic injection moulding plays a crucial role in determining both the productivity and the quality of an injection-moulded part. Productivity prefers fast cooling, whereas product quality requires a uniform temperature distribution; hence the overall cooling requirements are always a compromise between uniform cooling, to assure part quality, and fast cooling, to minimise part cost. In cooling system design, the design variables typically include the size and location of the cooling channels, the temperature and flow rate of the coolant, and the packing time, clamp-open time and cooling time. With so many parameters involved, determining the optimum cooling system design is difficult, and a tool integrating cooling simulation and optimisation techniques into the design process is helpful. Many optimisation techniques have been used successfully in engineering applications, including conventional techniques such as classic gradient-based methods [1] and recent stochastic techniques such as the simulated annealing algorithm. However, most of these methods show drawbacks when applied to cooling design optimisation [2]. In contrast, genetic algorithms (GAs) [3] show great advantages in handling the cooling optimisation problem because of their inherently parallel, population-based search. GAs have been widely used in various engineering optimisations, including injection moulding [2, 4]. However, there is hitherto no literature on integrating GAs with a mould-cooling simulation software package to optimise cooling system design.
2 Optimisation system
Unlike most existing investigations, this study focuses on the development of the optimisation system instead of on the algorithm for the calculation of heat transfer by finite (or boundary) element analysis [5]. This has the advantage of easy integration with prevailing commercial software with well-developed cooling simulation technology. The framework for mould cooling optimisation is constructed as shown in Figure 1. As revealed by some researchers [6], genetic algorithms in general locate a near-global optimum (or a group of near-global optima) rather than the exact unique optimal solution, especially in problems with multiple variables and a large search space. The well-known Hooke and Jeeves [7] search method is therefore adopted for a refined search to zero in on the exact optimal solution following the GA search.

Figure 1. Framework of the prototype system (start → initial generation of chromosomes → MOLDFLOW® cooling analysis for each chromosome → constraints evaluation → standard deviation of cavity surface temperature for each chromosome → selection of good chromosomes → crossover, mutation → repeat → refined search → end)
3 A case study
Figure 2 shows the part and mould investigated. The design variables are shown in Table 1. For ease and simplicity, these variables take only integer values. Each chromosome is represented as an integer string, and single-site crossover is adopted. Mutation is implemented by replacing the selected gene with a randomly generated integer. This real-coded chromosome representation is faster, more consistent from run to run, more precise and more intuitive [8].
Figure 2. Plastic part and mould with 3 cooling channels
As mentioned earlier, the standard deviation of the cavity surface temperature is chosen as the objective function. In the finite element model, the cavity surface temperature is represented by element temperatures, and the MOLDFLOW® 3D cooling analysis divides the cavity surface into top and bottom surfaces. The objective function can therefore be defined mathematically as

f(x) = √[ Σ_{i=1}^{2N} A_i (T_i(x) − T̄(x))² / (A_Top + A_Bottom) ]   (1)

where x is the vector of design variables, N is the total number of elements, T_i (i = 1, 2, ..., 2N) is the element temperature, A_Top and A_Bottom are the areas of the top and bottom cavity surfaces respectively, and A_i is the area of element i. The average cavity surface temperature is defined as

T̄(x) = Σ_{i=1}^{2N} A_i T_i(x) / (A_Top + A_Bottom)   (2)
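A minimal sketch of equations (1) and (2); in practice the element temperatures and areas would come from the MOLDFLOW® analysis, and the arrays below are placeholders:

```python
import numpy as np

def cavity_temperature_stats(T, A):
    """Area-weighted average (eq. (2)) and standard deviation (eq. (1))
    of the element temperatures over the 2N cavity-surface elements."""
    total_area = A.sum()                     # A_Top + A_Bottom
    T_mean = (A * T).sum() / total_area      # eq. (2)
    f = np.sqrt((A * (T - T_mean) ** 2).sum() / total_area)   # eq. (1)
    return T_mean, f

# Placeholder element data standing in for a MOLDFLOW cooling result
T = np.array([51.1, 49.6, 56.6, 70.5, 70.6, 50.4])   # deg C
A = np.array([1.0, 1.2, 0.8, 1.1, 0.9, 1.0])         # element areas
print(cavity_temperature_stats(T, A))
```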
The constraints are limited to geometric constraints and to the value ranges of the process parameters. The geometric constraints are the minimum spacing between the cooling channels, and between each channel and the mould boundary (including the boundary of the cavity surface). If a chromosome represents a design that does not satisfy the constraints, it is 'repaired' by replacing the offending gene(s) with the nearest allowable value(s). The value ranges of the design variables are summarised in Table 1.
Table 1. Design variables and their value ranges

Design variable                       Value range (integer)
X co-ordinate (mm)                    [-180, 180] subject to constraint
Y co-ordinate (mm)                    [-130, 130] subject to constraint
Z co-ordinate (mm)                    [-200, -20] subject to constraint
Cooling channel diameter (mm)         [4, 10]
Packing time (s)                      [5, 15]
Clamp open time (s)                   [4, 8]
Cooling time (s)                      [10, 20]
Circuit inlet temperature (°C)        [20, 40]
Circuit flow rate (L/min)             [3, 8]
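The integer-coded GA with constraint repair can be sketched as follows; the settings follow the values reported in the next paragraph, the fitness call is a placeholder for the MOLDFLOW® cooling analysis, and the truncation-style selection is a simplification of our own (the geometric spacing checks are omitted):

```python
import random

BOUNDS = [(-180, 180), (-130, 130), (-200, -20), (4, 10),
          (5, 15), (4, 8), (10, 20), (20, 40), (3, 8)]

def repair(c):
    """Clamp each gene to the nearest allowable value (the repair step)."""
    return [min(max(g, lo), hi) for g, (lo, hi) in zip(c, BOUNDS)]

def crossover(a, b, rate=0.85):
    """Single-site crossover on the integer string."""
    if random.random() < rate:
        cut = random.randrange(1, len(a))
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]

def mutate(c, rate=0.25):
    """Replace selected genes with random integers in their ranges."""
    for i, (lo, hi) in enumerate(BOUNDS):
        if random.random() < rate:
            c[i] = random.randint(lo, hi)
    return c

def evaluate(c):
    """Placeholder for the MOLDFLOW analysis returning the standard
    deviation of the cavity surface temperature (eq. (1))."""
    return sum(abs(g) for g in c)   # dummy fitness for illustration

pop = [repair([random.randint(lo, hi) for lo, hi in BOUNDS])
       for _ in range(60)]
for gen in range(110):
    pop.sort(key=evaluate)                  # select good chromosomes
    parents = pop[:30]
    children = []
    while len(children) < 30:
        a, b = random.sample(parents, 2)
        x, y = crossover(a, b)
        children += [repair(mutate(x)), repair(mutate(y))]
    pop = parents + children[:30]
print(evaluate(pop[0]))
```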
Based on some preliminary trials, the GA population size was set at 60, and the crossover and mutation rates were set at 0.85 and 0.25 respectively. The genetic algorithm converged well within 110 generations. With 4 subsequent refined-search iterations, the final optimal solution was found (Table 2), with a standard deviation of the cavity surface temperature of 4.33 °C.

Table 2. Optimal design of cooling channels and process parameters
Design variable                                 Optimal design
Location of channel 1 (mm)                      x1 = -115, y1 = -90, z1 = -85
Location of channel 2 (mm)                      x2 = -33, y2 = -25, z2 = -47
Location of channel 3 (mm)                      x3 = 55, y3 = -29, z3 = -90, x4 = -55, z4 = -200
Inlet temperatures (°C)                         T1 = 40, T2 = 39, T3 = 40
Flow rates (L/min)                              R1 = 5, R2 = 7, R3 = 8
Channel diameters (mm)                          d1 = 6, d2 = 10, d3 = 6
Clamp open time (COT), cooling time (CT)
and packing time (PKT) (s)                      COT = 8, CT = 20, PKT = 15
Figure 3 shows the temperature distribution contours of the top and bottom cavity surfaces. The optimisation procedure was repeated 10 times with different initial populations, and the optimal solutions found were identical, which verifies that the optimal cooling system design obtained is reliable.
Figure 3. Temperature contours of the top and bottom cavity surfaces
4 Discussion and conclusion
Cooling system optimisation is basically a non-convex optimisation problem, and traditional optimisation methods are likely to be trapped in a local optimum. The approach proposed in this study is reliable in locating the global optimal solution; it successfully combines the advantages of the GA for optimisation and of CAE for cooling simulation.
5 Acknowledgement
This project is supported financially by Moldflow Corporation and the Academic Research Fund, Ministry of Education, Singapore. The authors are grateful for stimulating discussions with Mr. Peter Kennedy and Mr. David Astbury of Moldflow Corporation.

References

1. Lam Y.C. and Seow L.W., Cavity balance for plastic injection moulding, Polymer Engineering and Science, 40(6) (2000), pp. 1273-1280.
2. Ye H. and Wang K.K., Optimization of injection-molding process with genetic algorithms, in Proceedings of the Annual Technical Conference, Society of Plastics Engineers (1999), pp. 594-599.
3. Goldberg D.E., Genetic Algorithms in Search, Optimisation and Machine Learning (Reading, Mass.: Addison-Wesley, 1989).
4. Kim S.J., Lee K. and Kim Y.I., Optimisation of injection moulding conditions using genetic algorithm, in Proceedings of SPIE - The International Society for Optical Engineering, Vol. 2644 (1996), pp. 173-180.
5. Park S.J. and Kwon T.H., Optimal cooling system design for the injection moulding process, Polymer Engineering and Science, 38(9) (1998), pp. 1450-1462.
6. Young W.B., Gate location optimization in liquid composite molding using genetic algorithms, Journal of Composite Materials, 28(12) (1994), pp. 1098-1113.
7. Hooke R. and Jeeves T.A., Direct search solution of numerical and statistical problems, Journal of the Association for Computing Machinery, 8 (1961), pp. 212-229.
8. Janikow C.Z. and Michalewicz Z., An experimental comparison of binary and floating point representations in genetic algorithms, in Proceedings of the 4th International Conference on Genetic Algorithms (1991), pp. 31-36.
WAVELETS-BASED MULTIRESOLUTION REPRESENTATION AND MANIPULATION OF CLOSED B-SPLINE CURVES

GANG ZHAO, SHUHONG XU, WEISHI LI
Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528
[zhaog, xush, liws]@ihpc.a-star.edu.sg

XINXIONG ZHU
Beijing University of Aeronautics & Astronautics, 37 Xue Yuan Road, Hai Dian District, Beijing, P. R. China 100083
[email protected] Multiresolution curve representation, based on wavelets, provides more flexible methods for curve editing at different resolution levels, curve smoothing and curve data compressing. It requires no extra storage beyond that of the original control points. The closed B-Spline curve is a special type of B-Spline curves. A cubic closed curve with C2 continuity needs special processing at its boundaries when wavelets are applied to decompose or reconstruct it. This paper introduces, from the point of geometry view, the principles and methods of wavelets-based multiresolution representation of C2 cubic closed B-Spline curves. An extended method of multiresolution manipulation for closed BSpline curves is also presented.
1 Introduction
Multiresolution curve representation based on wavelets provides flexible methods for curve editing at different resolution levels, curve smoothing and curve data compression [1]. The conventional construction of wavelets takes place on the whole real line, but applications in computer graphics require wavelets on a bounded interval. This problem can be solved by: (1) setting the data to zero outside the interval; (2) making the given data periodic; (3) using reflection at the boundaries; (4) constructing special boundary elements [2]. The closed B-spline curve is a special type of B-spline curve. A k-degree closed B-spline curve with C^(k−1) continuity has a different representation from the unclosed curve, and needs special processing at its boundaries when wavelets are applied to decompose or reconstruct it. In this paper, we adopt the repetition approach for the multiresolution analysis (MRA) of C² cubic closed B-spline curves. An extended method of multiresolution editing for closed B-spline curves is also presented.
2 Multiresolution representation of closed B-spline curves
2.1 B-spline scaling functions and wavelet functions of closed B-spline curves

A k-degree unclosed B-spline curve γ(u) with 2^L + k control points C_i^L (i = 0, 1, ..., 2^L + k − 1) is described as follows:

γ(u) = Σ_{i=0}^{2^L+k−1} C_i^L B_{i,k}(u),   u ∈ [0, 1]   (1)
where B_{i,k}(u) (i = 0, 1, ..., 2^L + k − 1) are the B-spline basis functions defined on the knot vector [u_0, u_1, ..., u_{2^L+2k}]. The functions B_{0,k}(u), ..., B_{2^L+k−1,k}(u) form the basis for the space V^L. The level L represents how many times the vector space (the knot vector, in this case) can be subdivided. To use the same formula as for the unclosed B-spline curve, Shi [5] presented a simple method of processing the control points of a k-degree closed B-spline curve with C^(k−1) continuity at its boundaries:

C_{2^L+i}^L = C_i^L,   i = 0, 1, ..., k − 1   (2)

In general wavelet-transform theory, we denote θ as the scaling function and ψ as the wavelet function. Then, with (1) and (2), the cubic closed uniform B-spline curve with C² continuity at its boundaries can be described as

γ(u) = Σ_{i=0}^{2^L−1} C_i^L θ_i^L(u),   u ∈ [0, 1]   (3)

where θ_i^L(u) = B_{i,3}(u) + B_{2^L+i,3}(u) for i = 0, 1, 2, and θ_i^L(u) = B_{i,3}(u) for i = 3, 4, ..., 2^L − 1.
Wavelets offer an L-level hierarchical basis for the space V^L. According to [2], [3], [4], the B-spline basis functions θ^L at level L can be transferred to the two-part basis functions [θ^(L−1), ψ^(L−1)] at level L − 1; the relationships between them are

θ_j^(L−1)(u) = Σ_{k=2j−2}^{2j+2} h_{k−2j} θ_k^L(u)   (4)

ψ_j^(L−1)(u) = Σ_{k=2j−4}^{2j+6} g_{k−2j} θ_k^L(u)   (5)

θ_k^L(u) = Σ_i h̃_{k−2i} θ_i^(L−1)(u) + Σ_i g̃_{k−2i} ψ_i^(L−1)(u)   (6)
where the sequences h, g, h̃ and g̃ are given in the appendix. Now the C² cubic closed B-spline curve can be represented with respect to the two-part basis functions [θ^(L−1), ψ^(L−1)] as

γ(u) = Σ_{i=0}^{2^(L−1)−1} C_i^(L−1) θ_i^(L−1)(u) + Σ_{i=0}^{2^(L−1)−1} D_i^(L−1) ψ_i^(L−1)(u)   (7)
2.2 Wavelet on a bounded interval

The construction of B-spline wavelets described by (4), (5) and (6) takes place on the whole real line: the index j goes from −∞ to +∞. For computer graphics, however, we are interested in problems confined to an interval. In general, there are five different approaches to an MRA on a bounded interval: spatial windowing, reflection, repetition, multiple knots and Gram-Schmidt [4]. Because the C² cubic closed B-spline curve is periodic, and in order to get an efficient and elegant computation scheme, we adopt the repetition method for its
MRA on an interval. The extension pattern of the B-spline and wavelet control points is shown in Fig. 1.

Fig.1 Repetition method for closed B-spline curves

2.3 Multiresolution representation of closed B-spline curves

With (3), (6) and (7), the control points needed to express the curve in the two-part basis [θ^(L−1), ψ^(L−1)] can be found by using the following formulas:

C_i^(L−1) = Σ_{k=2i−5}^{2i+5} h̃_{k−2i} C_k^L   (8)

D_i^(L−1) = Σ_{k=2i−1}^{2i+3} g̃_{k−2i} C_k^L   (9)
The process of splitting the control points C_i^L into a low-resolution version C_i^(L−1) and details D_i^(L−1) is called decomposition. Alternatively, with (3), (4), (5) and (7), the control points with respect to the original B-spline basis θ^L can be recovered with
C_k^L = Σ_i h_{k−2i} C_i^(L−1) + Σ_i g_{k−2i} D_i^(L−1)   (10)
Recovering C_k^L from C_i^(L−1) and D_i^(L−1) is called reconstruction. If the decomposition procedure is applied recursively to C_i^(L−1), the original curve γ(u) can be expressed as a hierarchy of lower-resolution curves γ^j(u) and details β^j(u), i.e.
r(«) = y"(«)+/J"(«) =rL-2M+pL-2M+pL-\u)=-- = y0M+p0(u)+--+iiL-l(u) wherey0(u),p°(u) , ..., /JL"'(«)is called multiresolution representation of y(«), y°(«), / ' ( M ) , . . . , / 4 - 1 ^ ) is the multiresolution approximation of y(«) at different resolution levels. The wavelet basis with the corresponding multiresolution representation y°(«),)30(j<) , ...,pL~l{u) can be described as follows \p\ Fig.2 described the multiresolution approximation of the C2 cubic closed uniform B-spline curve y(u) witn il control points. 3
0'], (0
(a)
(b)
(11)
(c)
(d)
Fig.2. Multiresolution representstion of closed B-spline: (a) Original curve with 32 control points; (b),(c),(d) Low-resolution curve at level 4,3,2.
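The decomposition and reconstruction formulas (8)-(10) translate directly into code once the filter sequences are available. A minimal Python sketch with the repetition (periodic) indexing of Fig. 1; the sequences h, g, h̃ and g̃ must be supplied from the appendix, and representing them as offset-to-value dictionaries is our own choice:

```python
import numpy as np

def decompose(C, h_t, g_t):
    """One level of decomposition, eqs. (8)-(9), with periodic indexing.
    C: the 2^L control points (array of shape (n, 2) for a planar curve);
    h_t, g_t: the analysis sequences h~ and g~ from the appendix, given
    as {offset: value} dictionaries keyed by k - 2i."""
    C = np.asarray(C, dtype=float)
    n = len(C) // 2
    C_low = np.zeros((n,) + C.shape[1:])
    D = np.zeros_like(C_low)
    for i in range(n):
        for off, v in h_t.items():
            C_low[i] += v * C[(2 * i + off) % len(C)]   # eq. (8)
        for off, v in g_t.items():
            D[i] += v * C[(2 * i + off) % len(C)]       # eq. (9)
    return C_low, D

def reconstruct(C_low, D, h, g):
    """Inverse step, eq. (10): C_k = sum_i (h_{k-2i} C_i + g_{k-2i} D_i),
    again wrapping the indices periodically."""
    n = 2 * len(C_low)
    C = np.zeros((n,) + np.asarray(C_low).shape[1:])
    for i in range(len(C_low)):
        for off, v in h.items():
            C[(2 * i + off) % n] += v * C_low[i]
        for off, v in g.items():
            C[(2 * i + off) % n] += v * D[i]
    return C
```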
3 Multiresolution manipulation of closed B-spline curves
The wavelet representation allows the user to specify whether local detail changes are desired, or whether broader changes to the entire sweep of the object are intended. Fig. 3 illustrates the effects on the original curve at the full
Fig.3 Editing control points at different levels
resolution when the B-spline control points are modified at different resolution levels. Fig. 4 shows an example of changing the overall form of the curve while preserving its details.
Fig.4 Changing the overall form of the curve while preserving its details

Fig. 5 shows an example of changing the curve's character without affecting its overall sweep.
Fig.5 Changing the curve's character without affecting its overall sweep

4 Conclusions
This paper describes a multiresolution representation for C² cubic closed B-spline curves and extends the multiresolution manipulation method to it. Although the method is illustrated with the special case of C² cubic closed B-spline curves, it is suitable for k-degree closed B-spline curves with C^(k−1) continuity.

References

1. Finkelstein, A. and Salesin, D. H., Multiresolution Curves, Computer Graphics (Proceedings of SIGGRAPH '94), Vol. 28, 1994, pp. 261-268.
2. Chui, C. K. and Quak, E., Wavelets on a bounded interval, in Numerical Methods of Approximation Theory, Vol. 9 (D. Braess and L. L. Schumaker, eds.), pp. 53-75, Birkhauser Verlag, Basel, 1992.
3. Cohen, A., Daubechies, I. and Vial, P., Wavelets on the Interval and Fast Wavelet Transforms, Applied and Computational Harmonic Analysis, Vol. 1, 1993, pp. 54-81.
4. Gortler, S. J. and Cohen, M. F., Hierarchical and Variational Geometric Modeling with Wavelets, in Proceedings of the 1995 Symposium on Interactive 3D Graphics, ACM, New York, May 1995, pp. 35-42.
5. Shi, F. Z., CAGD & NURBS, High Education Press, China, 2001.
Appendix
A PIPING MODELING AND CALCULATION SYSTEM

GAO HUIXING
Senior Lecturer, Ngee Ann Polytechnic, 535 Clementi Road, Singapore 599489
Email: [email protected]
Piping design is an important but tedious job in the engineering industries. Computer-aided design is an effective way to raise productivity and quality. The author has developed a PC/AutoCAD-based graphic piping CAD/CAM system called GPIPE, and it has been applied in Singapore industries for many years. The functions of this software cover 3D modeling of piping and equipment, clash checking, various pipe-production parameter calculations and documentation. The detailed functions of this system, and how they were realised, are described in this paper.
Keywords
3D Model: A description of a three-dimensional object created by computer software that can be studied as though it really existed in space.
GPIPE: A graphic piping CAD/CAM (Computer-Aided Design/Computer-Aided Manufacture) system developed by Ngee Ann Polytechnic.
AutoCAD: One of the most widely used CAD and drafting software products on the market, developed by Autodesk Inc., USA.
ARX: The AutoCAD Runtime Extension, a compiled-language programming environment for developing AutoCAD applications. ARX includes C++ libraries for developers to create AutoCAD external applications that operate just like native AutoCAD commands.
1. Introduction
Piping systems are an important integral part of any engineering field, such as marine, oil processing and civil engineering. All pipes must go through three stages: design, fabrication and installation. The main tasks of the design stage are arrangement, calculation and drafting. It is very complicated work to reasonably arrange all pipes in a limited space without any interference with structural members, equipment and other pipes. After arrangement, the layout drawings of the piping systems are the basis for further production design, in which fabrication parameters are calculated and the various kinds of drawings for fabrication and installation are drafted. In the past ten years AutoCAD has become widely used in industry, but most engineers still work in the traditional way, using one or several 2D views to represent 3D objects. Because the views are created independently, errors and ambiguity are inevitable. To improve the design and to obtain information for calculation, a true 3D model should be created instead of 2D drawings. Based on this requirement, a 3-dimensional
AutoCAD-based graphic piping CAD/CAM system named GPIPE has been developed and applied in Singapore industries.

2. Functions of GPIPE
GPIPE is an integrated graphic piping CAD/CAM software system built to meet the needs of piping design in various engineering fields. The system covers the entire process from pipe layout to fabrication and installation.

2.1 3D Modeling of Equipment and Piping
In the various engineering design areas there are many interactions between the different design disciplines, with these activities happening concurrently. It is essential to build a product model using a 3D CAD system for simulation, for virtual reality and for providing production information. By using the powerful AutoCAD graphic capability, the GPIPE system can perform many sophisticated interactive operations to create 3D models.
- Machinery, equipment, major structural members: A parametric definition is adopted. There is a standard equipment library defined in GPIPE; users need only key in some key values and a 3D model of the equipment is created. Users can also use AutoCAD commands to construct 3D objects of non-standard equipment; GPIPE can add pipe inlets and outlets to such an object and convert it into a GPIPE 3D equipment model.
- Pipe and components: Users can select any plan view or isometric view to build pipe and component 3D models. Many interactive operations facilitate pipe modeling, including add, delete, move, query, etc.

2.2 Calculations
The information of a 3D product model is stored in a database, from which GPIPE can pick the information for various calculations.
- Interference checking: This checks for interference among pipes and between pipes and equipment. A pipe that clashes with another pipe or with equipment is displayed in white so that users can modify it.
- Pipe piece calculation: This calculates each pipe piece and produces a pipe piece drawing for fabrication and installation. The calculations include the developed length, feeding length, bending angle and rotation angle of each segment.
- BOM: The bill of materials of the various pipe systems is automatically generated in GPIPE. It is provided for costing, procurement and production preparation.
- Pipe nesting: This indicates how to reasonably distribute a raw pipe for making several pieces in order to save pipe material.

3. DEVELOPMENT CONSIDERATIONS
When an application CAD/CAM system is to be developed, several factors should be considered. The following factors were considered in developing the GPIPE system.

3.1 Environment
Hardware and supporting software are the most important factors when application software is developed. Considering the fact that AutoCAD is widely used in almost all Singapore companies, AutoCAD was chosen as the basic graphic tool for piping design. AutoCAD provides a powerful C-language interface called the AutoCAD Runtime Extension (ARX) [1]; C++ combined with ARX is the programming language of GPIPE.

3.2 Sophisticated Data Structure
GPIPE has its own sophisticated data structure. There are two main databases in the system. One is the piping part library, in which various pipe fittings and valves are stored. The other is a pipe route database, in which pipe specifications and route data are stored. All pipes are organized into groups: the pipes within the same fitting zone, in the same system, and with the same material and water-testing pressure can be merged into a group called a batch, and a user can take one or several batches for processing in a design session (sketched below). To describe mathematically and store the position of a pipe, various pipe nodes are defined and stored.
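A minimal sketch of the batch grouping just described follows. The record fields (zone, system, material, test_pressure) are illustrative assumptions for the purpose of the example, not GPIPE's actual database schema.

```python
from collections import defaultdict

def group_into_batches(pipes):
    """Group pipe records that share fitting zone, system, material and
    water-testing pressure into batches, as described in Section 3.2."""
    batches = defaultdict(list)
    for pipe in pipes:
        key = (pipe["zone"], pipe["system"], pipe["material"], pipe["test_pressure"])
        batches[key].append(pipe)
    return batches

# hypothetical pipe records
pipes = [
    {"id": "P-001", "zone": "Z1", "system": "ballast", "material": "CS", "test_pressure": 10},
    {"id": "P-002", "zone": "Z1", "system": "ballast", "material": "CS", "test_pressure": 10},
    {"id": "P-003", "zone": "Z2", "system": "fuel", "material": "SS", "test_pressure": 16},
]
for key, members in group_into_batches(pipes).items():
    print(key, [p["id"] for p in members])
```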
3.3 User-friendly Interface
For application software, the user interface is a very important factor. All operations in GPIPE are menu-driven. A main menu controls the running of the whole system. AutoCAD has a menu file called acad.mnu; a sub-tree menu of GPIPE has been inserted into this file, forming a customized AutoCAD menu. Users can pick any AutoCAD menu item as usual for normal graphic operations. If users click the GPIPE item, a GPIPE pull-down menu and/or GPIPE menu tree is displayed. After clicking a pipe menu item, an ARX application program starts, and AutoCAD-like dialogue boxes are displayed for selecting operations and/or inputting data. The dialogue boxes can be grouped as Radio Button, Edit Box, Image Button and List Box. To define these dialogue boxes, the AutoCAD Dialogue Control Language (DCL) [2] is used.

3.4 Graphic Process
Although GPIPE runs in AutoCAD, most normal AutoCAD drawing commands such as LINE and ARC cannot be used directly for pipe modeling, because these commands can only generate a geometry object: when users click such a geometry, AutoCAD can only tell what and where it is. A special data structure has to be adopted to create the relationship between a geometry and the pipe data. Each pipe piece is drawn as an AutoCAD block which has a unique name. A pipe data file is created when a pipe drawing is drawn; all pipe information, including specification, position and the block name, is written in this file. During interactive operation, when a pipe (actually a block) is clicked, ARX returns the block name, and thus a relationship between the pipe and the data file is established.

4. CONCLUSION
Piping systems are an important integral part of any engineering project. Their design, construction and operation reflect on the quality of work done by a company. To improve the efficiency and productivity of pipe design, fabrication and installation, the graphic piping CAD/CAM system GPIPE has been developed. This system has been applied in many Singapore companies. Practice has clearly verified that GPIPE increases design efficiency, improves quality and reduces the work involved in piping design, fabrication and installation, thereby promoting production in the related industries. Future efforts should keep up with the latest developments in computer technology and expand the application of GPIPE to more industries.

REFERENCES
1. Charles McAuley, Programming AutoCAD 2000 Using ObjectARX, Thomson Learning, USA, 2000.
2. Sham Tickoo, Customizing AutoCAD R14, Chapter 14, Autodesk Press, USA, 1998.
OPTIMIZATION OF INJECTION MOLDED PART BASED ON THE CAE SIMULATION
LI YAN
Institute of High Performance Computing, 1 Science Park Road, #01-01, Singapore 117528
E-mail: [email protected]
CHUNRONG PAN
Mechanical & Electrical Engineering Department, Shantou University, Shantou City, Guangdong Province, P.R. China 515063
E-mail: [email protected]
This paper presents a systematic approach using injection molding simulation analysis to identify the root cause of an injection molding problem and optimise the plastic product. Advanced commercial Computer-Aided Engineering (CAE) molding simulation software (MOLDFLOW), which can be used for injection molding simulation, cooling simulation and warpage simulation, is a very powerful tool for engineers to evaluate a plastic part at an early design stage. In this paper, some factors which affect the quality of injection molded parts are described, and an optimisation methodology based on CAE simulation has been developed to improve injection molded part quality.
1. Introduction
Injection molding is the most widely used plastic process for making large quantities of parts with geometric versatility and cost effectiveness. However, many parts manufactured by injection molding suffer from a wide range of defects [1], which may include warpage, black streaks, blisters, blush, bubbles, burn marks, flash, poor weld lines, short shots, etc. In the past, injection molding problems were addressed through a conventional trial-and-error process which used different materials, part designs and modifications, and through Design of Experiments (DOE) run with different molding process parameters. This approach is very expensive and time-consuming, and the end solution may compromise part quality [2]; it is very hard to use this approach to obtain optimum part quality. With advanced CAE technology, such as injection molding simulation, cooling simulation and warpage analysis, it is now possible to evaluate the part design, mold design and injection molding machine process parameters long before a mold is manufactured [3]. By interpreting the injection simulation results, one can gain a much better understanding of the underlying causes of injection part defects and obtain an optimised design. Depending on the function of an injection molded part, the main criterion for its quality differs. Some parts, such as coverings and toys, only need a good-looking appearance, and even a wide tolerance can be acceptable. Other industrial parts, such as connectors, which need to assemble exactly with other parts, must hold a very tight tolerance. This paper presents a systematic approach using injection molding simulation analysis to identify the root cause of an injection molding problem and optimise the plastic product.
2. Approach
The basic model makes the following assumptions. Figure 1 shows the part geometry of the connector, and Figure 2 is its mid-surface model.
Fig 1: Connector Part
Fig 2: Mid-surface Model
Fig 3: Mesh Model
Material: the material used for the part was LCP Zenite 6130L Black.
Mold: the part was molded in a four-cavity mold with a sub-marine gate.
Normal processing conditions for the part: injection time = 0.2 s; hold time = 1.00 s; cool time = 4.00 s; melt temperature = 330°C; mold temperature = 60°C.
Simulation software: MoldFlow Insight 2.0.
Simulation models: a mid-surface finite element model generated by MoldFlow (see Figure 3). The cooling channels and runner system were then built in so as to run a complete shrinkage and warpage analysis.
The key quality factor of a connector is holding very tight tolerances. To meet the functional requirement, the overall warpage must be controlled within 0.5 mm along the length direction. The initial simulation under the above parameters showed a distortion of approximately 1.42 mm (see Fig 4), similar to that observed on the real part. In warpage simulation, the warpage is generally represented by nodal displacements from the cavity dimension upon ejection. It is formally represented as

F(Xi) = aX + bY + cZ    (1)

where Xi = [X1, ..., Xn]; n is the number of design variables to be considered; a, b and c are weighting factors for X, Y and Z respectively; X is an additive function of the maximum displacement, Y is an average of the top 10 percentile displacements, and Z is an overall average displacement. Warpage comes from differential shrinkage, which is a function of differential pressure, differential temperature, differential residual stress, molecular and plastic orientations in the filling and post-filling stages, as well as the inherent and geometric stiffness of the part. These parameters do not act independently; they affect each other. Changing any parameter almost always causes two opposite effects on the final result. For example, increasing packing pressure will decrease shrinkage and increase the pressure difference in the part. The former could decrease warp arising from differential geometric stiffness, while the latter may increase warp through the resulting higher density difference. So the final effect of increasing packing pressure on warpage depends on which of the two opposite effects is dominant. Because of the complexity of the causes of warpage, there is no rule of thumb that can simply be followed in part design or mold design to minimize warpage. With CAE simulation tools, we can find out to which parameters warpage is most sensitive. Quite often, restricted computing and analysis time allows only a certain number of iterations to reach the final answer, so one has to extract as much information as possible from each iteration to determine the causes of warpage, then change the most sensitive parameters to reduce it.
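A small sketch of the warpage measure of equation (1) follows. The weighting factors and the top-10-percentile term follow the definitions above, while the displacement array is illustrative input rather than actual MOLDFLOW output.

```python
import numpy as np

def warpage_objective(displacements, a=1.0, b=1.0, c=1.0):
    """F = a*X + b*Y + c*Z with
    X: maximum nodal displacement,
    Y: average of the top 10 percentile displacements,
    Z: overall average displacement."""
    d = np.abs(np.asarray(displacements))
    X = d.max()
    Y = d[d >= np.percentile(d, 90)].mean()
    Z = d.mean()
    return a * X + b * Y + c * Z

# hypothetical nodal displacements (mm) of the ejected part
d = np.array([0.12, 0.30, 1.42, 0.95, 0.88, 0.41, 1.10])
print(f"F = {warpage_objective(d):.3f}")
```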
As the plastic material and some process parameters are fixed, the simulation focuses on changing the gate location, cooling arrangement, pressure and wall thickness. From the simulation results, we try to identify the main factor affecting warpage and optimize it to obtain a part with minimized warpage.

3. Simulation and Results
Several different gate locations were evaluated. Different gatings caused quite different plastic orientations in the part; however, along the length direction the warpage was not affected by the changes in plastic orientation. Several cooling analyses were done by varying the coolant temperatures on both sides of the mold. Basically, mold temperature affects warpage in two ways: through the temperature difference across the part thickness, and through temperature differences at different locations along the part. The temperature difference through the thickness causes differential residual stress from each surface to the mid-plane, forming a moment that bends the part. Temperature differences at different locations cause uneven area shrinkage on different portions of the part, which also contributes to warpage. The simulation results in this case showed that no matter what the cooling conditions were (within the recommended processing range), the trend of the warpage was the same and the magnitude was almost unaffected. The difference in warpage magnitude between the isothermal mold condition and the cooling analysis, in which both temperature effects were considered, was less than 7% of the total warpage. This prediction indicated that in this case the warpage was not caused by mold cooling.
•i" .
Fig 4: Before Simulation
Fig 5: After Simulation
What was left was the non-uniform shrinkage, caused mainly by the pressure difference. Several variations were simulated, such as increasing packing pressure, longer cooling time and changing wall thickness. The most effective way, from simulation, to reduce the warpage was to change the part thickness in certain areas. The warpage was decreased to 0.414 mm (see Fig 5), within the functional requirement range. The methodology for optimizing part wall thickness is introduced below.

4. Optimization Methodology
To optimize part wall thicknesses, two characteristics have to be considered. One is that the warpage is used as a measure of the overall quality of the part, hence constituting an objective function value; the objective function of this problem is therefore numeric instead of analytic. The other is that a considerable amount of computing time is required to evaluate the objective function.
The optimization techniques are primarily classified into two search methods: the direct search method and the gradient-based method [4]. The former uses only function values to reach the minimum; the latter uses the gradients of the objective and constraint functions. The gradient-based method is generally considered superior to the direct search method in its efficiency and effectiveness for most functional optimization problems. However, since the objective function of the proposed problem is not in a functional form, difference approximation must be employed to obtain gradient information. The gradient-based methods employ the following iteration procedure:
t^(k+1) = t^(k) + α_k d^(k)    (2)

where α_k is the line search parameter and d^(k) is the search direction for the design variables. To calculate the search direction, the gradients of the objective function are approximated using a forward finite difference method as follows:

∂f(t)/∂t_i |_(t=t_0) = [f(t_0 + Δt_i) − f(t_0)] / Δt_i    (3)
where i = 1 to n, the number of design variables. In the above equation, to obtain a good estimate of the derivative it is important to properly choose the finite difference step size Δt_i for each design variable, which depends highly on f(t) and t_0. Also, determining the search parameter α_k from equation (2) requires the implementation of a line-search algorithm. The design variables t_i, i = 1 to n, are the different wall thicknesses. Accordingly, for this type of injection molding problem, the gradient-based method requires numerical experimentation with a large number of function evaluations, which results in increased computing time. After optimization, the warpage was reduced from 1.42 mm to 0.414 mm, less than the 0.5 mm connector functional requirement. The optimized part has a much more uniform frozen layer thickness and melt front advancement, and the bulk temperature distribution in the optimized part is more uniform. The final result therefore leads to a consequently lower warpage value.
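Equations (2) and (3) can be sketched as follows. The objective f here is a cheap quadratic surrogate (in the actual study each evaluation of f would be a full MOLDFLOW warpage run, which is why the number of function evaluations matters), and the backtracking line search is one simple way of determining α_k, not necessarily the one used in [4].

```python
import numpy as np

def fd_gradient(f, t0, dt):
    """Forward finite difference, eq. (3): df/dt_i ~ (f(t0 + dt_i e_i) - f(t0)) / dt_i."""
    f0 = f(t0)
    g = np.zeros_like(t0)
    for i in range(len(t0)):
        t = t0.copy()
        t[i] += dt[i]
        g[i] = (f(t) - f0) / dt[i]
    return g

def optimize(f, t, dt, iters=20):
    for _ in range(iters):
        d = -fd_gradient(f, t, dt)              # search direction d^(k)
        alpha = 1.0
        while f(t + alpha * d) >= f(t) and alpha > 1e-8:
            alpha *= 0.5                        # backtracking line search for alpha_k
        t = t + alpha * d                       # eq. (2): t^(k+1) = t^(k) + alpha_k d^(k)
    return t

# two wall thicknesses (mm); stand-in warpage surrogate with optimum at (1.2, 0.8)
f = lambda t: (t[0] - 1.2) ** 2 + 2.0 * (t[1] - 0.8) ** 2
print(optimize(f, np.array([2.0, 2.0]), dt=np.array([1e-4, 1e-4])))
```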
5. Summary
In this paper we have presented a method of using injection molding simulation analysis to identify the root cause of an injection molding problem, and an optimization methodology to improve injection molded part quality.

6. References
1. Edward A. Muccio, Plastic Part Technology, ASM International, Ohio, 1991.
2. John Moalli, Plastics Failure Analysis and Prevention, Plastics Design Library, NY, 2001.
3. B.H. Lee, Optimization of Part Wall Thicknesses to Reduce Warpage in Injection Molded Part Based on the Modified Complex Method, ANTEC, pp. 692-698, 1996.
4. Robert A. Malloy, Plastic Part Design for Injection Molding, Carl Hanser Verlag, NY, 1994.
Evaluating Plane-strain Forging of Magnesium Alloy AZ31 Using Finite Element Analysis
S.C.V. Lim, M.S. Yong and C.M. Choy
Singapore Institute of Manufacturing Technology, 71 Nanyang Drive, Singapore 638075
Finite element (FE) simulation was used to evaluate the plane-strain forging of magnesium alloy AZ31 at temperatures of 150°C, 250°C and 300°C. A commercially available finite element package, ANSYS/LS-DYNA, was used in the finite element analysis. From the simulation, forging loads and strain distributions were computed. A physical forging experiment was carried out on a rectangular billet of dimensions 100 mm in length by 20 mm in width by 5 mm in thickness. The feasibility of numerical prediction was evaluated by comparing the predicted forging loads with the empirical results, and the deformation profiles generated from the FE stress-strain distributions were compared to macrographs of the actual forged parts.
1. Introduction
There has been increasing use of magnesium alloys for lightweight structural and functional parts in the automotive and electronic industries. This is largely due to the fact that magnesium alloys have the lowest density among structural metals, high specific strengths, excellent machinability, good damping capacity and electromagnetic interference shielding. Most parts made of Mg alloys are produced by diecasting and thixoforming, but the products tend to have inferior mechanical properties compared to forged parts [1]. It is desirable for Mg alloys under solidus conditions to be formed using stamping, extrusion or forging processes; among the different forming methods, forging has a number of advantages, particularly in strengthening and increasing the reliability of the component. Many parameters, such as temperature, strain rate, friction and pre-form shape, can affect the forging process [2], and the effects of such parameters also vary with the material used. There are a few studies on the effect of temperature and strain rate on the forging of axisymmetric Mg alloy parts [3-4], but very little or none on the plane-strain forging of Mg alloys. Some work has also shown that the workability of magnesium alloys can be effectively improved by increasing the working temperature above 300°C [5], but little work has been done to evaluate the formability of Mg alloys below 300°C. Therefore, in this study we investigate the possibility of plane-strain forging an Mg alloy AZ31 billet of dimensions 20 mm in width, 5 mm in thickness and 100 mm in length through a backward extrusion process. The simulation study uses the ANSYS/LS-DYNA finite element analysis software to evaluate the effect of different forging temperatures (within the warm forging temperature range of 150°C to 300°C). Empirical experiments were conducted and the results were compared with the FEM predictions.

2. Methodology
2.1 FE modeling
The 2D plane-strain FE model of length 1 mm (in the z-axis direction) consists of a forming punch, the billet material and a die insert (Figure 1). The half 2D model was found to be sufficient for the simulation in our previous study [6] and was used to cut down computation time. The punch and die were simulated as rigid bodies (MAT_RIGID), while the AZ31 billet material subjected to the backward extrusion was modeled as a strain-hardening plastic body (MAT_POWER_LAW_PLASTICITY). The material constants of AZ31 used in the FE program, σ = Kε^n, for the respective forming temperatures selected are shown in Table 1. An auto-remeshing algorithm (CONTROL_ADAPTIVE) was used to avoid substantial distortion in the deformed material and to provide a more accurate simulation. As thin-wall magnesium parts are desirable, the simulation was carried out for the actual plane-strain forging of a billet of dimensions 100 mm in length by 20 mm in width, from a thickness of 5 mm to a thickness of 1 mm.
A DEC/ALPHA 600 series 64-bit workstation with an OSF1 version 4.0 operating system was used to perform the finite element modeling. ANSYS/LS-DYNA version 5.6 was used for processing of the model.

Table 1: Material constants of AZ31 (σ = Kε^n)
Temperature (°C) | K | n
150 | 178 | 0.1
250 | 85 | 0.024
300 | 59 | 0.021
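As a worked evaluation of the material model supplied to LS-DYNA, the flow stress σ = Kε^n can be computed directly from the Table 1 constants. The stress unit (MPa) is an assumption, made for consistency with the stress levels reported later in Table 3.

```python
import numpy as np

# Table 1 constants: temperature (deg C) -> (K [assumed MPa], n)
constants = {150: (178.0, 0.100),
             250: (85.0, 0.024),
             300: (59.0, 0.021)}

eps = np.linspace(0.05, 1.0, 5)      # effective plastic strain values
for T, (K, n) in constants.items():
    sigma = K * eps ** n             # power-law flow stress
    print(f"{T} C: sigma(eps) =", np.round(sigma, 1), "MPa")
```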
2.2 Experiments and simulation
This study was divided into three stages. In stage I, different coefficient of friction values (0.1, 0.12, 0.15, 0.2 and 0.3) at a fixed working temperature of 250°C were used as process parameters in FE simulations. In stage II, actual plane-strain backward extrusion experiments were carried out on Mg alloy AZ31B billets of dimensions 100 mm in length, 20 mm in width and 5 mm in thickness. The billets were coated with a fine layer of graphite to act as a lubricant prior to forging. The forging was done using a 50-ton hydraulic press. The initial forging was carried out at a temperature of 250°C with forging loads of 30, 40, 50, 53 and 56 tons; the forging load was limited to 56 tons as that is the maximum machine capacity. The thickness of the billet was measured after each forging. The results were plotted and used to match the FE simulation results obtained in stage I to approximate the coefficient of friction of the process. In stage III, with the approximated μ value, FE simulation was carried out for the plane-strain forging at 150°C and 300°C. The required forging load was evaluated from the simulations, and the stress and strain distributions were studied from the FEM analysis for the three different working temperatures. Forging at a temperature of 300°C with forging loads of 20, 30, 35, 37, 40, 45 and 50 tons was conducted.

2.3 Microstructural analysis
The actual parts forged at 250°C and 300°C were mounted, polished and etched to study the grain morphology. The metallurgical samples were examined using an inverted microscope at ×100 magnification.

3. Results and Discussions
From the FEM simulations, forging load vs. punch displacement graphs were plotted, as shown in Figure 2. From the graphs, it can be observed that there is a substantial increase in load over the initial punch displacement. This is due to the force needed to overcome friction and, more so, to reach the yield strength of the material. After the yield point is reached, a smaller increase in load is sufficient for the material to undergo plastic deformation. The build-up in load at the final stage of deformation (at about the 3 mm to 4 mm region) can be attributed to the hydrostatic pressure build-up caused by the greater difficulty of material flow at the final stages. The forging load was also observed to increase with an increase in the coefficient of friction, since more load is required to overcome a higher frictional force.
Table 2: Punch displacement obtained with different forging loads.

Forging temperature of 250°C
Forging load (tons):     30   | 40   | 50   | 53  | 56
Punch displacement (mm): 0.1  | 0.25 | 0.73 | 1M  | 3.13

Forging temperature of 300°C
Forging load (tons):     20   | 30   | 35   | 37  | 40   | 45   | 50
Punch displacement (mm): 0.08 | 0.25 | 0.5  | 1.0 | 2.25 | 3.65 | 4.0
The punch displacements for the different forging loads at forging temperatures of 250°C and 300°C are shown in Table 2, while the actual parts formed are shown in Figure 3. The experimental forging load vs. punch displacement results for the working temperature of 250°C were compared with those predicted by the simulations (see Figure 2). It was observed that the experimental results matched best with the simulation using a μ value of 0.12. This value is indicative of the coefficient of friction for the forging process carried out in this study; experimental tests will be done to verify the value obtained in subsequent studies. Using μ = 0.12, simulations of forging at 150°C and 300°C were carried out, and the forging load vs. punch displacement curves are shown in Figure 4 along with the empirical results for 250°C and 300°C. An actual experiment for forging at 150°C was not conducted, as the forging load evaluated from the simulation for substantial punch displacement (~100 tons, see Figure 4) is higher than the capacity of the hydraulic press. It can be observed from the graph that the experimental results for forging at 300°C are in good agreement with those predicted by the FEM simulation. The load capacity of the hydraulic press was sufficient to forge the original billet from 5 mm to 1 mm in thickness at a temperature of 300°C, but not at 250°C; a load of approximately 70 tons is predicted by the simulation for the original billet to be forged to 1 mm thickness at 250°C. From the graph, a trend can be observed where the forging load increases as the working temperature decreases. The increase in load from 300°C to 250°C is quite marginal compared to that when the working temperature decreases from 250°C to 150°C. This is expected, as the strength and strain-hardening coefficient at a working temperature of 150°C are much higher than those at 250°C and 300°C (see Table 1). These values coincide with the theory that magnesium, having an HCP structure, gains an additional slip system when formed at temperatures of 225°C and above [7]. The deformation profiles generated by the FEM were matched with the actual forged parts and found to be similar and generally accurate (see Figure 5), indicating that the model used can adequately predict the deformation profile of the forged part. Through the microstructural analysis, small refined grains were observed generally throughout the formed part, with some areas having coarse grains (see Figure 6). Such small refined grains can be attributed to dynamic recrystallization, and this can be substantiated by the studies carried out by Mwembela et al. [5]. Having small refined grains is desirable, as the mechanical properties of materials with such a microstructure are generally better in terms of strength and ductility. It was observed that the small refined grains were found in areas of high deformation and flow stress. Further studies are being done to correlate and investigate the effects of stress and strain on the development of microstructure in the deformed material. Stress and strain distributions for the different working temperatures were analyzed (see Figure 7). It was observed that the stress and strain distribution patterns are similar for the different forging temperatures, which implies that temperature has little effect on the stress and strain distribution pattern.
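The calibration step described above, choosing the μ whose simulated load-displacement curve best matches the experiment, can be sketched as a simple root-mean-square comparison. All curve values below are placeholders, not the measured or simulated data of this study.

```python
import numpy as np

loads = np.array([30.0, 40.0, 50.0, 53.0, 56.0])        # forging loads (tons)
exp_disp = np.array([0.10, 0.26, 0.75, 1.60, 3.20])      # illustrative measurements (mm)

sim_disp = {                                             # illustrative FE curves (mm)
    0.10: np.array([0.12, 0.31, 0.90, 1.95, 3.60]),
    0.12: np.array([0.11, 0.27, 0.76, 1.63, 3.22]),
    0.15: np.array([0.09, 0.21, 0.60, 1.30, 2.70]),
}

def rms(a, b):
    """Root-mean-square difference between two displacement curves."""
    return np.sqrt(np.mean((a - b) ** 2))

best_mu = min(sim_disp, key=lambda mu: rms(sim_disp[mu], exp_disp))
print("best-matching friction coefficient:", best_mu)
```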
The maximum equivalent stress (Von Mises), σ_y stress and pressure predicted by the simulation for the three different forging temperatures are shown in Table 3. Both the stress and pressure values show similar trends, increasing with a decrease in forging temperature. From the results, it is important to note the predicted maximum σ_y and pressure values for forging at 150°C, as these values are higher than the strength of the normal work-hardened steel used for die manufacturing, which is approximately 1300 MPa. The maximum σ_y and pressure values for forging at 250°C and 300°C are below 1300 MPa, which indicates that the tooling could be operated safely.
Table 3: FEM prediction of maximum Von Mises stress, σ_y stress and pressure values for the different forging temperatures

Temperature (°C) | Von Mises stress (MPa) | σ_y (MPa) | Pressure (MPa)
150 | 206 | 2225 | 2110
250 | 89 | 1076 | 1031
300 | 61 | 716 | 680
In the actual experiments carried out, the edges of some forged parts were found to have sheared off (see Figure 8). This can be attributed to the occurrence of a material dead zone (no material flow), as indicated by the simulation using nodal vector displacement and the plastic strain distribution, which reveals the shear zone Y (see Figures 9b and 9c). Comparing the nodal vector displacement for the different μ values used in simulating the forging at 250°C (see Figure 10), it was observed that the area of the material dead zone decreases with a decrease in μ. This phenomenon can be attributed to less friction between the die wall and billet, as well as between punch and billet, which leads to more homogeneous and easier material flow. Thus, decreasing the friction coefficient may not reduce the forging load as significantly as increasing the temperature (see Figure 3 and Figure 4), but it can help in reducing the area of the material dead zone, especially for the geometry used in this study.

4. Conclusions
The following conclusions are drawn from evaluating plane-strain forging of magnesium alloy AZ31 using finite element analysis:
1. By matching the actual experimental results with simulations using different μ values, we obtain a coefficient of friction value of 0.12 for the process.
2. The actual forging load and deformed shape are in good agreement with the load and deformation profile predicted by simulation.
3. Actual parts of 5 mm thickness could be formed to a thickness of 1 mm using a forging temperature of 300°C and a 50-ton load.
4. Small refined grains were found in the grain morphology of the forged parts.
5. Material dead zones and material flow could be identified from the FEM analysis.
6. Decreasing the friction coefficient does not reduce the forming resistance as significantly as increasing the forging temperature.
7. Reducing the friction coefficient is significant for the reduction of the material dead zone.
References:
1. H. Hoffmann and A. Toussaint, Strategies and future developments in the manufacturing process of lightweight car bodies, in: Proceedings of the 6th International Conference on Technology of Plasticity, Nuremberg, Germany, 1999, pp. 1129-1140.
2. Y.H. Kim, T.K. Ryou, H.J. Choi and B.B. Hwang, Journal of Materials Processing Technology, 123 (2002), pp. 270-276.
3. Yasumasa Chino, Mamoru Mabuchi, Koji Shimojima, Yasuo Yamada, Cui'e Wen, Kenji Miwa, Mamoru Nakamura, Tadashi Asahina, Kenji Higashi and Tatsuhiko Aizawa, Materials Transactions, 42 (3), 2001, pp. 414-417.
4. N. Ogawa, M. Shiomi and K. Osakada, International Journal of Machine Tools and Manufacture, 42, 2002, pp. 607-614.
5. A. Mwembela, E.B. Konopleva and H.J. McQueen, Scripta Materialia, 37 (11), 1997, pp. 1789-1795.
6. S.C.V. Lim, M.S. Yong and C.M. Choy, in: Proceedings of the 4th ASEAN ANSYS User Conference 2002, Singapore, in press.
7. E.F. Emley, "Principles of Magnesium Technology", Pergamon Press, Oxford/New York, 1966, pp. 483-488.
COMPARATIVE STRUCTURAL EVALUATION OF PROTECTIVE HELMETS USING THE FINITE ELEMENT METHOD
A. SUBIC AND M. TAKLA
Department of Mechanical & Manufacturing Engineering, RMIT University, PO Box 71, Bundoora, Vic 3083, Australia
C. MITROVIC
Faculty of Mechanical Engineering, University of Belgrade, Yugoslavia
Every protective helmet on the market must first undergo rigorous testing procedures that are typically very time-consuming and costly. For a protective helmet to be sold on the market, the design must be tested according to the appropriate international and national Standards, which require that the helmet provide a certain level of energy absorption. Rigorous testing is also required during the design and development process if the helmet is to meet such standards when manufactured. Clearly, it is of paramount importance from the design point of view to develop and implement equivalent testing and analysis procedures within a virtual design environment prior to prototyping and manufacturing, in order to reduce the time and cost associated with the development of new and improved designs. This paper presents computational design approaches and results obtained through modelling and analysis of the energy absorption and penetration effects of protective helmets during impact using the Finite Element Method (FEM). The models developed encompass the nonlinear behaviour of helmet designs, involving material, geometrical and contact nonlinearity, using the Arc-Length method in conjunction with the Newton-Raphson method. The relatively new Arc-Length method was used rather than the more traditional displacement control method, which cannot be applied successfully to structures showing snap-back effects. Analysis is done for both helmet and head-form. Computational results show close correlation with experimental tests. The developed methodology allows for more effective design optimisation of protective helmets compared to the traditional approaches currently applied in industry.
1 Introduction
One of the main safety problems in road transport is the head protection of motorcycle riders. Even in most Western countries, where helmets are compulsory by law, head injuries are a leading cause of fatalities. This problem has gained increased attention worldwide, and as a result of this concern a wide range of improved helmet designs has emerged in recent years. For a protective helmet to be sold in Australia, the design must be tested according to the Australian Standards (e.g. AS 1698:1988; AS/NZS 2512:1998; 2512.3.1:1999), which require that the helmet be subjected to three different types of energy absorption tests (using different shaped anvils) and a penetration test. The standard test used to determine a helmet's structural integrity is known as the Impact Energy Attenuation test, more commonly known as the drop test. In this test the helmet is secured to the standard headform and dropped in guided free-fall onto a flat steel anvil, and the acceleration imparted to the assembly is measured. The tests required for compliance in the case of frontal impact are even more complex [1]. The time and cost involved in such tests are considerable. Clearly, it is of paramount importance from the design point of view to develop and implement equivalent testing and analysis procedures within a virtual design environment prior to prototyping and manufacturing, in order to reduce the time and cost associated with the development of new and improved designs. This paper presents computational design approaches and results obtained through modelling and analysis of energy absorption and penetration
effects of protective helmets during impact using the Finite Element Method (FEM). A relatively new Arc-Length method has been used for this purpose in conjunction with the Newton-Raphson method [1], rather than the more traditional displacement control method. A case study involving a comparative analysis of two different helmet designs for frontal impact compliance is presented.

2 Modeling Approach
The helmet model considered here is made from multilayered laminar composite materials and takes into account fiber orientation, possible impact directions and the interlaminar-normalized value of dynamic strength. Finite elements of the thin laminar shell type are used in the helmet discretisation. The nonlinear finite element method is applied, taking into consideration particular nonlinearities in geometry. The simulations presented here involve dynamic tests whereby accurate identification of the force-time, displacement-time and force-displacement relations is essential. Simulation has been done for different initial conditions and composites of different characteristics for different helmet models. The complete analytical model is formed by using a particular analytical solution, the displacement control method, the Arc-Length method and an adaptive system stabilization method, while the Newton-Raphson procedure is used for the non-linear finite element analysis (coupled with the analytical model) [1-3]. In the same manner as in the case of the displacement control method, it is possible to express the displacement increment as (see Fig. 1)
Δλ^(i+1) = −(Δx_r^(i) · ΔX^(i)) / (Δx_t^(i) · ΔX^(i)) + Δλ^(i)    (1)

where Δx_r^(i) and Δx_t^(i) denote the residual and tangential displacement increments at iteration i.
Figure 2 shows the associated Newton-Raphson incremental solution for non-linear finite element analysis of a typical simply supported beam with a force acting mid-span.
Figure 1 Iteration Path
Figure 2 Newton-Raphson Incremental Solution
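Conceptually, the incremental Newton-Raphson scheme of Figure 2 proceeds as sketched below for a single degree of freedom with a nonlinear internal force. This is a textbook illustration of the iteration only, not the NISA II implementation; the internal-force function and load level are invented for the example.

```python
import numpy as np

def newton_raphson_increments(N, dN, P_max, n_inc=10, tol=1e-10):
    """Apply the load in increments; at each level iterate to equilibrium."""
    x, path = 0.0, [(0.0, 0.0)]
    for P in np.linspace(P_max / n_inc, P_max, n_inc):   # load increments
        for _ in range(50):                              # equilibrium iterations
            r = P - N(x)                                 # residual force
            if abs(r) < tol:
                break
            x += r / dN(x)                               # tangent-stiffness update
        path.append((x, P))
    return path

N = lambda x: 100.0 * x - 15.0 * x**2     # softening internal force (illustrative)
dN = lambda x: 100.0 - 30.0 * x           # its tangent stiffness
for x, P in newton_raphson_increments(N, dN, P_max=120.0):
    print(f"u = {x:.4f}, P = {P:.1f}")
```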
A comparative analysis of two types of helmets with different lower edge designs has been carried out in terms of their respective energy absorption capability and maximal impact force.
Figure 3 Helmet Type A
Figure 4 Helmet Type B
In this analysis, the problem of helmet deformation during impact with a rigid obstacle is treated. This is based on the solution of the load limitation problem by the adaptive stabilization method, with both geometric and material nonlinearities considered. Simulation has been done for different initial conditions and composites of different characteristics for different models. The computer software package NISA II (EMRC - Engineering Mechanics Research Corporation, Michigan, USA) is used for the simulation of helmet impact with a hard obstacle [4].

3 Discussions of Results
For a system exposed to a progressive impact force, it is very important to determine the load limits acting upon the helmet and its actual behavior under load over the time period in question (Fig. 5).
Figure 5 Helmet Deformation Under Progressive Impact Force
It can be seen from Figure 6 that during impact the material has remained in the elastic domain. Such helmet structure will recover its initial shape. Also, the comparative impact
force versus displacement curves for designs A and B confirm that design A exhibits significantly greater energy absorption capability and a lower maximal impact force value than type B, which is more suitable considering the passive safety criteria.

Figure 6 Impact Force Versus Displacement (impact force versus displacement [mm] for the Type A and Type B helmets)
4 Conclusions
The paper introduced a new computational approach for the structural evaluation of protective helmets. Numerical analysis based on nonlinear FEM was used to test static and dynamic models. The results presented in this paper indicate that, in line with the theoretical considerations, the developed approach can be applied successfully and with high accuracy to helmet design, taking into consideration real-life requirements. Based on the results obtained and the validations achieved, the presented method can be considered equivalent to the standard crash tests used for compliance testing of such equipment, allowing a significant reduction in the time and cost involved.

5 References
1. Mitrovic C. and Subic A., Simulation of energy absorption effects during collision between helmet and hard obstacle, in Subic, A. and Haake, S. (Eds.), Sports Engineering: Research, Development and Innovation, Blackwell, (2000) pp. 389-398.
2. Dunn S.A., Issues concerning the updating of finite element models from experimental data, NASA TM 109116 (1994).
3. Zienkiewicz O. and Zhu J., Adaptivity and mesh generation, International Journal for Numerical Methods in Engineering, 32 (1991) pp. 783-810.
4. ANSYS Theory Reference, Structural Fundamentals, SAS IP, Inc.
BUCKLING ANALYSIS OF COMPOSITE SPHERICAL PANELS WITH RANDOM MATERIAL PROPERTIES
B. N. SINGH
Department of Applied Mechanics, MNNIT (Deemed University), Allahabad 211004, India
E-mail: [email protected]
N.G.R. IYENGAR AND D. YADAV
Department of Aerospace Engineering, Indian Institute of Technology, Kanpur 208 016, India
E-mail: [email protected]; [email protected]
Composite spherical panels with random material properties, subjected to axial compressive load with all edges simply supported, have been investigated for buckling. The system model incorporates first order and higher order shear deformation theories. A probabilistic approach in conjunction with a first order perturbation technique is outlined. Results for the mean and standard deviation of the buckling response of cross-ply panels are presented.
1 Introduction
The increasing need for lightweight and optimised structures has led to the widespread adoption of thin-walled composite laminates. Buckling in any form either precipitates or hastens the collapse of such structures. Very limited literature is available on composite structures with random material properties. Free and forced vibration [1], the stability of columns [2], beams [3], and flat and cylindrical panels [4-6] have been investigated with uncertain parameters and material properties. This paper presents an analytical approach to the stability of spherical panels with random material properties. It outlines a stochastic approach using first order perturbation (FOPT) for the solution of the random characteristic equation of buckling arising from the random variation of material properties. Transverse shear effects have been incorporated in the formulation.
2 Basic Formulations
The governing equations with the higher order and first order shear deformation theories [HSDT and FSDT] proposed in [7], as applied to spherical panels under compression, can be taken from the said reference for the present study (not presented here to economise on space). For cross-ply laminates with all edges simply supported, an exact Navier-type solution is possible. The displacements along the axes and the rotations about x2 and x1 satisfying the boundary conditions can be seen in ref. [7].
Substitution of the above-mentioned equations [7] into the system equation turns it into a homogeneous set. A nontrivial solution for the deflection yields the eigenvalue formulation for the critical load Ncr. This depends on the stiffness matrix elements a_ij. These elements, dependent on the material properties, are random in nature, leading to the buckling load also being random.

3 Perturbation Approach for Buckling Load Statistics
Any random variable can be split into the sum of its mean and a zero-mean random part. For example, with an overbar denoting the mean value, superscript 'R' the random variable and 'r' the zero-mean random part,

Ncr^R = N̄cr + Ncr^r    (1)

Substituting Equation (1) into the characteristic equation, expanding, collecting terms of the same order of magnitude and retaining terms only up to first order yields, in symbolic form:

Zeroth order: N̄cr = F(āij)    (2)
First order: Ncr^r = F(aij^r, N̄cr)    (3)

Equation (2) is deterministic, relating only the mean quantities, and the mean critical load is obtained from it by any standard procedure. Using Taylor's rule and manipulating the expressions, we get the expansion

N̄cr,j = F(āik, āik,j, N̄cr)    (4)

where ,j denotes the partial derivative evaluated at the mean values of the stiffness elements. Ncr^r is then obtained using the above equation, and its variance is evaluated by taking the appropriate expectation.
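The FOPT recipe of this section, mean from the mean properties and variance from first-order sensitivities, can be sketched and checked against Monte Carlo simulation. The closed-form function F below is only a stand-in for the buckling eigenvalue problem, and the basic random variables are assumed independent and Gaussian.

```python
import numpy as np

def F(a):
    """Stand-in critical-load function of the material property vector a."""
    E11, E22, G12 = a
    return 0.5 * E11 + 2.0 * np.sqrt(E22 * G12)

a_mean = np.array([40.0, 1.0, 0.6])      # mean properties (in units of E22)
a_sd = 0.10 * a_mean                     # SD/mean = 0.10 for each basic RV

# FOPT: mean = F(a_mean); var = sum_j (dF/da_j)^2 var(a_j)
eps = 1e-6
grad = np.array([(F(a_mean + eps * np.eye(3)[j]) - F(a_mean)) / eps for j in range(3)])
fopt_mean, fopt_sd = F(a_mean), np.sqrt(np.sum((grad * a_sd) ** 2))

# Monte Carlo reference (vectorised copy of F)
rng = np.random.default_rng(0)
s = rng.normal(a_mean, a_sd, size=(100000, 3))
mcs = 0.5 * s[:, 0] + 2.0 * np.sqrt(s[:, 1] * s[:, 2])

print(f"FOPT: mean={fopt_mean:.3f}, SD={fopt_sd:.3f}")
print(f"MCS : mean={mcs.mean():.3f}, SD={mcs.std():.3f}")
```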
4 Results and Discussion
The above procedure is employed to obtain the second order statistics of the critical buckling loads of symmetric and anti-symmetric cross-ply spherical panels under axial compressive load with all edges simply supported. The mean values of the material properties of the graphite/epoxy composite used are [7]: E11 = 40E22, G12 = G13 = 0.6E22, G23 = 0.5E22, ν12 = 0.25, with the shear correction factor for FSDT taken as 5/6. Results are generated for b/a = 1 and 2, R/b = 5, b/h = 10 and 100, and two lay-ups, [0°/90°] and [0°/90°/90°/0°]. However, results for only [0°/90°] are presented due to space considerations.
4.1 Buckling Load: Composite Spherical Panels

Validation Study
Validation of the present technique has been carried out by comparison with Monte Carlo simulation (MCS). Table 1 shows the results by FOPT and MCS, with only E11 taken to be random. It is observed that the results obtained from FOPT are in good agreement with MCS.

Second Order Statistics
Mean buckling loads: The nondimensionalised mean buckling loads are presented in Table 2. The mean load increases with increasing b/a and b/h ratios. HSDT gives slightly higher values than FSDT for both values of b/h; these differences in the predictions are small for thin panels.

Variance of buckling loads: Table 3 presents the SD of Ncr with the SDs of all the basic random variables changing simultaneously. Dispersions in buckling loads with the FSDT and HSDT models are of comparable order of magnitude. There is a slight change in dispersion as the b/a ratio increases for both stacking sequences.

5 Conclusion
The following main conclusion can be drawn from the present study: the buckling loads for symmetric and anti-symmetric cross-ply laminates show almost equal changes in scatter for the two theories, FSDT and HSDT, and in general the two predict comparable results.

Table 1: Comparison of spherical panel buckling loads from MCS with the present approach, [0°/90°], R/b=5, b/a=1 and b/h=10

SD/mean of E11: 0.05 | 0.10 | 0.15 | 0.20
MCS: HSDT: 0.0252 | 0.0441 | 0.0636 | 0.107
FOPT: HSDT: 0.0252 | 0.0444 | 0.0640 | 0.110
Table 2: Mean buckling loads, N̄cr = Ncr b²/(E22 h³), for all edges simply supported composite spherical panels with R/b=5

b/a | Theory | [0°/90°], b/h=10 | [0°/90°], b/h=100 | [0°/90°/90°/0°], b/h=10 | [0°/90°/90°/0°], b/h=100
1 | FSDT | 12.26 | 58.55 | 19.37 | 62.67
1 | HSDT | 12.47 | 58.57 | 18.37 | 62.64
2 | FSDT | 32.82 | 74.51 | 38.03 | 114.28
2 | HSDT | 35.30 | 74.56 | 34.32 | 114.16
Table 3: Sensitivity of SD of spherical panel buckling loads to the SD of the basic material properties, with all basic material properties changing simultaneously, [0°/90°], R/b=5 and b/h=10

SD/mean, material properties: 0.05 | 0.10 | 0.15 | 0.20
HSDT, b/a=1: 0.0271 | 0.0582 | 0.0810 | 0.120
FSDT, b/a=1: 0.0268 | 0.0530 | 0.0750 | 0.104
HSDT, b/a=2: 0.0210 | 0.0426 | 0.0680 | 0.100
FSDT, b/a=2: 0.0200 | 0.0418 | 0.0630 | 0.091
References
1. Ibrahim, R.A., Structural dynamics with parameter uncertainties. Appl. Mech. Rev. 40 (1987) pp. 309-328.
2. Zhang, J. and Ellingwood, B., Effects of uncertain material properties on structural stability. ASCE J. Struct. Engrg. 121(4) (1995) pp. 705-716.
3. Jeong, J.D., Critical buckling load statistics of uncertain column. In Proc. 6th Specialty Conf. on Probabilistic Mechanics and Structural and Geotechnical Reliability (1995), pp. 563-566.
4. Singh, B.N., Yadav, D. and Iyengar, N.G.R., Initial buckling of composite cylindrical panels with random material properties. Compos. Struct. 53(1), pp. 55-64.
5. Singh, B.N., Yadav, D. and Iyengar, N.G.R., Stability analysis of laminated cylindrical panels with random material properties. Compos. Struct. 55(1).
6. Singh, B.N., Iyengar, N.G.R. and Yadav, D., Effect of random material properties on buckling of composite plates. ASCE J. Engrg. Mech. 127(9), pp. 873-879.
7. Reddy, J.N. and Liu, C.F., A higher order shear deformation theory of laminated elastic shells. Intl. J. Engrg. Sci. 23(3) (1985) pp. 319-330.
NUMERICAL ANALYSIS OF ADHESIVELY BONDED CYLINDRICALLY CURVED LAP JOINTS
CHENGYU QIAN AND LIYONG TONG
School of Aerospace, Mechanical and Mechatronic Engineering, University of Sydney, NSW 2006, Australia
E-mail: [email protected]
This paper investigates the effect of curvature on adhesive stresses in cylindrically curved bonded single-lap joints using both analytical and finite element analysis methods. In the analytical method, the shear and peel stresses are assumed to be constant across the bondline, and the governing equations are derived and then solved using a multiple shooting method. 2D plane strain finite element analyses are performed using Strand7 to validate the present analytical models by studying the effect of curvature on stress distributions for joints subject to three-point bending. Preliminary results show a good correlation between the shear and peel stresses predicted using the present analytical approach and the finite element method.
INTRODUCTION
Adhesively bonded joints and repairs have been increasingly used in joining and/or repairing lightweight structures, particularly in fuselages and wing and control surfaces in airframes. There exists a large amount of research on the stress analysis of adhesively bonded lap joints with straight adherends [1-2]. However, only very limited studies have been devoted to the effect of curvature on adhesive stresses in adhesively bonded joints with curved adherends. Sun and Tong [3] investigated the effect of curvature on the performance of actuators and sensors for curved smart beams. This paper aims to address this issue by determining the adhesive stresses in cylindrically curved beams bonded with a single-sided patch. An analytical model is formulated using the curved beam theory [4] and the constant shear and peel strain assumptions [1]. Solutions for the present model are obtained by using the multiple-segment shooting method, and then validated via comparison with finite element analysis results. Numerical results for the selected joints subjected to three-point bending are presented for various radii of the curved beams and patches to illustrate the effect of curvature.

ANALYTICAL MODELING
Consider a curved thin host beam with a single-sided patch as shown in Figure 1. It is assumed that the shear and peel strains in the adhesive layer are constant across the bondline. The entire curved beam can be divided into two regions, namely the host beam region and the overlap region. Using the curved beam theory degenerated from the deep shell theory [4], the middle surface strains and curvatures of the host beam and patch are

εi^0 = ∂ui/∂x + wi/Ri,   χi = (1/Ri) ∂ui/∂x − ∂²wi/∂x²,   (i = 1, 2)    (1)
The strains and the longitudinal displacements at an arbitrary point can be expressed as

εi(z) = εi^0 + z χi,   ui(z) = ui + z(ui/Ri − ∂wi/∂x),   (i = 1, 2)    (2)
where the subscripts 1 and 2 represent the host beam and the patch respectively; u and w are the longitudinal and transverse displacements of the mid-plane; and R is the radius of curvature of the beam or patch.
Figure 1 The adhesively bonded cylindrically curved beam with lower patch
The equilibrium equations in the overlap can be derived as follows:

T1,x + Q1/R1 − bτ + f1(x) = 0,   Q1,x − T1/R1 − bσ + f2(x) = 0,   M1,x + bτh1/2 − Q1 = 0
T2,x + Q2/R2 + bτ = 0,   Q2,x − T2/R2 + bσ = 0,   M2,x + bτh2/2 − Q2 = 0    (3)

where h denotes the thickness of the beam and patch, b is the width of the curved beam, T, Q and M are the axial force, transverse shear force and bending moment respectively, and τ and σ are the shear and peel stresses of the adhesive layer. The axial force and bending moment are

Ti = Ei b hi εi^0,   Mi = (Ei b hi³/12) χi,   (i = 1, 2)

where E is the Young's modulus of the beam and the patch. The shear and peel stresses in the adhesive layer are defined as

τ = (Gv/hv) [ u1 − u2 − (h1/2)(w1,x − u1/R1) − (h2/2)(w2,x − u2/R2) ]    (4)

σ = Ev (w1 − w2) / ((1 − ν²) hv)    (5)

where hv is the thickness of the adhesive layer, Ev and Gv are the Young's and shear moduli of the adhesive, and ν is its Poisson's ratio. Equations (2)-(5), together with the relevant boundary and continuity conditions, form a boundary value problem, which can be solved by rewriting them as a set of first-order ordinary differential equations and using multiple shooting methods [5].
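The solution workflow, rewriting the governing equations as a first-order system and solving the resulting boundary value problem, can be sketched as follows. SciPy's collocation BVP solver is used here in place of the paper's multiple shooting method, and a classical shear-lag equation τ'' = λ²τ with prescribed end stresses stands in for the full coupled system (3)-(5); L, λ and the edge stress are illustrative values.

```python
import numpy as np
from scipy.integrate import solve_bvp

L = 50.0          # half overlap length (mm), illustrative
lam = 0.35        # shear-lag parameter (1/mm), illustrative
tau_edge = 20.0   # adhesive shear stress at the overlap ends (MPa), illustrative

def rhs(x, y):
    # y[0] = tau, y[1] = dtau/dx  ->  first-order form of tau'' = lam^2 * tau
    return np.vstack([y[1], lam**2 * y[0]])

def bc(ya, yb):
    # prescribed shear stress at both ends of the overlap
    return np.array([ya[0] - tau_edge, yb[0] - tau_edge])

x = np.linspace(-L, L, 101)
sol = solve_bvp(rhs, bc, x, np.zeros((2, x.size)))
print("peak adhesive shear stress:", sol.y[0].max(), "MPa (at the overlap ends)")
print("midspan shear stress:", sol.sol(0.0)[0], "MPa")
```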
ILLUSTRATIVE EXAMPLE
Consider a single-sided lap joint, 25 mm wide, clamped at both ends and subjected to a point load at the middle of the host beam, as schematically shown in Figure 1. Numerical analyses are conducted using the present analytical model and 2D linear elastic plane strain finite element analysis (FEA) models in STRAND7 [6]. In the FEA, 8-node isoparametric elements are used to model both the adherends and the adhesive layer. In the through-the-thickness direction, one element is used to model the bondline, whereas four elements are used to model the adherends of the host beam and the patch. Along the circumferential x (or θ) direction, elements of equal arc length 0.25 mm are used. It is worth noting that the arc length of the adhesive elements near the adhesive free edge is slightly less than the height of the element, which is equal to the adhesive thickness of 0.3 mm. The applied load P is chosen as 0.1 kN.

Table 1 Physical properties and dimensions of the curved joints
Item | Host beam | Patch | Adhesive
Young's modulus (GPa) | 70 | 70 | 2.234
Poisson ratio | 0.334 | 0.334 | 0.31
Thickness (mm) | 1.0 | 1.0 | 0.3
Arc length (mm) | 500 | 100 | 100

RESULTS AND DISCUSSION
Figure 2 plots the transverse deflection of the mid-plane of the host beam at the loading point versus the curvature of the host beam, predicted using the present analytical models and the finite element method. This deflection at the loading point is the maximum deflection occurring in the joint. It is clear that the maximum displacement falls sharply as the curvature varies from 0 to 0.25, and then decreases slowly as the curvature is increased from 0.25 to 3.33. The straight lap joint has an extremely small bending stiffness and thus undergoes a substantially large deflection, whereas the curved lap joint, even with a very small curvature, gains significantly in overall bending stiffness and hence experiences a small deflection. Evidently, there is an acceptable agreement between the FEA and analytical results for the peak deflection of the host beam, with the actual difference ranging from 14 to 24%.
E" 0.25 , ysis
>placerr
0.2 1 0.15
— * — FEA
0.1
h 0.05 \ 0
I
1
2 3 Curvature(1/R)
4
Figure 2 Comparison of maximum displacement vs. curvatures predicted by the analytical model and FEA
Figures 3 and 4 depict the variation of the peak shear and peel stresses versus the curvature of the host beam. Similar to the observation for the maximum transverse deflection in Figure 2, both the shear and peel stresses decrease dramatically as the curvature becomes nonzero and then plateau overall as the curvature is increased. However, there exists a range of curvature between 0.25 and 0.5 in which the shear stress increases slightly from its valley. It is evident that there is a good agreement between the FEA and analytical results for the peak shear and peel stresses in the adhesive layer. The actual difference between the FEA and analytical analyses ranges from 2.6 to 9.7% in the peak shear stress and from 0 to 3.4% in the peak peel stress.
Figure 3 Peak shear (a) and peel (b) stresses vs. curvature (1/R) predicted by the analytical model and FEA
CONCLUSION In this paper, an analytical model is presented for thin cylindrically curved beams with adhesively bonded single-sided patch to allow investigation into the effect of curvature on adhesive stresses. 2D plane strain FEA models are used to validate the present analytical model by considering the selected illustrative example. The present preliminary numerical results show that introduction of curvature in the host beam can significantly reduce the peak adhesive stresses compared to those without a curvature. ACKNOWLEDGEMENTS The authors are grateful to the support of the AOARD/AFOSR and the University of Sydney. The authors would also like to acknowledge the help from Dr. D. Sun and Mr. Michel Wood. REFERENCES 1. Goland M. and Reissner E. (1944) The Stress in Cemented Joints. ASME J. of Applied Mechanics 66, A17-A27. 2. Tong L. & Steven G.P. (1999) Analysis and Design of Structural Bonded Joints, Kluwer Academic Publishers, USA. 3. Sun D. & Tong L. (2002) Modeling and analysis of curved beams with debonded piezoelectric sensor/actuator patches. Int. J. Mechanical Sciences 44, 1755-1777 4. Qatu M.S. (1993) Theories and Analysis of Thin and Moderately Thick Laminated Composite Curved Beams. Int. J. Solids Structures 30 (20) 2743-2756. 5. Stoer J. & Bulirsch R. (1980) The Multiple Shooting Method. Introduction to Numerical Analysis, pp483-519. 6. Introduction to the strand 7 Finite Element Analysis System (G+D) Company Pty Ltd, Sydney, Australia, 1999).
NUMERICAL ANALYSIS OF THE EFFECT OF INTERPHASE ON THE DEFORMATION OF PARTICLE-REINFORCED COMPOSITES W.X. ZHANG, F.P. YANG AND T.J. WANG Department
Of Engineering Mechanics, Xi'an Jiaotong University, Xi'an 710049, E-mail: [email protected]
China
Finite element analysis is carried out to analyze the interphase effect on the macro- and microscopic deformation behavior of SiC particle reinforced Aluminum matrix composites. The macroscopic stress and strain curves for the SiC/Al composites are obtained for different interphase stiffness, thickness and Poisson's ratio. Also, the distributions of microscopic stress and strain are presented.
1
Introduction
It is well-known that particle reinforced metal matrix composites (MMCs) are being widely used in engineering. Soppa, et al.[l] investigated the effects of microstructure variations on the macro-, meso-, and microscopic deformation of SiC particle reinforced Al matrix composites. Numerical analysis were carried out by Xu, et al.[2] to study the effects of geometry factors on the mechanical behavior of particle and fiber reinforced MMCs. It is assumed in [1,2] that the reinforcing phases are perfectly bonded to the matrix. However, there will be interphases between reinforcing phases and matrix of MMCs, and the mechanical behavior of composites will be greatly affected by the characteristics of interphases, e.g. stiffness, thickness, Poisson's ratio etc.. Wang and Yang [3] studied the energy dissipation in particle reinforced MMCs under cyclic external loading. In this paper, effects of interphase stiffness, thickness and Poisson's ratio on the macro- and microsopic deformation of SiC particle reinforced Al matrix composites are numerically studied. 2
Micromechanical Methods
It is assumed that spherical SiC particles are uniformly distributed in Al matrix. An axisymmetric representative unit cell mode and ANSYS code are employed in analysis. Here, we assumed that the distances between the particles in the directions of Z-axis and R-axis are H and B, respectively, as shown in Fig.l. A displacement is applied in Z direction. The boundary displacements at z=H and R=B are always uniform. Such that the boundary conditions can be expressed as, uz = 0 at Z=0; uz = UZ at Z=H;
(la)
UR = 0 at /?=0; Fz = 0
(lb)
at R=B.
The macroscopic axial strain is expressed as,
Ez=ln{H/H0). The corresponding stress can be calculated as,
518
(2)
519
*z=^Lv
(3)
SiC particle is taken as an elastic body. Matrix Al is taken as an elastic-plastic body obeying von Mises yield criterion. The isotropic strain hardening law for the matrix is as follows,
(Jf=
Fig. 1 Unit cell model,
(4)
where at and a0 are flow and initial yield stresses, respectively, h and n material constraints, e^ equivalent plastic strain. It is assumed that the constitutive law of interphase layer has the same form as matrix Al [3], namely, '/int
'Oint
+ h^(eS
(5)
where a^^Pi1 represent no interphase, soft and hard interphases, respectively. In what follows, we assume H0=B0=\, and consider five different values of /? (=0.01, 0.5, 1, 3, 5 and 10) and three different values of interphase thickness Ah (=0.01, 0.025 and 0.05). Material parameters are taken from [2-4], £m=69GPa,
Numerical Results and Discussions
Fig.2 shows the macroscopic stress and strain curves of the composites with particle volume fraction f= 10% and interphase layer thickness Aft=0.025, from which one can see the significant effect of material parameter /? on the deformation of composite, but this effect becomes very small as p>\. Effect of Poisson's ratio of the interphase on the deformation of composite is shown in Fig.3 for/=10%, /?=0.01 and A/i=0.01. Figs 4 and 5 show the effect of the thickness of interphase layer on the deformation of composites with f=\Q% and /?=0.01. It is clear from Fig.4 that the interphase thickness has significant effect on the macroscopic deformation of composite as P«\, i.e. a super soft interphase case. However, one can not see this effect in the composite with hard interphase layer, e.g. /?=10fromFig.5. Effect of the material parameter /? on the distributions of microscopic von Mises stresses and equivalent strains in the composites are shown in Fig.6 and Fig.7, respectively. It is seen that stress distributions in matrix are similar for the composites without interphase and with hard interphase, as shown in Figs 6(a) and 6(b), but it is totally different for the composite with soft interphase, as shown in Fig.6(c). Figs 7(a) and 7(b) clearly show that deformation concentrates in the matrix for the composites without interphase and with hard interphase, but this phenomena is totally different for the composite with soft interphase layer, as shown in Fig. 7(c).
520
5T 50 |
«o
— • - -v=0.01
30
—•--v=0.3 -v=0.49 —*— o - - no interphase
20 10
Fig.2 Effect of the material constant /? on the macroscopic stress-strain curves of composites with f=\0% and Afc=0.025.
Fig.3 Effect of Poisson's ratio of the interphase on the macroscopic stress-strain curves of composites with/=10%,/3=0.01 andA/i=0.01.
- a - no interphase S 30 20
h=0.01 - A - h=0.025 -ir- h=0.05
-«-
10
0 004
strain
Fig.4 Effect of the thickness of interphase layer on the macroscopic stress-strain curves of composites with^=10% and ^=0.01.
4
Fig.5 Effect of the thickness of interphase on the macroscopic stress and strain curves of composites with/=10%and / 8=10.
Conclusion
Material parameter J3 has singnificant effect on the macroscopic stress-strain curves of MMCs, but this effect becomes very small as /fc>l i.e. hard interphase case. It is seen that effects Poisson's ratio and the thickness of interphase layer on the macroscopic deformation of MMCs are also significant as /?<1 i.e. soft interphase, but the interphase thickness has almost no effect on the deformation of MMCs as /fc>l. It is clear from the distributions of microscopic stresses and strains in the MMCs with and without interphase that deformation concentrated within the soft interphase layer and stress concentration occurs in the hard interphase layer. 5
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 10125212) and the funds from The Ministry of Education of China.
521
(a) without interphase
(b) with hard interphase, yS=10 and AA=0.025
(c) with soft interphase, yS=0.01 and AA=0.025
Fig.6 Distributions of microscopic von Mises stresses in the MMCs with particle volume fraction f= 10% while the initial yield occurs in the matrix.
(a) without interphase
(b) with hard interphase, /?=10 and A/i=0.025
(c) with soft interphase, /?=0.01 and A/i=0.025
Fig.7 Distributions of microscopic equivalent strains in the MMCs with particle volume fraction^ 10% while the initial yield occurs in the matrix.
References 1. Soppa E., Schmauder S. and Fischer G., Influence of the microstructure on the deformation behaviour of metal-matrix composites. Comp. Mater. Sci. 16 (1999), p.323-332 2. Xu D., Schmauder S. and Soppa E., Influence of geometry factors on the mechanical behavior of particle- and fiber-reinforced composites. Comp. Mater. Sci. 15 (1999), p.295-301. 3. Wang J.C. and Yang G.C., The energy dissipation of particle-reinforced metal-matrix composite with ductile interphase. Mater. Sci. Eng. A303 (2000), p. 77-81. 4. Zhang J., Perez R.J., Wong C.R. and Lavernia E.J., Effect of SiC and graphite particulates on the damping behavior of metal matrix composites. Mater. Sci. Eng. R, 13(1994),p.325.-390.
NUMERICAL FINITE DEFORMATION ANALYSIS ON SOLID PROPELLANT GRAIN USING FINITE ELEMENT M E T H O D
YANG YUECHENG*, QIANG HONGFU, XU GUIMING AND ZHAO HAISHENG Xi'an Hi-Tech Research Institute, Hongqing Town, Xi'an, Shaanxi, PRC, 710025 Email address: [email protected] In the paper, a numerical simulation of finite deformation on solid propellant grain under the cuing process is studied. A three-dimensional numerical simulation for shell-grain model is carried out by using a commercial software ANSYS, which is a finite element code with Lagarange processor, and is especially suitable for modelling nonlinear quasi-static/transient problems. The material models in ANSYS, however, are mainly those applicable to metals or other materials. They are not exactly suitable for viscoelastic materials such as solid propellant. In this paper, a new nonlinear viscoelastic constitutive relation model which includes effects of strain softening and strain rate, effects of Poisson ratio's time-dependent or time-independent and effects of time-temperature equivalent and compressibility or incompressibility for solid propellant, a viscoelastic large deformation variational equation based Dirichlet-Prony series representation of the relaxation modulus Total Lagrange (T.L.) method, are developed. The developed models are implemented into the ANSYS code through its user subroutine function. Finally, the 1/8 scale Solid Rocket Motor (SRM) 3D numerical examples are executed, numerical results of deformed area, as well as stress field distribution are obtained and compared with those from independent fields tests.
1
Introduction
It is very important to study viscoelastic properties for solid propellant accurately and effectively due to structural complexity of SRM and nonlinear geometrical distortion in service process. Many endeavors have been done [1-3, 5-6] for the viscoelastic constitutive model of solid propellant grain, they were all single integral forms using kernel function. Amongst them, Swanson's nonlinear constitutive law is better than others, it has a very sound engineering background, strain softening and rate-dependent effects are considered and formula is very simple. Unfortunately, the compression for material behavior is not considered when Poisson ratio is time-dependent, in another words, it has coupled phenomenon among the shearing modulus and bulk modulus, as well as Poisson ratio. Hence authors propose a new modified version of Swanson's model which considered the demerits of it. In present paper, a new improved nonlinear viscoelastic constitutive model is presented based on Swanson's model, the strain rate effects and Poisson rate timedependant are considered. Incrementalization is accomplished in closed form. With virtual work equation of T.L. method, 3-D viscoelastic FE algorithm to quasi-static / transit problems is derived. Finally the 1/8 scale 3-D model is meshed and its numerical simulation is performed, the simulating results agreed well with published data, it proved that the mechanical model is sound academic. 2
3-D unified viscoelastic constitutive relation
Based on single integral form nonlinear viscoelastic constitutive relation proposed by Swanson et al. [6] and corresponding small-strain theory, a modified nonlinear viscoelastic constitutive model is presented, which considered Poisson ratio time-
522
523
dependent or time-independent and time-temperature equivalent effect for solid propellant, as well as compressibility. S ^ g i E ' ^ l G i t - O m ^ r
+S t j i m - O m ^ d A
(1)
where S-- is Kirchhoff stress tensor under finite deformation; Ey is Green strain tensor under finite deformation; >(T) is modified function with respective to strain rate, especially refers the effect of temperature, when temperature variation is neglect, we can take
HE'is the
AT =T —T0 is temptation variation for material; G(
dr
\
and P' = r
« r (n)
^
are defined as reducing time, assume it is thermo-
^rCi)
rheological simple material; aT is time-temperature shift factor, and is defined by W.L.F. equation, , 10]
_ C,(T — TR) where TR is reference temperature, C, and C 2 are constants
C2+T-T, determined by experiments. Hence all the factors associated with time-temperature equivalent effects, compressibility and strain-softening, strain rate variations as well as heat strain are taken into account in equation. (1), so equation (1) is a unified nonlinear viscoelastic constitutive model. It can be degenerated to special cases under different conditions. When material property is approximately incompressible, bulk relaxation modulus K{t) = const is taken, when it is completely incompressible, the bulk relaxation modulus K(t) = oo is assumed. In additional, another property parameter in Eq. (1) is common used as shearing relaxation modulus G(t) • When bulk relaxation modulus K(t)^ const is considered, Eq. (1) is also stand. When its nonlinearity in the Eq. (1) is ignored, i.e. strain-softening function #(£') = 1 or modified function with respective to strain rate 0(T) = 1, then Eq. (1) is reduced to linear thermo-viscoelastic constitutive model. 3
Incremental form of unified 3-D nonlinear viscoelastic constitutive model
In order to analyze finite deformation for solid structure, we have to derive integral incremental constitutive relation from equation. (1). In brief, let the bulk relaxation modulus be represented by a Dirichlet-Prony series as,
524 (2)
*(f) = 5XexpHV) where K
and )3 are relaxation coefficients and characteristic time in p' order, is
summation of Dirichlet-Prony series expansion, so the incremental relation at time tn is derived as i*S»l=
g.(Ol " £1
/ ^ ; 0,A A '»
{AEtt} (3)
+ £[£„(£') exp(-j3p B„Af„) - g„_, (£')]{ 4 } , - , . , *=1 ffl
+ £„(£')£({/C t t }„_lp - { / „ }._li,)0exp(-^B„Ar11) where the first term on the right side denotes the stress increment induced by the strain increment {AEkk} at time interval Atn ; the second term on the right side denotes relaxation stress increment at time interval Atn ; the third term on the right denotes influence on stress increment by strain rate yielded by temperature variation at time interval Atn. With help of principle of virtue work using T.L. method, we obtain
J0y A^C^JA^dv +
+ J0v (AL0ErsC:siJ80N% + LfEnCnl]S^Ev
AfErsCrsiJSAN0LE,J)dv
= S'+A'W-j0vC0Su+,0S!
+ lvC0SlJ +'Q
S^ +'0 S^SAfE^dv
(4)
^S^SA^dv
this is its corresponding finite deformation formulation for finite element analysis, the significance of variables above on are ignored concrete .due to length limits. All the features of the model are coded and incorporated into the ANSYS commercial software using the user defined subroutine interface. 4
Numerical example
Based on the symmetric configuration of the SRM, the 1/8-scale propellant grain coupled shell structure is targeted to analyze during the cooling process from 70°C to 20°C. The mechanical model mentioned above is employed. The mechanical property parameters of SRM are referred to [4].The results are plotted in vector graph, in which MX and MN denote the maximal and the minimal value position in the grain, see figure 1-figure 2, in which A, B and C represent the radial, circumferential and axial value distribution respectively, D represents the equivalent value distribution, following analysis in detail. • In the X-axes, the displacement MX and MN locates at the inner and outer of the rocket motor, and the volume reduction can read from the figure, which is agreeing with the engineering practice. The stress MX locates at the conjoint place of the alaslot and ala-slice, and MN locates at the outer of the ala-slice. • In the Y-axes, the displacement MX and MN locates at the two sides of the ala-slice respectively, and the value of the MX equals to MN with the opposite direction. This
525
•
•
is the much according with the symmetry assumption and the engineering practice. Stress distribution is similar with the x-axes. In the Z-axes, displacement MX and MN locate at the two ends of inner surface respectively. Stress MX and MN locate at the end of the ala-slot, MX locates at the conjoint place of the ala-slot and ala-slice, MN locates at the bottom of the ala-slot. Equivalent stress MX locates at the conjoint place of the ala-slot and ala-slice, MN locate at the conjoint area of the propellant grain and the cylinder of the shell of the rocket motor.
A
B
C
D
Figure 1. Translation displacement vector-graph.
A
B C Figure 2. Stress field distribution vector-graph
D
5
Conclusions
1.
The comparison between the simulating results and the engineering data [4] proves the mechanical model is reliable and academic. Regarding experimental data for grain [4], the tensile stress is much more dangerous than the compressive one. It can be seen from the results, the tensile stress is much higher than the compressive stress. To guarantee the safety of SRM, the tensor stress value at the centralization location of the tensor stress should be diminished. The displacement during the cooling down of the solid rocket motor is higher than the design value. This phenomena show that the inner ballistic trajectory property would be affected by the higher displacement.
2.
3.
526
References 1. Burke M.A., Woytowitz P.J. and Reggi G., Nonlinear viscoelastic constitutive model for solid propellant. Journal of Propulsion and Power, 8 (1992) pp. 586-591. 2. Jung G.D., Youn S.K. and Kim B.K., 2000. A three-dimensional nonlinear viscoelastic constitutive model of solid propellant. International Journal of Solids and Structures, 37 (2000) pp. 4715-4732. 3. Lai J. and Bakker A., 3-D shapery representation for non-linear viscoelasticity and finite element implementation. Computational Mechanics 18 (1996) pp. 182-191. 4. Qiang H.F., Numerical analysis and experimental researches on solid rocket motor grain structure integrity. Ph.D. Dissertation, Dept. of Engineering Mechanics, Xi'an Jiaotong University, Xi'an, PRC, January 1999 (in Chinese). 5. Shapery R.A., On the Characterization of nonlinear viscoelastic material. Polymer Engineering Science, 9 (1969) pp. 259-310. 6. Swanson S.R. and Christenson L.W., A Constitutive formulation for high elongation propellants. Journal of Spacecraft Rocket, 20 (1983) pp. 559-566.
INVESTIGATION ON THE COUNTER-INTUITIVE PHENOMENON OF ELASTIC-PLASTIC BEAMS Y. M. LIU AND G. W. MA School of Civil and Environmental Engineering, Nanyang Technological University, Singapore 639798 E-mail: [email protected], [email protected] Q. M. LI Department of Mechanical Aerospace and Manufacturing Engineering, UMIST, P.O. BOX 88, Manchester M60 1QD, UK E-mail: ainsming. li@ umist.ac. uk Counter-intuitive phenomena of pin-ended and fix-ended elastic-plastic beams subjected to impulsive load are simulated using finite element method. The effects of element type, mesh size and the magnitude and time duration of the impulsive load on the phenomenon are studied. Sensitivity analysis of the counter-intuitive phenomena based on the Shanley model and finite differential method is also carried out.
1
Introduction
Counter-intuitive phenomena were first observed in elastic-plastic beams subjected to impulsive pressure by Symonds and Yu [1]. The counter-intuitive phenomenon means that the residual mid-point deflection of the beam lies on the same side of the applied impulse. The same phenomenon was also observed when studying a fix-ended uniform beam under concentrated impulsive load [2]. To explain the abnormal behavior of beam dynamics, a Shanley model was adopted to capture the counter-intuitive response feature of the pin-ended beam [1]. Results showed that the counter-intuitive phenomenon is very sensitive to the magnitude and duration of the applied load [1-4]. In the present study, both a pin-ended elastic, perfectly plastic beam subjected to impulsive pressure load and a fix-ended elastic, perfectly plastic beam under concentrated impulsive force are simulated using the finite element code LS-DYNA. The effects of the element types, mesh size, magnitude and time duration of the impulsive load are studied. The derived results are compared with the results obtained by other researchers. Sensitivity analysis on the counter-intuitive behavior based on the Shanley model and finite differential method is also performed to catch the transition points. 2
Pin-ended beam problem
A uniform beam with rectangular cross-section and pin-jointed at both ends is simulated. The impulsive pressure po has a rectangular shape with time duration to. The material is elastic, perfectly plastic. The half span, the width and the thickness of the beam denoted as L, b and h are respectively 100mm, 20mm and 4mm in the present study. E,CT0,p and v are the Young's modulus, the yielding stress, the mass density and the Poisson's ratio of the beam material with values of 80GPa, 300MPa, 2700kg/m3 and 0.3, respectively. Fig. 1 shows the final deflection at the midpoint of the beam obtained by using shell element in LS-DYNA. In Fig. 1, the time duration to is fixed at 0.5ms and the magnitude of the impulsive pressure p 0 varies from 0.8 to 1.1 MPa. When the impulsive pressure is in a window approximately between p]=0.90 MPa and p2=0.99 MPa, the final deflection
527
528
of the mid-point of the beam is negative, which indicates the counter-intuitive phenomena. The numerical results obtained by using 3-D solid elements in LS-DYNA with different sizes are shown in Fig. 2. It is seen t-~-7"—""" from Fig. 2 that the location and the width of the window of the pressure pulse depends on element types used. There exist two windows within which the counter-intuitive behavior occurs when solid elements with four layers are adopted along the thickness. However, the first •o •6 v window disappears when the mesh size along ~—J the thickness become smaller. Because of the 0.80 0.85 0.90 0.95 1.00 1.05 1.10 highly parametric sensitivity of the problem, the Impulsive pressure load p0 (MPa) mesh must be fine enough to ensure the Fig. 1 The final deflection of the beam accuracy of the results. The time duration of the impulsive pressure load also plays an important role on the counter-intuitive response of the beam. Three cases corresponding to to values of 0.01, 0.1 and 0.5 ms are studied with a mesh of 100x10x6. Fig. 3 shows that the magnitude of the load when the phenomenon occurs increases significantly as the time duration of the impulsive pressure reduces. The vibration of the mid-part of the beam is more significant than that near the ends when the time duration becomes shorter. .
*-*—•—*
f:.~-
-0.98 D
"So.96 §0.94 20.92 I o u .20.90 - i
—|g—p2
—A-Pi
BO.88
0 1 2 3 4 5 6 7 8 9 10 Cases of mesh Fig. 2 Case 1: shell element with a mesh of 100X10; Case 2-9: solid element with meshes of 60X6X4, 80X8X4, 100X10X4, 60x6X6, 80X8X6, 100X10X6, 100X10X8, 100X10X10 respectively
3
Cases of time duration of the impulsive load Fig. 3 Effect of the time duration of the impulsive load on pi and p2 (Case 1-3: to=0.01, 0.1, 0.5 ms)
Fix-ended beam problem
A fix-ended beam is then simulated with a refined 3-D solid element mesh 100x10x5 for a quarter of the beam. The beam is uniform and fully constrained against displacement and rotation at both ends. A rectangular impulsive force is applied at the mid-point of the beam. The material is assumed again elastic, perfectly plastic. The material and geometry parameters adopt the same values as those in [2]. The window of force within which the counter-intuitive phenomenon occurs is found in the range between 560 and 700 N. This range differs slightly from the results calculated with ABAQUS in [2] which is in the range from 520 to 600 N. It is shown that not only the location but also the width of the
529 window in which the counter-intuitive phenomenon occurs are different when different finite element codes are used. Figs. 4 and 5 illustrate the variation of the deflection time history and the effective plastic strain accumulation at the midpoint of the beam when the counter-intuitive phenomena occur.
• | 0.035 " 0.030 o '£ 0.025 "C 0.020 > 0.015 J& 0.010 W 0.005
N
[F
J
r
0.000 0.002
0.004
0.006
0.008
0.'
t(s) Fig. 4 Comparison of the deflection of the mid-point of the beam
4
F=700N F=710 N
0.002
0.004
0.006
0.008
0.010
t(s) Fig. 5 Comparison of the effective plastic strain of the lower element at the mid-span
Sensitivity analysis
Shanley model has been successfully used to analyze the counter-intuitive behavior of the pin-ended beam [1]. As shown in Fig. 6, the SDoF Shanley model has a deformable cell , i Arj,/2 connected by two rigid bars. For small ! AoV2 ~ M rotations, there exists tan
TTTUtT
n / A,
e In order to simplify the theoretical procedure, it is assumed that the pulse loading -Ob (C) on the model is very short and intensive, Fig. 6 The Shanley model which implies that the recovery response phase of the model is only determined by the maximum rotation angle. Therefore, the external loading can be simply represented by the initial condition cp=
^
(Ir
+ 5 1 (^ + | ) + 5 2 ( ^ - | ) = 0
(2)
The above equilibrium equation can be simplified and rewritten as the following ip+K(p=F (3) Sensitivity analysis (SA) is usually characterized by the gradient of the response with respect to the system parameters. Eq.(3) is differentiated with respect to cp0 to obtain the following sensitivity equation, ip^ +K(p'm =F'n -K'm
530
The general form of the response equation, Eq.(3), and the sensitivity equation, Eq.(4), can be solved simultaneously using Newmark-P method to obtain the time histories of cp and (p cp° under their respective initial conditions. Fig. 7 shows the final vibration of the Shanley model. Three critical initial angles (pcrA, cpcrB and (pcrc, are 0.08523, 0.09254 and 0.09728 rad, respectively. In order to compare response sensitivity when cpo varies in a range, the computation time is limited to 4 ms. The maximum value of the absolute response sensitivity ld(p/d(f>olmax in 4 ms is calculated for different (p0, which is shown in Fig. 8. The range of cpo is from 0.04 to 0.14 rad, and 57 simulations are carried out with a terminal time t=4ms. It has three obvious peaks, corresponding to the values of cp0 equal to 0.08556, 0.09222 and 0.09778 rad, respectively. These values agree well with the critical initial angle (pcrA, cpcrB and (pcrc. By introducing sensitivity analysis method, the range of the initial angle within which the counter-intuitive phenomenon 0.20 0.15
|
(pmax
0.10
1/
I 0.05
/
(pmin
/
s
/
V
#0.00 -0.05
r *""
I (Pc
'.
C
0.03
-0.1Q 00
0.05 0.10 0.15 Initial angle displacement ipo: rad
0.20
Fig. 7 Thefinalvibration of the Shanley model occurs can be determined.
5
0.05
0.07
0.09
0.11
0.13
0.15
Discussion and conclusions
The counter-intuitive phenomenon is very sensitive to the finite element codes, element types, meshes and time duration of the impulsive load. Both the location and the width of the window within which the counter-intuitive phenomenon occurs depend on the selected element types and finite element codes. By using sensitivity analysis method, the narrow counter-intuitive window can be found.
Reference 1. Symonds, P. S. and Yu, T. X., Counterintuitive behavior in a problem of elasticplastic beam dynamics, ASME J. ofAppl. Mech., 52 (1985) pp. 517-522. 2. Symonds, P. S. and Lee, J. Y., Anomalous and unpredictable response to short pulse loading, Recent Advances in Impact Dynamics of Engineering Structures, D. Hui and N. Jones, eds., ASME, New York, AMD 105 (1989) pp. 31-38. 3. Li, Q. M., Zhao, L. M. and Yang, G. T., Experimental results in the counter-intuitive behavior of thin clamped beams subjected to projectile impact, Int. J. Impact Engng., 11(3) (1991) pp. 341-348. 4. Li, Q. M. and Liu, Y. M., Uncertain dynamic response of a deterministic elasticplastic beam, Int. J. of Impact Engng. (in press).
COMPUTATIONAL MATERIAL TESTING OF PRE-DAMAGED METALS USING DAMAGE MECHANICS MODELS Y. TOI AND
S. HIROSE
Institute of Industrial Science, University of Tokyo, 4-6-1 Komaba, Meguro-ku, Japan (E-mail: [email protected])
Tokyo
153-8505,
The elasto-viscoplastic constitutive equation is formulated, based on the concept of continuum damage mechanics. The constitutive modeling is identified, based on static/dynamic tensile tests and fatigue tests for SM490A. The identified model is used to predict the dynamic, tensile behaviors of pre-strained SM490A. The predicted results have agreed well with the corresponding experimental results.-
1
Introduction
Computational material testing considering elasto-viscoplasticity and material damage in the constitutive equations based on continuum damage mechanics [1-3] is conducted in the present study. All of the material constants of undamaged metals concerning elasto-viscoplasticity and material damage are determined, based on the quasi-static tensile, dynamic tensile and fatigue test (S-N curves) results [4-6]. Subsequently, the whole process of the pre-damaging of specimens by the pre-strains or pre-fatigue and the following dynamic tensile tests of the pre-damaged specimens is simulated, using the identified material testing simulators. The validity of the computational prediction for the effect of the pre-strains and pre-fatigue on the dynamic tensile fracture behavior is demonstrated by the comparison of the calculated results with the test results [4-6]. The method of computational material testing as proposed in the present study, which is applicable to other sorts of materials and other forms of damage, can be effectively used for the evaluation of mechanical properties of pre-damaged materials based on a limited number of test results for undamaged materials and the lifetime prediction of structures. 2
Formulations
The following elasto-viscoplastic constitutive equation considering damage based on Ihe strain equivalence hypothesis [1] is used:
m = [».]{£') =[D.]({£}-{ev}) (1) e vp where {cr}: effective stress, {£•}: total strain, {s ): elastic strain, {s }: viscoplastic strain, [De]: isotropic, elastic stress-strain matrix. The effective stress {a} is expressed as
531
532
{a} = {a)l{\-D)
(2)
where {a}: nominal stress, D: scalar damage variable. The following viscoplastic strain rate {svp}, which is the extension of Perzyna's viscoplastic constitutive equation [7] by Murakami [8] to the damage analysis, is used:
K
vr-
A" V)
(3)
\(l-D)^+(x0-qy/3^} / fy-D) where J2: the second invariant of deviatoric stresses {crd}, s eq P equivalent viscoplastic strain, y, q,x0,p,m: material constants. The following form of unified damage evolution equation proposed by Lemaitre [l] is employed: V
(4)
D= where D=0 D>0
(5a)
when £eq < ePd v/heneeq>epd
andaeq>af
(5b)
0
(6)
R., = j ( l + v)+3(l-2v;
(7)
-7 =
2E(l-D)2
where
eq J
J
in which E: Young's modulus, v : Poisson's ratio, Rv: triaxial function, (TH : hydrostatic pressure. The von Mises equivalent stress oeq and the total strain rate s are expressed as a eq
V^=ffMV)T.
F
=
WH
(8)
The material constants S (5 = 1.0), epd and Dcr depend upon materials, temperature and types of damage [9]. Then, S is assumed as S = S0e for elasticity, S = S0P (1 + cseeg) for plasticity (9) As for spd and Dcr, the following equations are assumed: £
pd
~£pd0\(1
+ cee„),
Dcr = Dcr0 (1 + cDseq)
(10)
533
3 Numerical Results Sixteen material constants (Table 1) contained in the formulation of the preceding section have been determined so as to fit the material test results [4-6] for steel specimens under static, dynamic tension and repeated tensile loading (Fig. 1). Table 1 Material constants of SM490A
y [sec"1]
210 0.3 2300
q [MPa]
750
JC0 [MPa]
420
P
5.0
E [GPa] V
m S0
[MPa]
1 600
S0'
[MPa]
0.47
e
s Cs
1.0 0. 3625X10"3
[sec]
0.110
£
pdO
-0.3750X10" 4
C£ [sec]
0.530
Dcr0
0
C
D
Gf
s
550
%
500
2
450
399
[MPa]
' •
° •
experinent model
o o •o o o
I
400
-
E o
:z 350 10" 10s 106 107 Number of Cycles to Failure N
Fig.l Identified S-N curves for SM490A under repeated, tensile stress
534
The identified model has been applied to the prediction of dynamic tensile behaviors of pre-strained SM490A specimens [4-6]. The dynamic stress-strain behavior of pre-strained SM490A has been reasonably predicted by using the identified computational model (Fig.2).
800 .-. 700 S «
600 500
CD
^ c
300
"e
200
Y
\
experiment
o
•
100 0
............ 0
0. 1 0. 2 0. 3 Nominal Strain
0. 4
: 0. 5
Fig. 2 Predicted dynamic stress-strain curves for SM490A after pre-straining of 5%
4
Conclusion
The elasto-viscoplastic constitutive equation has been formulated, based on the concept of continuum damage mechanics. It employs the viscoplastic strain given by Perzyna and extended by Murakami to consider the effect of damage. The unified form of damage evolution equation given by Lemaitre has been extended to consider the effect of types of damage and strain rates. The constitutive modeling has been identified, based on static/dynamic tensile tests and fatigue tests for the steel SM490A. The identified model has been used to predict the dynamic, tensile behavior of pre-strained SM490A. The predicted results have agreed well with the corresponding experimental results. Other results are contained in Ref. [10].
References 1. Lemaitre, J., A Course on Damage Mechanics, Second Edition, Springer, (1996). 2. Skrzypek, J. and Ganczarski, A., Modeling of Material Damage and Failure of Structures (Theory and Applications), Springer, (1999). 3. Krajcinovic, D., Int. J. Solids Structures, 37, 267-277, (2000). 4. Itabashi, M. and Fukuda, H., Technology, Law and Insurance, 4, 37-44, (1999). 5. Itabashi, M. et al., Sino-Japanese Symp. Deformation/Fracture of Solids, 41-48, (1997). 6. Itabashi, M. and Fukuda, H., Journal of Materials Processing Technology, 117-3, (2001) 7. Perzyna, P., Arch. Mech., 32-3,403-420, (1986). 8. Murakami, S., Transactions of the JSME, 60-578, 230-235, A(1994). 9. Lemaitre, J., Comput. Methods Appl. Mech. Engrg., 51, 31-49, (1985). 10. Toi, Y. and Hirose, S., Transactions of the JSME, (2002), in print.
STUDY OF THE INFLUENCE OF THE SUSPENSION PARAMETERS ON SUSPENSION KINEMATICS CHARACTERISTIC DINGHUA ZHUMAOTAO XIACHANGGAO School of Automobiles and Transportation, Jiangsu University, Dantu Road 301 Zhenjiang City Jiangsu Province China E-mail: dineh@sina. com, en Suspension is one of the most important parts in vehicles, and independent suspension is the most widely used style in now day's vehicles. Its Design level will influence the performance of the vehicle; especially the drive ability of the vehicle and lifetime of the tire. It is important to select the parameters of the independent suspension to get the ideal suspension performance. Based on the theory of multibody kinematics, one McPherson suspension multibody kinematics model is built by using ADAMS/Car software and the influence of suspension parameters on the suspension kinematics characteristic studied. Through studying on suspension kinematics characteristic, this article researches the methods for overcoming the steering wheel wear. During constructing the model, it has been checked with the measured front wheel alignment parameters. And the model was consummate with these parameters. The model study is performed to look for the sensitive parameters of tire wear through separating the parameters. The influence of structure parameters (deviation angle of control arm, pitching angle of control arm, height of control arm inner bearing) on front wheel alignment parameters is studied. We can get the ideal model through optimizing these parameters.
1
Introduction
McPherson suspension is the most widely used suspension style in now day's vehicles. Performance of suspension is influenced by suspension parameter. Based on the theory of multibody kinematics, this paper constructs one McPherson suspension multibody kinematics model by using ADAMS/Car software and studies the influence of suspension structure parameters on the suspension kinematics characteristic. 2
ADAMS Software
ADAMS is the world's most widely used mechanical system simulation software. It enables you to produce virtual prototypes, realistically simulating the full-motion behavior of complex mechanical systems on their computers and quickly analyzing multiple design variations until an optimal design is achieved. This reduces the number of costly physical prototypes, improves design quality, and dramatically reduces product development time. ADAMS provides a full suite of modeling, analysis, and visualization capabilities. With ADAMS, we can quickly and easily create a complete, parameterized model of our mechanical system, building from scratch or importing parts geometry from your preferred CAD system. We then apply forces and motions and run this model through a battery of physically realistic 3D motion tests. ADAMS/Car is a specialized environment for vehicle modeling. It allows virtual prototypes of vehicle to be created subsystems and the virtual prototypes analyzed much like the physical prototypes.
535
536
3
Building of the Model
3.1
Analyses the Structure of the Model
The McPherson suspension is composed of wheel, control arm, steering arm and shock absorber. With ADAMS/Car, we can build the mold easily. The connection between the wheel and is revolute joint, the steering arm connect to control arm with spherical joint, control arm connect to the vehicle body with revolute joint, the steering arm connect to the shock absorb with Cylindrical joint and the shock absorber connect to the vehicle body with hook joint. Suspension Joints Styles and Characters show in Table 1. Table 1. Suspension Joints Style and Character.
Style of the joint
The number of Restrict freedom translation revolute
The number of the joint 1/2 suspension
Full suspension
revolute joint
3
2
2
4
hook joint
3
1
1
2
Cylindrical joint
2
2
1
2
Spherical joint
3
0
1
2
There are 4 parts (wheel, control arm, steering arm and shock absorber): n=4 So the 1/2 suspension restricts equation: m=2*5+1*4+1*4+1*3=21 Freedom of the 1/2 suspension: K=6*n-m=6*4-21=3 Three freedoms are up and down of the suspension; revolve of the wheel and swim of the kingpin. 3.2
Build the Model
Figure 1. Suspension Kinematics Model
Build the model with the tested suspension structure data. ADAMS/Car provides the convenience and quickly building tool, so we get the model like Fig.l. 3.3
Check the Model
Caster angle and kingpin inclination angle were decided by the suspension structure, and they can be test in the real vehicle. We test the Caster angle and kingpin inclination angle at the same time testing the suspension structure date. On the other side Caster angle and kingpin inclination angle can be finding on the model. Through compare the real vehicle test data to model data, it confirm that the model is truly and credibility.
537
4
Study of the Suspension Characteristic
The suspension carries through the parallel travel analyses using the ADAMS model. And use ADAMS/Post processing, we can get the result of suspension kinematics characteristic. Through changing the suspension parameter, we got a serious of curves of suspension character. We define the angle of plan of left control arm and XOY plan as pitching angle and the angle of plan of left control arm and YOZ plan as deviation angle. We study the influence of pitching angle and deviation angle of control arm to suspension character. F i g u r e 2 Suspension Pitching Angel and (Fig.2) Deviation Angle 4.1 Change the Pitching Angle of Control Arm As right hand rule, the pitching angle of the original model is negative. We simulate the model on three conditions: the negative pitching angle, the positive pitching angle and the zero pitching angles, and keep the other parameter unchanged. CatlerAigfe From fig.3, we know that 125 changing the pitching angle did not influence the suspension camber *—zero d +—potty? angle and kingpin inclination angle. J* But it influenced the caster angle and US y j f toe angle. Especial caster angel, it ^ increased when the pitching angle is $ r * i * " 4 s ^ . *^- * - * positive; and decreased when the *.*-*-K^*~*-32*r pitching angle is negative. So we can .^#r... use this characteristic to change the !**•# vary tendency of suspension caster OH 25JD SDJD 75.0 tOOIl -1000 -75D -SOD -25.0 angle. When the pitching angle is lAlleelTrael near zero, suspension caster angle Figure 3. Caster Angle Vary Tendency vary very little. CatttAig*
4.2 Change the Deviation Angle of Control Arm As we study the pitching angle, we simulate the model on three conditions: the negative deviation angle, the positive deviation angle and the zero deviation angles, and keep the other parameter unchanged. From simulation result (fig. 4), we know that vary deviation angle
„
*JB -*?*-*
32S
4 r "f!
( IS !•
5 t.75
m
-100D -75.0 -600
-2sn
on
2sn
SOn
750
INIeelTrael
Figure 4. Caster Angle Vary Tendency
10QJ3
538
influence the changing tendency of the caster angel. When the deviation angle varies from positive to negative, the changing tendency of caster angle will be gently. 4.3
Change the height of control arm inner bearing
As before, we simulate the model on three conditions: original height of control arm inner bearing; decrease the height of control arm inner bearing, increase the height of control arm inner bearing, and keep the other parameter unchanged. From the results (fig. 5 and fig.6), we can find that the suspension parameters vary tendency are all gently. Especially, toe angle vary tendency changed notability. It is important to the lifetime of the tire.
Toe Aigfe
100
i
9
00
~ " * ~ ^ ^
-S0
I
!
)
\
i
i Oft] t i l l
j
—
| .. !
+-i;~,:^xw
^
-1DJD • —
j -150 -ram -75J0 SOD
1
-250 00 250 WleelTiacH
600
1
ISO 1000
Figure 5. Toe Angle Vary Tendency CanberAigle
200
Conclusion The suspension parameters are influenced by suspension structure parameters. Suspension parameters will decide the performance of vehicle. Especially the toe angle vary range will influence the tire wear-life. We hope that the suspension -IDOJO -7SJ3 HSOD -2SJQ OJD 2SJD SDH 7 5 0 1000 parameters unchanged during wheel WleelTrael travel in ideal. But it is not reality, so Figure 6. Camber Angle Vary Tendency we try to reduce the vary range of the suspension parameters. So we can optimize the suspension through change the deviation angle, pitching angle and the height of control arm inner bearing. References 1. 2. 3. 4. 5.
ADAMS/View User Guiders, Version 10.1, Mechanical Dynamics, Inc., 2000. ADAMS/View Reference Guiders, Version 10.1, Mechanical Dynamics, Inc., 2000. ADAMS/Car User Guiders, Version 10.1, Mechanical Dynamics, Inc., 2000. M. magic, Automobile Dynamics, Peoples Traffic Pubic, 1992.3 (in Chinese). Yu Zhisheng, Automobile Theories, Mechanic Industry Public, 1994.5 (in Chinese)
COMPUTATIONAL STUDY OF VAPOR PRESSURE ASSISTED CRACK GROWTH AT POLYMER/CERAMIC INTERFACES C. W. CHONG, T. F. GUO AND L. CHENG Department
of Mechanical Engineering, National University of Singapore, Singapore E-mail: [email protected]
117576
Finite element analyses of vapor pressure assisted crack growth along polymer/ceramic and polymer/glass interfaces using computational cells are presented in this work. The ductile polymer film is bonded to stiff elastic substrates. Plane strain Mode I crack growth is studied under conditions of small scale yielding. Void growth and coalescence is described by the extended Gurson model [1, 2] which incorporates vapor pressure as an internal variable. Progressive crack growth along the interface is modeled by a row of Gurson cell elements. Crack growth resistance curves are computed for a range of vapor pressure loadings. A primary objective is to gain some understanding of the relation between vapor pressure levels and die macroscopic interface fracture toughness. The contribution of plastic dissipation in the film to the total work of fracture is investigated as well. Numerical results show significant reduction in both initial and steady state fracture toughness as vapor pressure levels increase. These findings provide some insights into the role played by vapor pressure in IC package failures by interface delamination and popcorn cracking [3, 4]. In addition, the effect of initial void volume fraction on the macroscopic fracture toughness is briefly discussed.
1
Problem Formulation
Figure 1(a) shows the schematic of a ductile layer sandwiched between two identical semi-infinite elastic substrates. A semi-infinite crack lies along the upper interface. The ductile layer of thickness h is assumed to be elastically isotropic with Young's modulus E and Poisson's ratio v. Its response is characterized by the J2 flow theory. The true stress logarithmic strain relation in uniaxial tension is specified by e=— E
for a < cr0, a
\V"
for a > a0,
where (j 0 is the initial yield stress and N is the strain-hardening exponent. The two elastic substrates are modeled as isotropic, linear elastic with Young's modulus Es and Poisson's ratio vs. To model void growth and coalescence along the interface, we adopt the methodology proposed by Xia and Shih [5]. Figure 1(b) and 1(c) show the finite element mesh for small-scale yielding analysis. The fracture process is confined to a planar layer of initial thickness D ahead of the crack. A row of uniformly sized voided cells, each of dimensions D x D and initial void volume fraction / 0 , is embedded ahead of the initial crack tip. The voided cells are governed by an extended Gurson constitutive law [1, 2, 6] for porous material with vapor pressure as an internal variable. This relation governs the progressive damage by hole growth and coalescence at the interface. The extended Gurson flow potential O takes the following form: ^(ae,am,a,f,P)
= (^A
+2qifCosh(3q^C7'"_+P>)-{l
539
+
(qJ)2)=0,
540
where <7e is the macroscopic effective Mises stress, am the macroscopic mean stress and <7 the current flow stress of the matrix, /denotes the current void volume fraction and p is the internal pressure introduced by Guo and Cheng [2]. qx and q2 are the adjustment factors introduced by Tvergaard [6] to improve the accuracy of the model.
Figure 1. (a) Schematic of a ductile layer joining two elastic substrates with an upper interface crack, (b) Finite element mesh of inner region, (c) Close-up view showing the row of void-containing cells embedded ahead of crack tip.
For a plane strain crack subjected to mode I loading, the Griffith energy release rate, G, is related to the mode I stress intensity factor, K, by Irwin's relation,
Under small scale yielding, the criterion for crack advance, Aa, is given by G = T, where r is the crack growth resistance. From dimensional analysis, the interface fracture toughness depends on the following non-dimensional quantities:
I^£) DCT0
= F[^£.3L
yD
E
^Nvv.f E
£s.\ a0J
Present study will focus primarily on the role of internal vapor pressure P<JOQ on the crack growth resistance.
541 (a)
20
30 Aa/D
(b)
p 0 /o 0 PoAo PoAo - Po/co
40
60
80
= = = =
0.0 0-5 1-0 1-5
100
Aa/D Figure 2. Resistance curves showing the effect of internal vapor pressure pjoo. (a) Initial porosities/o = 0.01 and (b)/ 0 = 0.05.
2
Results and Discussion
The computational results for initial porosities f0 = 0.01 and 0.05 are presented. The following set of material parameters is used: a^E = 0.01, EJE = 10, N =0.1, v= 0.4, vs = 0.3.
542
Figure 2 shows crack growth resistance curves for a range of internal vapor pressure. At the initial phase of crack growth, plastic dissipation is insignificant and the total work of fracture /"is effectively equal to the work of fracture process r0 [7]. However, plastic dissipation /p increases as growth continues and this contributes to a rising resistance curve. When crack extension reaches a critical size, steady-state propagation takes place and the corresponding fracture toughness is denoted by the asymptotic peak value 7^s. From Figure 2 it can be seen that internal pressure lowers the toughness of the interface. The work of fracture process r0 decreases (almost linearly) as po/Ob increases, for both/ 0 = 0.01 and 0.05. This shows that void-containing cells are pre-damaged and softened by the internal pressure and lesser energy is thus required for the onset of crack growth. During transient and steady state growth, /p, (and therefore r) is a nonlinear function of po/Ob- In the case of large initial porosity, f0 = 0.05, /J. decreases as the vapor pressure increases. For small initial porosity, f0 = 0.01, rP is nearly constant. That is, for small /o, an increase in internal pressure has little effect on the plastic dissipation /p; however the work of fracture process r0 is reduced. By contrast, for large initial porosity, both r0 and 7p are lowered as the internal pressure is increased. In summary, vapor pressure effects on resistance curves are significant when the interface porosity,/0, is large. At high vapor pressure, the resistance curves are nearly flat, exhibiting brittle like characteristics. By contrast, vapor pressure effects are minimal at low values of/0. As/ 0 -» 0, the resistance curves for the different values of po/ob under consideration approach the curve for po/ob = 0. The findings in this study also show that fracture toughness decreases as/ 0 increases, a result previously reported by Xia and Shih [5]. References 1. Gurson A. L., Continuum theory of ductile rupture by void nucleation and growth part I. Yield criteria and flow rules for porous ductile media. Journal of Engineering Materials and Technology 99 (1977) pp. 2-15. 2. Guo T. F. and Cheng L., Modeling vapor pressure effects on void rupture and crack growth resistance. Acta Materialia 50 (2002) pp. 3487-3500. 3. Galloway J. E. and Miles B. M., Moisture absorption and desorption predictions for plastic ball grid array packages. IEEE Transactions on Components, Packaging, and Manufacturing Technology 20 (1997) pp. 274-279. 4. Galloway J. E. and Munamarty R., Popcorning: a failure mechanism in plastic encapsulated microcircuits. IEEE Transactions on Reliability 44 (1995) pp. 362-367. 5. Xia L. and Shih C. F., Ductile crack growth - I. A numerical study using computational cells with microstructurally-based length scales. Journal of the Mechanics and Physics of Solids 43 (1995) pp. 233-259. 6. Tvergaard V., Material failure by void growth to coalescence. Advances in Applied Mechanics 27 (1990) pp. 83-151. 7. Tvergaard V. and Hutchinson J. W., Toughness of an interface along a thin ductile layer joining elastic solids. Philosophical Magazine A 70 (1994) pp. 641-656.
DISTORTION PREDICTION USING FINITE ELEMENT METHOD Y. C. TSE, P. LIU, Y. Y. WANG, C. LU Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528 E-mail: liuping @ ihpc. a-star. edu. sg, G. R. LIU National University of Singapore, 9 Engineering Drive 1, Singapore 117576 E-mail: [email protected] K. P. QUEK Sunstar Logistic Singapore Pte Ltd, 10 Science Park Road, #04-16/17 The Alpha Singapore Science Park II, Singapore 117684 E-mail: quek. kwang.peng @ unisunstar. com The entire heat treatment process, including induction heating followed by oil quenching, for sprockets made of S45C mid-carbon steel has been systematically analyzed. To study the effect due to different sprocket cutout dimensions and positions, the finite element method has been implemented and coupled thermal and structural analysis has been performed. Phase transformation and temperature dependent material properties are incorporated in the simulation. The simulated temperature distribution and distortion of sprockets have also been investigated, and the results show a good agreement with the experimental results.
1
Introduction
Induction hardening has wide applications in manufacturing industry and is particularly well suited to harden steel components. This method involves heating the component by induced current to a certain temperature at which the rate of formation of austenite is very rapid, and then quenching it to transform the austenite into martensite. The hardness thus formed is higher than that obtained by conventional methods of hardening. However, due to the high temperature gradient, the local plastic deformation and the phase transformation, severe distortion may occur during the heat treatment process. In the experimental study of heat treatment of a sprocket, it often induces distortion problems especially for designs with large cutouts. Deciding the cutout pattern designs, with acceptable distortion, at the design stage will save manufacturers a substantial amount of money and time. Detail analysis of the heat treatment process is very important. The computer simulation offers a powerful tool in designing sprockets and solving manufacturing problems as illustrated in [7-11]. The main objective of this work is to develop sprocket prototype rapidly for our client with maintaining certain accuracy. Pervious work has shown that the factors affecting the distortion of sprockets during the heat treatment process can be categorised into three types: the geometry of sprockets, the material properties and the heat treatment history. In the present study, the finite element packages, MSC/PATRAN and ABAQUS, are used in the analysis. Comparing with the experimental results, the simulated distortions of different sprocket designs after heat treatment are also discussed.
543
544
2
Simulation Model
In the induction heating process, a sprocket is heated by the induced current. This analysis should be simulated as coupled electromagnetic-thermal-structural analysis. However due to lacking of information on material properties, we simplify the electromagnetic-thermal transmission by applying a thermal boundary condition, heat flux. The proper value of the heat flux was obtained using trial fitting procedure. The sprocket is then immersed into an oil tank immediately, and convection has been simulated in the quenching process. The thickness of a sprocket is small compared with the other dimensions so that it is a plane stress problem. The governing equation for the heat transfer in cylindrical coordinate as well as the boundary and initial conditions are given as 1 3 (. 3T^| Id fkaT^I _3T q = h(Ts-TM) and T(r,0,t)lt=o=Tinjtial (2 & 3) The distortion is affected by the following critical factors, which are implemented into the finite element model. 1. The temperature dependent material properties, both thermal and mechanical, must be considered in the model. The available material data are given in [1-6] expect for the mechanical properties at high temperature of 600°C-1400°C. Thus a trial fitting is applied to estimate the proper material properties by matching the simulated distortion and the experimental one after the heat treatment. 2. Phase transformation occurs due to significant changes in temperature in the heat treatment process. In the present study, only Austenite and Martensite are considered and the local temperature determines the phases of the material. 3. The elastic-plastic constitutive relation has to be applied in the model to account for the local plasticity and is given as a/E £ =
(a y /E)(a/a y )"
a
o>oy
(4)
The analysis employs the Von-Mises yield criterion and isotropic hardening. As discussed above, there are two parameters that are required trial fitting, namely the heat flux and the mechanical material properties at high temperature. There are 12 different sprocket designs, in which the experimental results are provided, consisting of different sprocket sizes and cutout locations, which four of them are shown in figure 1.
a.) Nol (42 teeth)
b.) No2 (43 teeth) c.) No3 (45 teeth) Figure 1. Finite element models for 4 different designs
d.) No4 (47 teeth)
545
3
Simulation Results and Discussions
Figure 2 shows a comparison of temperature distribution between the experiment and simulation, in which the fitted heat flux is applied, for the model No.l after induction heating stage. The results are in good agreement that higher temperature is concentrated at the teeth and teeth bottom.
Figure 2. Comparison between a) experimental and b) simulation thermal results.

The fitting procedure can then proceed to estimate the mechanical material properties at high temperature. The distortion is defined as the difference between the maximum and minimum tooth-bottom runout, measured from the sprocket centre. The material selection is based on the 42-teeth models, while the other models are used to verify the estimated material properties. Table 1 shows that the simulated tooth-bottom runout obtained with the fitted material properties achieves an accuracy of ±20% compared with the experimental runout results.

Table 1. Comparison of experimental and simulation runout results

| Model No. | Teeth | % error |
| No.1  | 42 | +8.0  |
| No.5  | 42 | -19.0 |
| No.6  | 42 | +16.0 |
| No.7  | 42 | -8.0  |
| No.8  | 42 | -17.0 |
| No.2  | 43 | +11.0 |
| No.9  | 43 | -19.0 |
| No.10 | 43 | -19.0 |
| No.11 | 43 | +9.0  |
| No.3  | 45 | +9.0  |
| No.12 | 45 | -3.0  |
| No.4  | 47 | -13.0 |

The whole sprocket expands in the induction heating process, and the expansion of the sprocket teeth located above the cutouts is larger. Conversely, shrinkage occurs in quenching; however, the contraction of the teeth located above the cutouts is smaller. This is due to the deformation induced by local plasticity at the sprocket teeth above the cutouts during heating. Plastic deformation causes permanent dimensional changes, which counteract the shrinkage created during quenching. The maximum stress after heat treatment occurs at the tooth bottoms that are not above the cutouts.

Figure 3. Sprocket model, showing the cutout, the phase-transformation curve, and the heat-treated and non-heat-treated regions.

The distortion of a sprocket depends on the cutout patterns and their positions. As identified from the experiments, the cutout length
and the cutout position (Figure 3) are the critical factors affecting the distortions. The analysis is based on 43-teeth sprockets. Figure 4 shows that as the cutout length increases, more heat is accumulated above each cutout, which leads to more distortion. However, when the cutout length is large enough, it restricts heat from conducting rapidly to the core; the temperature at the tooth bottom then becomes more uniformly distributed and the distortion starts to reduce. In the case of the cutout position, the tooth-bottom runout decreases significantly as the cutout position is increased. This phenomenon can also be explained by heat accumulation. Based on the present simulation results, to achieve small distortion, a cutout length of around 60 mm and a small cutout position should be avoided.
Figure 4. Simulation results for models varying with different CL and CP: teeth bottom runout (mm) plotted against cutout length (CL, 20-100 mm) and cutout position (CP, 12-44 mm)
4 Conclusion
A plane stress model for coupled thermal-mechanical simulation has been developed, with the electromagnetic simulation simplified by applying a fitted heat flux; the error in the tooth-bottom runout is within ±20%. The geometry of the sprocket has a considerable effect on the distortion. The cutout sizes and locations affect the transient temperature distribution during induction heating and oil quenching, and therefore cause distortion in the sprockets. Cutout length and cutout position are important parameters in determining sprocket distortion.
References
1. Bauccio, M.L. ASM Metals Reference Book, ASM, 1993.
2. Fletcher, A.J. Thermal Stress and Strain Generation in Heat Treatment, London and New York, 1989.
3. Harvey, P.D. Engineering Properties of Steel, ASM, 1982.
4. Prabhudev, K.H. Handbook of Heat Treatment of Steels, New Delhi, Tata McGraw-Hill, 1988.
5. Gur, C.H. - Tekkaya, A.E. - Ozturk, T. Proceedings of the Second International Conference on Quenching and the Control of Distortion, Cleveland, Ohio, 1996, p. 305.
6. Henriksen, M. - Larson, D.B. - Van Tyne, C.J. Proceedings of the First International Conference on Quenching & Control of Distortion, Chicago, Illinois, USA, 1992, p. 213.
7. Petrus, G.J. - Krauss, T.M. - Ferguson, B.L. Proceedings of the First International Conference on Quenching and Control of Distortion, Chicago, Illinois, USA, 1992, p. 283.
8. Tszeng, T.C. - Wu, W.T. - Semiatin, L. Proceedings of the Second International Conference on Quenching and the Control of Distortion, Cleveland, Ohio, 1996, p. 321.
9. Fuhrmann, J. - Hömberg, D. - Uhle, M. The Int. J. Computation and Mathematics in Electrical and Electronic Engineering, 18(3), 1999, p. 482.
10. Wang, K.F. - Chandrasekar, S. - Yang, H.T.Y. J. Materials Engineering and Performance, 4(4), 1995, p. 460.
11. Zgraja, J. - Pantelyat, M.G. Int. J. Applied Electromagnetics and Mechanics, 10(4), 1999, p. 303.
INTERFACE PRESSURE DISTRIBUTION IN AUTOMOTIVE DRUM BRAKE
A. TOMA, M. TAKLA AND A. SUBIC
Department of Mechanical & Manufacturing Engineering, RMIT University, PO Box 71, Bundoora, Vic 3083, Australia
J. ZHAO
PBR International, Melbourne, Australia
In the conceptual design stages of automotive drum brakes, it is necessary to estimate the performance and durability of the brake system, which are significantly affected by the interface pressure distribution between the drum and lining surfaces as well as by the temperature changes in the brake system. Conventional (theoretical) analysis of the drum brake has to ignore the geometric distortions and thermal stresses generated during brake application. Finite element analysis (FEA) techniques were adopted to overcome this limitation, enabling the calculation of the brake output to take three-dimensional geometric distortions into account. This paper presents the results of a three-dimensional FEA of a single-shoe automotive drum brake. The brake system is modeled using brick elements having both mechanical and thermal degrees of freedom, which allows the heat flow associated with brake application to be simulated. Frictional finite-sliding contact and coupled thermo-mechanical effects were taken into consideration. Both the brake shoe and the drum were modeled as deformable bodies, which allowed the effects of geometric distortion on the performance of the brake system to be calculated. Mesh-related problems and their effects on the accuracy of the calculated contact pressure distribution are discussed. This work represents a significant step towards the innovative approach of virtual brake-system development.
1 Introduction
Accurate calculation of the interface pressure distribution between the drum surface and the lining pads leads to a better estimate of the brake output. Conventional (theoretical) analysis of the drum brake assumes that both shoe and drum maintain their original geometry during brake application. Finite element analysis techniques have been adopted [1-8] to overcome these limitations, enabling the analysis to include the elastic strains of the brake system. Two-dimensional analysis of the drum brake output [1] assumes a constant pressure distribution in the axial direction. The hub fitting at one end of the drum significantly raises the stiffness at that end, producing uneven distortion, and accordingly an uneven pressure distribution, in the axial direction. Estimating an equivalent stiffness of the shoes and drum with a two-dimensional model [1] or adjusting the drum stiffness to a mean value [2] would not compensate the model for these pressure variations. The single-shoe mechanism is not symmetric in the axial dimension (Fig. 1); accordingly, three-dimensional analysis is required to predict the axial pressure variation. Previous three-dimensional thermo-mechanical analyses yielded unrealistic fluctuations of the interface pressure distribution along the circumference of the lining pads [4],[5].

2 Modeling Considerations
The subject of this investigation is a single-shoe leading-trailing drum brake with a single actuator (Fig. 1). The shoe is one unit, comprising a central body extended on both sides into leading and trailing parts, where the lining pads are attached. In the FEM
model, the lining pads and the shoe are integrated but have different material properties. The outer surfaces of the linings and the inner surface of the drum are assumed to be initially in perfect contact. These two simplifications reduce the required CPU time without significantly affecting the accuracy of the results.
Figure 1. Drum Brake Assembly
Both drum and shoe are modeled as deformable bodies. In the contact interaction, the internal surface of the drum is the master and the lining pads are the slaves. A constant static friction coefficient of 0.3 is assumed. The geometry and orientation of the master surface, as well as the location of each slave node with respect to the master surface, affect the calculation of the contact pressure. The whole model is assumed to have a constant initial temperature of 50 °C. The main body of the shoe is fixed, and both sides of the shoe are first forced to expand against the drum by applying opposite forces of 1400 N to the shoe tips; this simulates the force applied by the brake piston. Keeping the connecting body of the shoe fixed and the load applied, the drum is then rotated with an angular velocity of 55.2 rad/s, equivalent to a vehicle speed of about 60 km/h. This closely simulates applying the brakes on a vehicle going downhill to keep its speed steady. In the mechanical analysis, the drum, lining pads and shoe sides are meshed with eight-node linear brick elements, while the main body of the shoe is meshed with four-node tetrahedral elements. Coupled thermo-mechanical analysis requires elements with both temperature and displacement degrees of freedom; however, elements with only displacement degrees of freedom can be used simultaneously in zones with negligible heat flow. The drum could easily be meshed with brick elements. The shoe is geometrically complex; the shoe sides and linings can still be meshed with brick elements, which have temperature and displacement degrees of freedom, while the connecting body of the shoe is meshed with tetrahedral elements to accommodate its geometric complexities. This region has negligible heat flow, which allows meshing with an element having only displacement degrees of freedom, thus saving CPU time.

3 Results and Discussions
A preliminary mechanical analysis, ignoring heat flow, was first conducted to investigate unexplained results in the literature [4, 5]. The initial static application of the load produced a periodic fluctuation of contact pressure along the circumference of the linings (Fig. 2).
This was found to be caused by using first-order (flat-surface) elements to define the curved contact surfaces. Rotating the drum causes the pressure-fluctuation areas to move along the circumference of the drum depending on the drum position. This was found to occur because different mesh sizes were used for the two contact surfaces, producing two sets of artificially prismatic surfaces with different element sizes rotating relative to each other. The fluctuation could be eliminated (Fig. 3) by using the same mesh size for both contact surfaces and controlling the analysis to calculate the contact pressure when the drum and lining nodes coincide during drum rotation. Initial pressure values varied from 2.75 MPa at the ends of the linings to 0.12 MPa at their centres. During drum rotation, the pressure values increased to 4.3 MPa at the ends of the leading pad and to 0.5 MPa at its centre, while the pressure values decreased slightly in the trailing pad. The pressure values and contour shapes, as well as the value of the brake factor, remain unchanged with continuing drum rotation.
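The node-coincidence control described above amounts to a simple sampling rule: with both contact surfaces meshed into the same number of equally sized elements, the drum and lining nodes line up at fixed angular increments. A small sketch follows; the element count used in the demo is an assumed example, not a value from the paper.

```python
import numpy as np

def coincidence_times(n_elems, omega, t_end):
    """Times at which drum and lining nodes coincide during rotation,
    assuming both contact surfaces carry the same number of equally
    sized elements (n_elems) around the circumference. Nodes line up
    whenever the rotation is a multiple of 2*pi/n_elems."""
    dtheta = 2.0 * np.pi / n_elems   # angular pitch of the mesh
    dt = dtheta / omega              # time between coincidences
    return np.arange(0.0, t_end, dt)

# e.g. 120 elements (assumed), 55.2 rad/s, first 0.5 s of braking
print(coincidence_times(120, 55.2, 0.5)[:5])
```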
Figure 2. Fluctuating Contact Pressure
Figure 3. Contact Pressure Distribution
The coupled thermo-mechanical model considers the combined effects of the mechanical loads and the heat generated by friction. The initial pressure distributions are identical to those of the mechanical model, as are the initial pressure increases due to drum rotation. With further drum rotation, the brake factor decreases slightly due to the heat generated by friction. This heat raises the temperature of the contacting surfaces and causes thermal expansion, mainly in the drum, which increases the internal drum diameter and thereby reduces the brake factor; the brake factor is further decreased by uneven thermal expansion at the surface of the linings. The stress distribution in the shoe due to expanding the shoe sides against the fixed drum is almost identical in the leading and trailing sides, with a maximum Mises stress of about 71 MPa at the points of load application and about 28.5 MPa at the elastic joining points. The maximum stress in the drum is 7.08 MPa. Rotating the drum increases the stress at the elastic joining point of the leading shoe to about 123 MPa. It also produces local temperature increases in the drum, and accordingly inhomogeneous thermal expansion, resulting in a gradual increase in the stresses. The heat generated by friction is proportional to the contact pressure; consequently, the temperature distribution takes the same pattern as the contact pressure distribution at the lining surface. These temperature and pressure distributions are expected to cause a similar pattern of wear in the contacting surfaces of the linings. The temperature distribution at the internal drum surface is more homogeneous due to rotation, which causes continuous changes in the contact pressure over the whole contact area. The maximum temperature in the drum reaches about 91 °C after five seconds of braking.
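Since the generated heat scales with the local contact pressure and sliding speed, the frictional heat input can be sketched as q = mu * p * v with v = omega * r. The fragment below assumes an even 50/50 split between drum and lining and an illustrative drum radius; neither value is taken from the paper.

```python
def friction_heat_flux(mu, pressure, omega, radius, frac_to_drum=0.5):
    """Frictional heat generated per unit contact area, q = mu * p * v,
    with sliding speed v = omega * r, split between drum and lining.
    The 50/50 split is an assumption for illustration only."""
    q = mu * pressure * omega * radius   # W/m^2 for SI inputs
    return frac_to_drum * q, (1.0 - frac_to_drum) * q

# mu = 0.3 (paper), p = 4.3 MPa peak, omega = 55.2 rad/s, r = 0.1 m (assumed)
print(friction_heat_flux(0.3, 4.3e6, 55.2, 0.1))
```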
4 Conclusions
Different patterns of periodic contact-pressure fluctuation occur when first-order elements are used to mesh the master surface and when mating surfaces are meshed with different mesh sizes at the interface. These fluctuations can be eliminated by using the same mesh size for both surfaces and controlling the analysis to calculate the contact pressure when the drum and lining nodes coincide. The pressure distributions from the mechanical and thermo-mechanical models are identical at the beginning of rotation. In the thermo-mechanical model, a long period of brake application gradually causes a slight decrease in contact pressure (brake factor) as well as a gradual increase in the stresses and temperatures of the drum; there is also a slight increase in the stresses in the shoes and linings. The pattern of the pressure distribution of the thermo-mechanical model changes with braking time due to thermal expansion at the interface. The results can be enhanced by implementing second-order elements, which can represent curved surfaces. Further research could also include introducing an initial gap between the contact surfaces, investigating thermal effects on material behavior including friction properties, considering heat dissipation to the environment by convection and radiation, and including inertia effects.

5 Acknowledgements
The authors acknowledge the Victorian Partnership for Advanced Computing (VPAC) for financial support through an expertise grant as well as access to their HPC facility. PBR International is acknowledged for providing technical data.

6 References
1. Schafer D., Estimation of the friction interface pressure distribution in automotive brakes, BBA Friction Research & Development (UK).
2. Day A. J., Harding P. R. J. and Newcomb T. P., A finite element approach to drum brake analysis, Proc. Inst. Mech. Engrs. 193 (1979) pp. 401-406.
3. Watson C., New Development in Drum Brake Analysis, SAE Technical Paper 902249 (1990).
4. Hohmann C., Schiffner K., Oerter K. and Reese H., Contact analysis for drum brakes and disk brakes using ADINA, Computers & Structures 72 (1999) pp. 185-198.
5. Watson C. and Newcomb T. P., A three-dimensional finite element approach to drum brake analysis, Proc. Inst. Mech. Engrs. 204 (1990) pp. 93-101.
6. Day A. J. and Harding P. R. J., Performance variation of cam operated drum brakes, Inst. Mech. Eng., Conference on Braking of Road Vehicles, C10/83 (1983) pp. 69-77.
7. Rao R., Ramasubramanian H. and Seetharamu K. N., Computer modeling of temperature distribution in brake drums for fade assessment, Proc. Inst. Mech. Engrs. 202 (1988) pp. 257-264.
8. Thuresson D., Thermo-mechanical Analysis of Friction Brakes, SAE Technical Paper 01-2775 (2000).
9. Loh W., Basch R. H., Li D. and Sanders P., Dynamic Modeling of Brake Friction Coefficients, SAE Technical Paper 01-2753 (2000) pp. 7-16.
A NEW HIGH PRECISION DIRECT INTEGRATION SCHEME FOR NONLINEAR ROTOR-SEAL SYSTEM
J. HUA¹, Z. S. LIU², Q. Y. XU¹ AND S. SWADDIWUDHIPONG³
¹Dept. of Engineering Mechanics, Xi'an Jiaotong University, 710049, P. R. China, e-mail: [email protected]; [email protected]
²Computational Mechanics Division, Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore 117528, e-mail: [email protected]
³Department of Civil Engineering, The National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260, e-mail: [email protected]
In this paper, the nonlinear mechanics model of a rotor-seal system is established with Muszynska seal forces. A new, efficient and high-precision direct integration scheme is proposed, based on the 2^N-type algorithm for computing the exponential matrix. The proposed model and numerical integration method are employed to investigate nonlinear phenomena in unbalanced rotor-seal systems. To study the influence of the seal on the nonlinear characteristics of the rotor system, bifurcation diagrams for various rotor speeds are presented and the course of the system shifting from the steady state to the unsteady state is analyzed. The study demonstrates that the proposed high-precision direct integration method can be effectively applied to the nonlinear numerical analysis of rotor-seal systems. The scheme is significantly less sensitive to the size of the time step than other existing methods; a larger time step may be used and the computing time substantially reduced.
1. Introduction
The seal characteristic is one of the most important factors affecting the performance and behavior of a rotor system, and nonlinear phenomena can be observed in the rotor-seal system. Examples are (i) the appearance of double periodic, periodic or quasi-periodic motion when the steady state of the rotor-seal system is lost, and (ii) the possible raising of the instability-onset speed when the unbalance parameter is increased appropriately. It is therefore imperative to understand the nonlinear characteristics of the rotor-seal system when designing it. In many cases, the seal forces may damage the rotor-seal system even though they are smaller than the fluid-film forces of the bearing [1]. The steam-excitation problem becomes more critical with increasing rotating speed, medium pressure and rotor flexibility, and with decreasing seal gap. Research on the mechanism of fluid-solid coupling and the control of steam-excited vibration in rotor-seal systems has become one of the key issues in modern turbomachinery design. The extended three-control-volume model [5] is adopted to describe the dynamic characteristics of labyrinth seals. Most research work is limited to the calculation of linear dynamic coefficients and the evaluation of stability; however, the nonlinear nature of the seal forces has to be considered when the mechanism of the instability is explored. The subharmonic mechanism and the influence of unbalance for a single-disk rotor-seal system have been analyzed using the Muszynska seal force [2]. The common disadvantages of some general numerical methods for solving structural dynamic equations are their sensitivity to the time step and their low precision. The precise integration method proposed by Zhong [4, 7] can be used to conveniently solve structural
dynamic equations with high precision. The scheme is significantly less sensitive to the size of the time step than other existing methods [3]. In this paper, the nonlinear dynamic characteristics of the rotor-seal system are studied using the Muszynska seal force model. In order to improve the computing precision and save computing time, the precise integration method is adopted in the algorithm.

2. The model of unbalanced rotor-seal system
The equations of motion of the single-disk rotor-seal system shown in Fig. 1 are:

$$\begin{bmatrix} m & 0\\ 0 & m\end{bmatrix}\begin{Bmatrix}\ddot{x}\\ \ddot{y}\end{Bmatrix}+\begin{bmatrix} D_e & 0\\ 0 & D_e\end{bmatrix}\begin{Bmatrix}\dot{x}\\ \dot{y}\end{Bmatrix}+\begin{bmatrix} K_e & 0\\ 0 & K_e\end{bmatrix}\begin{Bmatrix}x\\ y\end{Bmatrix}=\begin{Bmatrix}F_x\\ F_y\end{Bmatrix}+mr\omega^2\begin{Bmatrix}\cos\omega t\\ \sin\omega t\end{Bmatrix}+\begin{Bmatrix}0\\ -mg\end{Bmatrix} \qquad(1)$$

where m is the mass of the disk, D_e the damping coefficient at the disk, K_e the stiffness of the shaft at the disk, r the eccentricity of the unbalanced mass and ω the rotating speed. The Muszynska seal forces acting on the disk are given by [6]

$$\begin{Bmatrix}F_x\\ F_y\end{Bmatrix}=-\begin{bmatrix}K-m_f\tau^2\omega^2 & \tau\omega D\\ -\tau\omega D & K-m_f\tau^2\omega^2\end{bmatrix}\begin{Bmatrix}x\\ y\end{Bmatrix}-\begin{bmatrix}D & 2\tau\omega m_f\\ -2\tau\omega m_f & D\end{bmatrix}\begin{Bmatrix}\dot{x}\\ \dot{y}\end{Bmatrix}-\begin{bmatrix}m_f & 0\\ 0 & m_f\end{bmatrix}\begin{Bmatrix}\ddot{x}\\ \ddot{y}\end{Bmatrix} \qquad(2)$$

$$K=K_0(1-e^2)^{-n},\qquad D=D_0(1-e^2)^{-n},\qquad n=0.5\sim3 \qquad(3)$$

$$\tau=\tau_0(1-e)^b,\qquad 0<b<0.5,\qquad e=(x^2+y^2)^{1/2}/c \qquad(4)$$

where c is the seal gap; K, D and m_f are the stiffness, damping and fluid inertia mass, respectively; and τ, K and D are nonlinear functions of x and y [3].

Figure 1. Rotor-seal system
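For reference, a minimal sketch of Eqs. (2)-(4) as reconstructed above, returning the instantaneous seal force for a given disk state. The sign conventions follow the reconstruction and should be checked against [6]; the parameter values in the demo are illustrative, not the paper's.

```python
import numpy as np

def seal_force(x, y, xd, yd, xdd, ydd, omega, K0, D0, mf, tau0, b, n, c):
    """Muszynska seal force, assembled from the reconstructed Eqs. (2)-(4)."""
    e = np.sqrt(x**2 + y**2) / c                  # relative eccentricity, Eq. (4)
    K = K0 * (1.0 - e**2) ** (-n)                 # Eq. (3)
    D = D0 * (1.0 - e**2) ** (-n)
    tau = tau0 * (1.0 - e) ** b                   # Eq. (4)
    q = np.array([x, y]); qd = np.array([xd, yd]); qdd = np.array([xdd, ydd])
    Kmat = np.array([[K - mf * (tau * omega)**2,  tau * omega * D],
                     [-tau * omega * D,           K - mf * (tau * omega)**2]])
    Dmat = np.array([[D,                       2.0 * tau * omega * mf],
                     [-2.0 * tau * omega * mf, D]])
    return -(Kmat @ q + Dmat @ qd + mf * qdd)     # (Fx, Fy)

# illustrative state and parameters only
print(seal_force(1e-4, 0, 0, 0, 0, 0, 600.0, 2e5, 500.0, 0.25, 0.3, 0.5, 2.0, 0.0025))
```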
3. The precise integration method
The n-dimensional structural dynamic equations can be written as:

$$\mathbf{M}\ddot{\mathbf{x}}+\mathbf{C}\dot{\mathbf{x}}+\mathbf{K}\mathbf{x}=\mathbf{f}(t,\mathbf{x},\dot{\mathbf{x}}),\qquad \mathbf{x}(t_0)=\mathbf{x}_0,\ \dot{\mathbf{x}}(t_0)=\dot{\mathbf{x}}_0 \qquad(5)$$

Introducing $\mathbf{p}=\mathbf{M}\dot{\mathbf{x}}+\mathbf{C}\mathbf{x}/2$, Eq. (5) becomes

$$\dot{\mathbf{V}}=\mathbf{H}\mathbf{V}+\mathbf{r},\qquad \mathbf{V}(0)=\mathbf{V}_0 \qquad(6)$$

where

$$\mathbf{V}=\{\mathbf{x}^T\ \mathbf{p}^T\}^T,\qquad \mathbf{H}=\begin{bmatrix}\mathbf{A} & \mathbf{D}\\ \mathbf{B} & \mathbf{G}\end{bmatrix},\qquad \mathbf{r}=\{\mathbf{0}^T\ \mathbf{f}^T\}^T \qquad(7)$$

$$\mathbf{A}=-\tfrac{1}{2}\mathbf{M}^{-1}\mathbf{C},\quad \mathbf{B}=\tfrac{1}{4}\mathbf{C}\mathbf{M}^{-1}\mathbf{C}-\mathbf{K},\quad \mathbf{G}=-\tfrac{1}{2}\mathbf{C}\mathbf{M}^{-1},\quad \mathbf{D}=\mathbf{M}^{-1} \qquad(8)$$

The general solution of the homogeneous equation $\dot{\mathbf{V}}=\mathbf{H}\mathbf{V}$ is

$$\mathbf{V}=\exp(\mathbf{H}t)\,\mathbf{V}_0 \qquad(9)$$

Let τ represent a time step; then

$$\mathbf{V}(\tau)=\exp(\mathbf{H}\tau)\,\mathbf{V}_0=\mathbf{T}\,\mathbf{V}_0 \qquad(10)$$

where

$$\mathbf{T}=\exp(\mathbf{H}\tau)=[\exp(\mathbf{H}\tau/m)]^m \qquad(11)$$

Select $m=2^N$ (if N = 20, m = 1 048 576), so that $\Delta t=\tau/m$ is very small. Then

$$\exp(\mathbf{H}\Delta t)\approx\mathbf{I}+\mathbf{H}\Delta t+(\mathbf{H}\Delta t)^2/2+(\mathbf{H}\Delta t)^3/3!+(\mathbf{H}\Delta t)^4/4!=\mathbf{I}+\mathbf{T}_a \qquad(12)$$

where

$$\mathbf{T}_a=\mathbf{H}\Delta t+(\mathbf{H}\Delta t)^2\left[\mathbf{I}+(\mathbf{H}\Delta t)/3+(\mathbf{H}\Delta t)^2/12\right]/2 \qquad(13)$$

$\mathbf{T}_a$ is small in magnitude, so in order to avoid loss of numerical precision the following expression is adopted in the implementation:

$$\mathbf{T}=(\mathbf{I}+\mathbf{T}_a)^{2^N}=(\mathbf{I}+\mathbf{T}_a)^{2^{N-1}}\times(\mathbf{I}+\mathbf{T}_a)^{2^{N-1}} \qquad(14)$$

i.e., $\mathbf{T}_a$ is doubled N times through $\mathbf{T}_a\leftarrow 2\mathbf{T}_a+\mathbf{T}_a\mathbf{T}_a$ before the identity is added. The nonhomogeneous term $\mathbf{r}$ in Eq. (6) is assumed linear within the time step $(t_k,t_{k+1})$:

$$\mathbf{r}=\mathbf{r}_0+\mathbf{r}_1(t-t_k),\qquad t=t_k,\ \mathbf{V}=\mathbf{V}_k \qquad(15)$$

$$\mathbf{r}_0=\mathbf{r}(t_k,\mathbf{V}_k),\qquad \mathbf{r}_1=\left[\mathbf{r}(t_{k+1},\mathbf{V}_{k+1})-\mathbf{r}(t_k,\mathbf{V}_k)\right]/\tau \qquad(16)$$

Then $\mathbf{V}_{k+1}$ can be written as

$$\mathbf{V}_{k+1}=\mathbf{T}\left[\mathbf{V}_k+\mathbf{H}^{-1}(\mathbf{r}_0+\mathbf{H}^{-1}\mathbf{r}_1)\right]-\mathbf{H}^{-1}\left[\mathbf{r}_0+\mathbf{H}^{-1}\mathbf{r}_1+\mathbf{r}_1\tau\right] \qquad(17)$$

The following recursion, in which the convolution with the nonhomogeneous term is evaluated by Gauss quadrature, is adopted to improve the precision of the results:

$$\mathbf{V}_k=\mathbf{T}(\tau)\,\mathbf{V}_{k-1}+\frac{\tau}{2}\sum_{j=1}^{l}A_j\,\mathbf{T}(\tau-t_j)\,\mathbf{r}(t_j) \qquad(18)$$

where $A_j$ and $t_j$ are the weights and integration points of the Gauss quadrature, respectively.
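A compact implementation of Eqs. (11)-(14) is straightforward. The sketch below computes T = exp(Hτ) by the 2^N doubling of the small increment T_a and checks it against a direct matrix exponential; the test matrix is an arbitrary example.

```python
import numpy as np
from scipy.linalg import expm   # only for the accuracy check below

def precise_exponential(H, tau, N=20):
    """T = exp(H*tau) by the 2^N precise algorithm of Eqs. (11)-(14):
    a 4th-order Taylor increment Ta on the tiny step dt = tau/2^N, then
    N squarings carried out on Ta itself, since (I+Ta)^2 = I + (2*Ta + Ta@Ta),
    so the small increment is never swamped by the identity matrix."""
    n = H.shape[0]
    Hd = H * (tau / 2.0**N)
    Ta = Hd + (Hd @ Hd) @ (np.eye(n) + Hd / 3.0 + (Hd @ Hd) / 12.0) / 2.0  # Eq. (13)
    for _ in range(N):
        Ta = 2.0 * Ta + Ta @ Ta                                            # Eq. (14)
    return np.eye(n) + Ta

H = np.array([[0.0, 1.0], [-4.0, -0.2]])
print(np.abs(precise_exponential(H, 0.05) - expm(0.05 * H)).max())
```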
4. Numerical analysis
Based on the parameter values given in Table 1 and adopting the precise integration method, the bifurcation diagram of the rotor center is shown in Fig. 2. Note that x is the Poincaré mapping point and $s=\omega/\sqrt{K_e/m}$ is the nondimensional rotating speed. A jump phenomenon in the response appears at the low speed s = 0.8. As the rotating speed increases, synchronous periodic motion is observed. When s = 1.5, stable periodic motion appears, with Floquet multipliers -0.3439 ± 0.7011i and 0.0012 ± 0.0084i, as shown in Fig. 3. Figure 4 illustrates the double periodic motion at s = 3.11, with Floquet multipliers -1.04936, -0.06868 ± 0.00631i and -0.91485, where one main Floquet multiplier passes through the unit circle at the point (-1, 0). Two isolated points appear on the Poincaré map, corresponding to the period-doubling bifurcation and the half-frequency whirl of the rotor-seal system. As s exceeds 3.4, quasi-periodic motion appears and a closed curve is observed on the Poincaré map, as shown in Fig. 5. The (1/4) and (1/5) subharmonic motions are observed at s = 4.0 and 5.0, as depicted in Figs 6 and 7, respectively. As the rotating speed increases further, the motion of the system becomes more and more complex.
Table 1. Parameters and values used

| n | τ | b | μ | m | z | c (m) | r (m) |
| 2.0 | 0.3 | 0.5 | 0.079 | 0.25 | 0.1 | 0.0025 | 0.0002 |
Figure 2. Bifurcation diagram of rotor center
Figure 3. Periodic motion: (a) trajectory diagram of rotor center, (b) Poincaré map
Figure 4. Double periodic motion: (a) trajectory diagram of rotor center, (b) Poincaré map
Figure 5. Quasi-periodic motion: (a) trajectory diagram of rotor center, (b) Poincaré map
Figure 6. 1/4 subharmonic motion: (a) trajectory diagram of rotor center, (b) Poincaré map
Figure 7. 1/5 subharmonic motion: (a) trajectory diagram of rotor center, (b) Poincaré map

5. Conclusion

The nonlinear model of the rotor-seal system is established with Muszynska seal forces. An efficient, high-precision direct integration scheme is used to investigate the nonlinear behavior of the unbalanced rotor-seal system. Bifurcation diagrams for various rotor speeds are obtained to study the influence of the seal on the nonlinear characteristics of the rotor system, and the course of the system changing from the steady to the unsteady state is analyzed. Several nonlinear motions of the system, such as periodic, double periodic and quasi-periodic vibration, are illustrated. The study demonstrates that the proposed high-precision direct integration method can be effectively applied to the nonlinear numerical analysis of rotor-seal systems.
References
[1] Black H. F. and Cochrane E. A., Leakage and hybrid bearing properties of serrated seals in centrifugal pumps. Proc. 6th Int. Conference on Fluid Sealing, Munich, Germany (1973) pp. 61-70.
[2] Chen Y., Ding Q. and Hou S., Stability and Hopf bifurcation of nonlinear rotor-seal system. J. Vibration Engineering, 10 (1997) pp. 368-374.
[3] Hua J., Nonlinear Dynamic Stability of Rotor-Bearing Systems. Ph.D. Thesis, Xi'an Jiaotong University (2002).
[4] Liu J., Shen W. and Williams F.W., A high precision direct integration scheme for structures subjected to transient dynamic loading. Computers & Structures 56 (1995) pp. 113-120.
[5] Marquette O. R. and Childs D.W., An extended three-control-volume theory for circumferentially grooved liquid seals. ASME J. of Tribology, 118 (1996) pp. 276-285.
[6] Muszynska A., Improvements in lightly loaded rotor/bearing and rotor/seal models. J. Vibration, Stress, and Reliability in Design 110 (1988) pp. 129-136.
[7] Zhong W., Precise computation for transient analysis. Computational Structural Mechanics and Applications (in Chinese) 12 (1995) pp. 1-6.
DELAMINATION IDENTIFICATION USING PIEZOELECTRIC FIBER REINFORCED COMPOSITE SENSORS
Ping Tan and Liyong Tong
School of Aerospace, Mechanical and Mechatronic Engineering, University of Sydney, NSW 2006, Australia
E-mail: [email protected]
In this paper, a dynamic analytical model is proposed to detect a delamination embedded in a laminated composite beam bonded with piezoelectric fiber reinforced composite sensors (PFRCSs). A numerical study is then conducted to investigate the effect of the piezoelectric fiber orientation angle θ on the first three natural frequencies, the sensor charge output distribution (SCOD) and the normalized sensor charge output distribution (NSCOD). The influence of delamination length and location on the SCOD is also discussed. A comparison of the first three natural frequencies between the analytical and finite element analysis models is conducted for the cases of θ = 15°, 45° and 75°, and good agreement between the two models is noted.
1 Introduction
Laminated composites are widely used as structural materials due to their high specific stiffness and strength. However, these advantages are often limited by their low interlaminar fracture toughness, which makes them sensitive to delamination. Hence, in recent years, delamination detection has attracted significant attention in the composites community because of its importance in evaluating the reliability of laminated composite structures. Our literature review shows that pure piezoelectric materials have been widely used as sensors/actuators to identify a delamination embedded in a laminated composite beam [1-2]; however, the application of piezoelectric fiber reinforced composites to delamination detection is very limited. In this paper, a dynamic analytical model is proposed to identify the presence, size and location of a delamination embedded in a cantilever laminated composite beam. A numerical study is carried out to investigate the influence of θ on the first three natural frequencies and on the SCOD and NSCOD versus the x location. The effects of the delamination length l_d and the axial delamination location X_d on the SCOD are also discussed. A comparison of the first three natural frequencies between the present analytical and finite element analysis (FEA) models is conducted, and good agreement is noted.
Model development
In this investigation, we consider a laminated composite beam bonded with identical PFRCSs on both top and bottom beam surfaces (see Fig. 1). For simplicity, a single delamination embedded in the beam system with PFRCSs is considered here. For the geometry shown in Fig. 1, the beam system can be subdivided into three major span-wise regions, namely region I, II and III, respectively. Each region is considered to be made up of beam and sensor segments, e.g., for region I, it consists of the upper sensor, host beam and lower sensor segment. For simplicity, it is assumed that there is no stress transferring between the upper and lower delaminated beam segments in region II. Each segment is modelled as an Euler beam, and thus the corresponding equation of motion for each segment can be obtained based on the classical beam theory. For example, the
557
558
corresponding dynamic equations of motion for the top sensor segment in all three regions (see their FBD in Fig. 2), can be obtained by ut Tu,,bt<
u
kus u A " ^kus ku s = _ 0 (1-3) A W =1~~Z iri- ++TTkub, PsA^kus =-%^ "+ C-kus kub' Ps s kus ~~Z " " °kub, " t o " ' ~~ a - ^ -w> ~ ox longitudinal displacement ox and w is the ox transverse displacement 2 where ukus is the for the km top sensor in the region k. The axial forces Tkus and bending moments Mkus are given by
Ps^kus
T -hYt 1 OI l kus ss
dUkus
-, OX
'
M kus
lvl
-
bYst
* ,~ 12
52wfa
_ T dx
" '
in which b, Ys and ts are the width of beam system, Young's modulus and thickness for a PFRCS. Under the constant shear and peel strain assumption [3], the shear and peel stresses between the sensor and host beam can be obtained using eqs. (10-11) in the Ref. [4].
Figure 1 A schematic for a beam system with a delamination (regions I-III, PFRCSs on the top and bottom surfaces)
Figure 2 Free-body diagrams (FBD) for the top sensor in all three regions
For the considered cantilever beam system there are 18 applicable boundary conditions and 42 continuity conditions at the interfaces between regions I and II and between regions II and III. By numerically solving the equations of motion together with these boundary and continuity conditions, the natural frequencies and the absolute value of the strain |ε_s(x,ω)| on the top and bottom sensor surfaces can be obtained. Because the sensor charge output (SCO) can only be measured through the electrodes [5], a number of electrode strips are evenly distributed along the beam length to obtain a continuous distribution of the SCO versus the x location. It is worth pointing out that the width of an electrode strip should be larger than the thickness of the PFRCS.
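Numerically, the boundary and continuity conditions form a homogeneous linear system A(ω)c = 0, so the natural frequencies are the ω at which det A(ω) vanishes. A scan-and-bisect sketch follows; `boundary_matrix`, assembling the 60x60 condition matrix for this beam system, is assumed to be supplied elsewhere, and the sketch assumes a real-valued matrix (a damped, complex-modulus model would need a different root criterion).

```python
import numpy as np

def natural_frequencies(boundary_matrix, omega_max, n_scan=2000, n_bisect=50):
    """Frequencies are where det A(omega) changes sign; scan a frequency
    grid, then refine each bracketed root by bisection."""
    sign = lambda w: np.linalg.slogdet(boundary_matrix(w))[0]  # sign of det only
    ws = np.linspace(1.0, omega_max, n_scan)
    ss = [sign(w) for w in ws]
    roots = []
    for w0, w1, s0, s1 in zip(ws, ws[1:], ss, ss[1:]):
        if s0 * s1 < 0:                       # a sign change brackets a root
            for _ in range(n_bisect):
                wm = 0.5 * (w0 + w1)
                sm = sign(wm)
                if s0 * sm < 0:
                    w1 = wm
                else:
                    w0, s0 = wm, sm
            roots.append(0.5 * (w0 + w1))
    return roots
```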
3 Numerical study
In this investigation, a cantilever beam system with length L_b = 0.3 m, width b = 0.02 m and host beam thickness t_b = 1.9 mm is considered, in which a delamination with l_d = 0.05 m is located at X_d = 0.15 m. The thicknesses of the PFRCS and adhesive layer are 0.4 mm and 0.15 mm, respectively. The required complex moduli for the host beam,
piezoelectric fiber and adhesive layers are 65.68(1+0.011i), 69.2(1+0.011i) and 2.15(1+0.011i) GPa, respectively. The piezoelectric constant e₃₁ of the piezoelectric fiber is taken as 44.37 C/m². The densities of the host beam, piezoelectric fiber and adhesive layers are 1527.38 kg/m³, 7600 kg/m³ and 1600 kg/m³ [4, 6-7], respectively. The fiber volume fractions of the host beam and PFRCSs are chosen to be 0.6. Using the present model, the variation trends of the first three natural frequencies with θ are obtained and shown in Fig. 3, from which it is noted that an increase in θ results in a reduction of the natural frequencies. This is reasonable because the Young's modulus of the PFRCS decreases as θ increases. The SCODs versus x location for the 1st vibration mode are plotted in Fig. 4 for the beams with and without delamination. The abrupt axial discontinuities in Fig. 4 clearly indicate the tips of the delamination, so its presence, size and axial location are easily identified. The numerical study also reveals that, for the considered cases of θ = 15°, 45° and 75°, the effect of θ on the normalized sensor charge output (NSCO) distribution for the 1st vibration mode is minor (see Fig. 5), but its influence on the SCOD is obvious (see Fig. 6). Figure 6 also shows that the largest SCO is obtained at θ = 45°; this is expected, since e₃₁ₛ attains its maximum value when θ is 45°. The study further reveals that the change of the SCO around the delamination tip for the case l_d = 0.1 m is more pronounced than for l_d = 0.05 m (see Fig. 7 for the 1st vibration mode). For the considered cases X_d = 0.15 m and 0.21 m, the influence of X_d on the change of the SCO around the delamination tip is minor (see Fig. 8 for the 1st vibration mode).
Figure 3 Variation trends of the first three natural frequencies with θ
Figure 4 The SCOD for the beam with and without delamination (θ = 45°)
Figure 5 The NSCOD for the cases of θ = 15°, 45° and 75°
Figure 6 The SCOD for the cases of θ = 15°, 45° and 75°
Figure 7 The SCOD for the cases of l_d = 0.05 m and 0.1 m (θ = 45°)
Figure 8 The SCOD for the cases of X_d = 0.15 m and 0.21 m (θ = 45°)
To validate the present model, three 2D plane-strain FEA models were developed for the cases θ = 15°, 45° and 75° using the commercial FEA software Strand7 [8]. The difference between the present analytical and FEA models ranges from 0.8% to 3.9% for the first three natural frequencies, indicating good agreement.

4 Conclusions
A dynamic analytical model is proposed to detect a delamination embedded in a cantilever laminated composite beam bonded with PFRCSs, followed by a numerical study. For the case considered in this paper, the effect of θ on the first three natural frequencies and on the SCOD is obvious, but its effect on the NSCOD is minor. The SCOD versus x location is closely related to l_d and X_d. A comparison of the first three natural frequencies between the present analytical and FEA models shows good agreement.

5 Acknowledgements
The authors are grateful for the support of the Australian Research Council via a Discovery Project Grant (Grant DP0209504).

References
1. Keilers, C.H. and Chang, F.-K., Identifying Delamination in Composite Beams Using Built-in Piezoelectrics: Part II - An Identification Method. Journal of Intelligent Material Systems and Structures, 6 (1995), pp. 664-672.
2. Saravanos, D.A., Birman, V. and Hopkins, D.A., Detection of Delaminations in Composite Beams Using Piezoelectric Sensors. NASA Technical Memorandum 106611, AIAA-94-1754 (1994).
3. Tong, L. and Steven, G.P., Analysis and Design of Structural Bonded Joints (Dordrecht, Kluwer, 1999).
4. Tong, L., Sun, D.C. and Atluri, S.N., Sensing and Actuating Behaviours of Piezoelectric Layers with Debonding in Smart Beams. Smart Materials and Structures, 10 (2001), pp. 713-723.
5. Yin, L., Wang, X.-M. and Shen, Y.-P., Damage-monitoring in Composite Laminates by Piezoelectric Films. Computers & Structures, 59 (1996), pp. 623-630.
6. Tan, P., Tong, L. and Steven, G.P., A Flexible 3D FEA Modelling Approach for Predicting the Mechanical Properties of Plain Weave Unit Cell. Proceedings of the Eleventh International Conference on Composite Materials (ICCM-11), V (1997), pp. 67-76.
7. Tan, P. and Tong, L., Micro-Electromechanics Models for the Piezoelectric Fiber Reinforced Composite Materials. Composites Science and Technology, 61 (2001), pp. 759-769.
8. Introduction to the Strand7 Finite Element Analysis System (G+D Computing Pty Ltd, Sydney, Australia, 1999).
A SIMPLE MODEL FOR PREDICTION OF CRACK SPACING IN CONCRETE PAVEMENTS
G. CHEN AND G. BAKER
The University of Southern Queensland, Toowoomba, QLD 4350, Australia
E-mail: [email protected] and [email protected]
This paper presents a simple model to investigate the minimum and maximum crack spacings in concrete pavements from an energy standpoint and to explore the mechanism behind the existence of the minimum crack spacing. A cracking model composed of two cohesive cracks and an elastic bar restrained by distributed elastic springs is proposed to reflect the damage localization and/or distribution in the concrete. By varying the length of the elastic bar of the cracking model, the tensile force on the cohesive cracks and the energy profiles are investigated. It is demonstrated that the cracking pattern varies with the length of the elastic bar (i.e., the spacing between the two possible cracks), from which the minimum and maximum crack spacings are obtained.
1 Introduction
Crack spacing in concrete pavements has received considerable attention for many years (Mccullough, 1983; Penev and Kawamura, 1993; Shen and Kirkner, 1999). However, no satisfactory explanation has been put forward as to why the minimum spacing exists. Shen and Kirkner (1999) attempted to tackle this problem through a 1-dimensional model that is very complicated. The present study presents a simple cracking model that consists of two cohesive cracks and an elastic bar restrained by distributed elastic springs. The force acting on the cohesive cracks and the energy profiles are investigated. The main objective is to establish the relationship between the energy variation and the crack patterns and to demonstrate that energy minimization governs the cracking patterns.
A Cracking Model
A pavement can be represented by a series of sub-structures as shown in Fig. 1(a), which consists of a cohesive crack and an elastic bar. The elastic bar represents the un-cracked concrete. It is assumed that all damage within a certain distance is localized into a cohesive crack. The movement of the concrete is restrained by friction of the subgrade, which is modelled by distributed springs. In order to investigate the minimum and maximum cracking spacing, consider a cracking model as shown in Fig. 1(b). We are concerned with how the length of the elastic bar influences the cracking patterns. The equilibrium condition of the elastic bar is Ada /dx + x - 0 and the shear force is assumed x = -ku, with k being the stiffness of the
561
562
distributed springs and u the displacement of the elastic bar. The stress, a , is related to the strain, s , by Young's modulus E, i.e., • •
dtl
a = E(z - s"") = E(
e"") , where s"" is the initial strain caused either dx by shrinkage or temperature changes. By substituting a and T into the equilibrium condition, we obtain " -a2u = 0 with a 2 = Its AE dx' a(X u 1 _[-/,,«(*-*,)_ -aix-x^s + ( eMx^-x) solution is with u(x)=[(e -^ -e^^yuz - - -e• " ^ " ^ m / ^ , / = x 2 - x, and ¥ = e a / - e al. From which the total energy stored in the elastic bar and the distributed springs is obtained: Ee=
("—(Acye + ku2)dx =
[ O M 2 2 -4UXU2
+OWJ2]
(1)
where O = e a / + e~ a/ and ^4 is the cross-sectional area. The forces acting at the two ends of the elastic bar are: AEa 1 (2) F, =^^[®u2-2ul]~AEei *1 = [ 2 « 2 - O M , ] - ^ B E
m.suh-W-d
sub-1 sub-2
Q,
«i
Elastic W
ft
(a)
,„«"&
/,
"a
fesr--¥
:|M|aaaaaa^a|||^|||a^^
00
. 0 +j3f#
H~£
>
Fig, I A ntodelpsvemerat, (a) of a sexi«s sub-structures; (b) A cracking model, (c) the elastic bar„(d) A cohesive crack
Secondly, consider the cracks. The constitutive relation of the cohesive crack is shown in Fig. 1(d). The energy stored in the cohesive crack is:
Ec=\ {
A(ft-^wcra 2wc ^1Aftwc
)wc
if
w'
<
w„
(3) if
wc
>
w„
where ft is the tensile stress, wcra the crack opening, and wc the critical crack opening. From the equilibrium conditions between the two cracks and the two ends of the elastic bar, we obtain, „ 2Xw2 w, = M0 + Hx —
•OXM„
u,-H-
OXM3-2XM!-Z
(4)
563
where X = Eawc,
Z^xV{f
Y = O£ocwc -*¥ft,
functions Hl=H(wcral)
cra2^ craz
and H2 = H(w
)
+Ezini)wc.
Heaviside
indicate cracks closed (=0)
or open (=1), and the crack openings w =ul-u0 and w =u3 From (4) we obtain: 1 [Y(HpX-Y)u0+m]X(H2OX-Y)u3+HlZ(Y-2H2X)\ "J u}] AHXH2X2-Y2 IH^H^X-Y)^ 3
+Y(H2®X-Y)u3 +H2Z(2HlX-YJ]
(5)
3 Minimum and Maximum Crack Spacings
Consider shrinkage cracking, i.e., let $u_0=0$ and $u_3=0$ in (5). When $\varepsilon^{ini}$ reaches the critical value $\varepsilon^{cr}=-f_t/E$, the forces $F_1$ and $F_2$ reach $Af_t$. If $\varepsilon^{ini}$ increases further, both cracks have the opportunity to initiate. There are three possibilities: (a) the damage localizes into the first crack; (b) the damage distributes over the two cracks; (c) the damage localizes into the second crack. Due to symmetry, we consider only the first two cases. Let $\varepsilon^{ini}=1.03\,\varepsilon^{cr}$, and assume that the first crack always opens (i.e., $H_1=1$). We calculate the force acting on the second crack, $F_2$, for both $H_2=0$ and $H_2=1$ over a range of elastic-bar lengths. The material properties are: E = 24000 MPa, $f_t$ = 2.4 MPa, $w_c$ = 180 μm, k = 8 and A = 1. The results are plotted in Fig. 2 (the force on the first crack versus the bar length). Fig. 2 shows that, for the localized solution, the force $F_2$ rises to $Af_t$ at l = 6.8 m. Thus, when l > 6.8 m the next crack will always initiate, i.e., no localized solution exists. When l < 6.8 m, the forces for both the localized and distributed solutions are less than $Af_t$, i.e., both solutions are possible, and it is the energy minimization principle that governs which solution the cracking model follows. With $\varepsilon^{ini}$ fixed, the corresponding energy is calculated by summing (1) and (3) while varying $u_1$ and $u_2$. The energy profiles are illustrated in Fig. 3. For the case l < 6.8 m, only the localized solutions correspond to minima of the energy surface; this means $l_{min}$ = 6.8 m is the minimum crack spacing. For the
case l > 6.8 m, the energy surface has only one minimum, which corresponds to the distributed solution, i.e., only the distributed solution is possible in this instance. However, the crack spacing cannot be greater than $2l_{min}$; otherwise, we could insert a cohesive crack in the middle and it would initiate, as the force acting on it would surpass the critical value $Af_t$. Hence the minimum and maximum crack spacings are $l_{min}$ = 6.8 m and $l_{max}=2l_{min}$, respectively.
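The energy comparison that selects between the localized and distributed patterns can be sketched numerically. The fragment below discretizes the bar and minimizes the sum of the bar/spring energy and the cohesive energy of Eq. (3) over the nodal displacements; it is built on the reconstructed formulas above, and the spring stiffness value is an assumption (its units are garbled in the source). Units are MPa and mm throughout.

```python
import numpy as np
from scipy.optimize import minimize

def total_energy(u, l, n, E, A, k, ft, wc, eps_ini):
    """Discretized energy of the cracking model: bar strain energy with an
    initial (shrinkage) strain, distributed-spring energy, and the cohesive
    energy of Eq. (3) at both ends. u holds the n+1 nodal displacements;
    the outer faces of the two cracks are fixed (u0 = u3 = 0)."""
    h = l / n
    eps = np.diff(u) / h
    E_bar = 0.5 * A * E * np.sum((eps - eps_ini) ** 2) * h
    E_spr = 0.5 * k * np.sum(u[1:-1] ** 2) * h     # springs on interior nodes

    def Ec(w):                                     # one cohesive crack, Eq. (3)
        w = max(w, 0.0)
        return A * ft * w * (1.0 - w / (2.0 * wc)) if w < wc else 0.5 * A * ft * wc

    return E_bar + E_spr + Ec(u[0]) + Ec(-u[-1])   # openings w1 = u1-u0, w2 = u3-u2

# paper values except k (assumed): l = 3 m case, shrinkage 1.03 * f_t / E
pars = dict(l=3000.0, n=20, E=24000.0, A=1.0, k=8e-3, ft=2.4, wc=0.18,
            eps_ini=-1.03 * 2.4 / 24000.0)
res = minimize(lambda u: total_energy(u, **pars), np.zeros(pars["n"] + 1),
               method="Powell")
print(res.fun)
```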
Fig. 3 The energy profiles at (a) l = 3 m, (b) l = 6 m, (c) l = 8 m
4 Conclusions
A cracking model has been presented, through which the minimum and maximum crack spacings have been investigated. The forces acting on the cohesive cracks and the energy profiles demonstrate that the practical crack spacing falls between the minimum and maximum spacing. In circumstances where both the localized and distributed solutions are possible, it is the energy profile that governs which solution the cracking model follows.

5 References
1. Mccullough, B.F. (1983). Criteria for the design, construction, and maintenance of continuously reinforced concrete pavement. Australian Road Research, 13, 79-99.
2. Penev, D. and Kawamura, M. (1993). Estimation of the spacing and the width of cracks caused by shrinkage in the cement-treated slab under restraint. Cement and Concrete Research, 23, 925-932.
3. Shen, W. and Kirkner, D.J. (1999). Distributed shrinkage cracking of AC pavement with frictional constraint. Jnl Eng. Mech., 125, 554-560.
HELLINGER-REISSNER MIXED FORMULATION FOR THE NONLINEAR FRAME ELEMENT WITH LATERAL DEFORMABLE SUPPORTS

Suchart Limkatanyu
Lecturer, Dept. of Civil Engineering, Faculty of Engineering, Prince of Songkla University, Songkhla, Thailand, 90110, tel: 66-074-287129, [email protected]

ABSTRACT
This paper presents the theory and applications of the Hellinger-Reissner mixed formulation for the nonlinear frame element with lateral deformable supports. The governing differential equations of the problem (strong form) are derived first. Then, the Hellinger-Reissner mixed frame element (weak form) is formulated to obtain the numerical solution of the problem. Tonti's diagrams are employed to conveniently represent the equations governing both the strong and weak forms of the problem. Finally, a numerical example is used to show that the Hellinger-Reissner mixed element is much more accurate than the classical displacement-based element. The nonlinear frame model proposed in this paper has practical applications in modelling soil-pile structural systems, geosynthetic/fiber-glass reinforcement of foundation soils, beams on deformable foundations, etc.

KEYWORDS
Finite Elements, Nonlinear Analysis, Mixed Formulation, Soil-Structure Interaction, Frame Models, Winkler Foundation Model

INTRODUCTION
The problem of soil-structure interaction is often modeled and solved as a beam (structure) on one-dimensional springs (soils). Winkler (1) was the first to propose the so-called "Winkler foundation model" to study the problem of a beam on elastic foundations. The beam in the Winkler foundation model is based on the Euler-Bernoulli beam theory widely used in structural analysis. The main focus of this paper is to develop the general theoretical framework of the Hellinger-Reissner (H-R) mixed formulation of the nonlinear frame element with lateral deformable supports. This nonlinear frame element can be used as a numerical tool to study soil-structure interaction problems. The derivation of the governing differential equations (strong form) of the nonlinear frame element with lateral deformable supports is presented first. The H-R mixed element formulation is presented next and forms the core of this paper. Tonti's diagrams are used to concisely represent the equations governing both the strong and weak forms of the problem. Finally, a numerical example is used to show that the H-R mixed element is much more accurate than the classical displacement-based element.
DIFFERENTIAL EQUATIONS OF FRAME ELEMENT WITH LATERAL DEFORMABLE SUPPORTS (STRONG FORM)

Equilibrium

Figure 1 An Infinitesimal Segment of Frame Element with Lateral Deformable Supports (soil force D_s(x), shear V(x), moment M(x) and axial force N(x) acting on a segment dx)
The free body diagram of an infinitesimal segment dx of the frame element with lateral deformable supports is shown in Figure 1. Based on the small-deformation assumption, the axial, vertical, and moment equilibrium conditions are considered in the undeformed configuration. This work follows the Euler-Bernoulli beam theory; thus shear deformations are neglected. The shear force V(x) is eliminated by combining the vertical and moment equilibrium equations. The resulting equilibrium equations can be grouped in matrix form as:

$$\partial^T\mathbf{D}(x)+\partial_s^T\mathbf{D}_s(x)=\mathbf{0};\qquad \partial=\begin{bmatrix}d/dx & 0\\ 0 & d^2/dx^2\end{bmatrix},\quad \partial_s=\begin{bmatrix}0 & 1\end{bmatrix} \qquad(1)$$

where $\mathbf{D}(x)=\{N(x)\ \ M(x)\}^T$ is the element section force vector and $D_s(x)$ is the lateral soil force.

Compatibility

The element section deformation vector conjugate of $\mathbf{D}(x)$ is $\mathbf{d}(x)=\{\varepsilon(x)\ \ \kappa(x)\}^T$, where $\varepsilon(x)$ is the axial strain at the reference axis and $\kappa(x)$ is the bending curvature. The following displacements are defined at the element level: $\mathbf{u}(x)=\{u(x)\ \ v(x)\}^T$, where u(x) and v(x) are the axial and transverse displacements, respectively. Based on the small-deformation assumption, the element deformations are related to the element displacements through the compatibility relations $\varepsilon(x)=du(x)/dx$ and $\kappa(x)=d^2v(x)/dx^2$, which can be written in matrix form:

$$\mathbf{d}(x)=\partial\mathbf{u}(x) \qquad(2)$$

The lateral soil deformation $d_s(x)$ is determined by the matrix relation

$$d_s(x)=\partial_s\mathbf{u}(x) \qquad(3)$$

Force-Deformation Relations
The nonlinear nature of the proposed element derives entirely from the nonlinear relation between the section forces D(x), Ds(x) and the section deformations d(x), ds(x). In the proposed formulation, the fiber-section model is used to derive the section constitutive law D = D(d). The lateral-soil constitutive law is expressed in the form of Ds = Ds (ds).
The equilibrium, compatibility, and constitutive equations for the frame element with lateral deformable supports presented above are conveniently represented in the classical Tonti's diagram of Figure 2.

Figure 2 Tonti's Diagram for Frame Element with Lateral Deformable Supports

HELLINGER-REISSNER MIXED FORMULATION OF FRAME ELEMENT WITH LATERAL DEFORMABLE SUPPORTS (WEAK FORM)
Figure 3 Tonti's Diagram for Frame Element with Lateral Deformable Supports: Hellinger-Reissner Mixed Formulation (weak forms: $\delta\Pi_{EQ}=\int_L\delta\mathbf{u}^T(\partial^T\mathbf{D}+\partial_s^T D_s)\,dx=0$ and $\delta\Pi_{BCE}=\int_L\delta\mathbf{D}^T(\partial\mathbf{u}-\mathbf{d})\,dx=0$; the soil compatibility $d_s=\partial_s\mathbf{u}$ remains in the strong form)
In the mixed formulation, the beam-section forces D(x) are expressed in terms of the element nodal forces through force shape functions, and the beam displacements u(x) are written as functions of the element nodal displacements via displacement shape functions. The element nodal forces and nodal displacements serve as the primary element unknowns. The equilibrium equation (Eq. 1) and the beam compatibility equation (Eq. 2) are satisfied in integral form, while the soil compatibility equation (Eq. 3) is satisfied in the strong form. The H-R mixed functional $\Pi_{HR}$ is defined as

$$\Pi_{HR}[\mathbf{u}(x),\mathbf{D}(x)]=\Pi_{EQ}[\mathbf{u}(x)]+\Pi_{BCE}[\mathbf{D}(x)]$$

where $\Pi_{EQ}$ is the weak form of equilibrium and $\Pi_{BCE}$ is the weak form of the beam compatibility. According to the stationarity principle, the compatible equilibrium configuration is obtained when $\Pi_{HR}$ reaches a stationary value ($\delta\Pi_{HR}=0$). The mixed formulation is schematically represented in the Tonti's diagram of Figure 3. Further details of the element formulation can be found in Limkatanyu (2). The H-R mixed element configuration is shown in Figure 4: the element displacement and force degree-of-freedom systems are shown in Figure 4(a) and (b), respectively. It is noted that the orders of the interpolation functions in these two systems have to be compatible with each other in order to ensure the numerical stability of the mixed element.
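Structurally, the stationarity of a two-field functional such as $\Pi_{HR}$ leads to a coupled (saddle-point) system in the nodal force and displacement unknowns. The following linear-elastic sketch only illustrates that block structure; it is not the paper's nonlinear implementation, and the matrix names F, G and Ks are generic assumptions.

```python
import numpy as np

def solve_hr_mixed(F, G, Ks, P_ext):
    """Solve the saddle-point system arising from a two-field functional:
        [ -F   G  ] [D]   [  0   ]
        [ G^T  Ks ] [U] = [ P_ext]
    F: element flexibility from the force shape functions,
    G: force-displacement coupling from the weak equilibrium/compatibility,
    Ks: stiffness contribution of the lateral (soil) springs."""
    nD = F.shape[0]
    A = np.block([[-F, G], [G.T, Ks]])
    b = np.concatenate([np.zeros(nD), P_ext])
    sol = np.linalg.solve(A, b)
    return sol[:nD], sol[nD:]      # element forces, nodal displacements

# tiny illustrative system
F = np.eye(2) * 1e-3
G = np.eye(2)
Ks = np.eye(2) * 5.0
print(solve_hr_mixed(F, G, Ks, np.array([0.0, 1.0])))
```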
Figure 4 Mixed Frame Element with Lateral Deformable Supports: (a) displacement degrees of freedom; (b) force degrees of freedom

NUMERICAL EXAMPLE

Figure 5 Steel Beam on Deformable Foundations (E_steel = 200 GPa, F_y = 460 MPa, k_s = 0.06 N/mm²)
Figure 6 Convergence Study of the Displacement-Based and Mixed Elements
The performance of the displacement-based and mixed frame elements with lateral deformable supports is compared using the simply supported beam resting on deformable foundations of Figure 5. Figure 6 studies the number of elements needed to reach the converged solution for the two formulations by comparing their load-displacement responses; the "exact" response is obtained with 64 displacement-based elements. The stiffness changes in the load-displacement responses are due to yielding of the steel beam. Figure 6 clearly shows that the H-R mixed element is much more accurate than the displacement-based element: only 4 H-R mixed elements are needed to obtain the exact response, whereas 32 displacement-based elements are required. This shows how the force shape functions play an important role in determining the element accuracy.

CONCLUSIONS
This paper presents a newly developed frame element with lateral deformable supports based on the Hellinger-Reissner mixed formulation. In this formulation, the equilibrium and beam-compatibility equations are satisfied in integral form while the soil-compatibility condition is satisfied in the strong form. The convergence study shows that the force shape functions enhance the performance of the mixed element when compared to the displacement-based element.

REFERENCES
(1) Winkler, E., Theory of Elasticity and Strength, H. Dominicus, Prague, 1867.
(2) Limkatanyu, S., Reinforced Concrete Models with Bond-Interfaces for the Nonlinear Static and Dynamic Analysis of Reinforced Concrete Frame Structures (Ph.D. Dissertation), Department of Civil, Environmental, and Architectural Engineering, University of Colorado, Boulder, 2002.
ENERGY APPROACH TO NUMERICAL MODELLING OF CRACK SPACING IN REINFORCED CONCRETE
G. CHEN AND G. BAKER
The University of Southern Queensland, Toowoomba, QLD 4350, Australia
E-mail: [email protected] and [email protected]
This paper presents a new numerical methodology for the prediction of crack spacing in reinforced concrete. It is assumed that the deformation pattern corresponding to the crack spacing consumes the least energy among all kinematically admissible deformations, and an energy minimization approach is applied to predict the crack spacing. To simplify the problem, a lattice model is used, in which the cracking process is represented by softening of the concrete bar elements. The crack spacing due to tension and bending is investigated. The important result is that distinct cracks are predicted within a continuum formulation, with uncracked, unloaded material between them, and the energy criterion is validated over the classical tangent-stiffness equilibrium approach.
1 Introduction
The prediction of crack spacing has been studied by many researchers using both experimental and analytical methodologies (Chowdhury and Loo, 2001; Creazza and Russo, 1999). On the other hand, although a large number of numerical studies have been devoted to cracking in reinforced concrete, none directly tackles crack spacing: cracks are usually assumed to follow pre-defined propagation paths, by either inserting a discrete crack or introducing imperfections (Rots and Blaauwendraad, 1989). This paper presents a different methodology for modelling crack spacing. Instead of pre-assuming the crack positions, it assumes that the structure follows the deformation pattern that consumes the least energy; the energy minimization principle determines where and when a crack arises and how it propagates. To simplify the problem, a lattice model is adopted, in which the cracking process and fracture are simulated by strain softening and breakage of lattice members.
Lattice Model
In lattice-type models, the continuum is discretized into a framework of bar elements (van Mier et al., 1995). A bar element is defined by two coplanar points i and j , as shown in Fig. 1(a). The concrete bar element obeys a softening constitutive law shown in Fig. 1(b). When the strain s is greater than the elastic limit strain ee, the material will develop plastic deformation; as it reaches the ultimate strain s„, the bar breaks. For a
569
570
small deformation, the strain increment is related to the displacement increments by {aj - a,)(Auj As
- AM,.) + (bj - b,)(Avy - Av,)
„ .
(aj-a.y+Qj-b.Y
(«i.Vi)
f
w (c)
SN
ft
r I
X
8
i *
«
N /
V**
J?
Fig. 1 Bar Element (a) Bar Element, (b) Constitutive Law; (c) Calculation of the energy increment The strain increment As is decomposed into the elastic part, As e , and the plastic part, As p which are calculated as: h -As As„ = (2) As E+h " E+h where h is the hardening modulus; for softening materials, it is called a softening modulus. Eqn. (2) holds only for plastic loading. For the case s > s „ , the bar element breaks, any further loading will not increase the elastic strain, i.e., As p = As . For unloading or reloading, if the bar does not break, all the strain increment is elastic, that is, Ase = As ; while for a broken bar, it is simply assumed that Ase = 0. After knowing the elastic strain increment, As e , the stress increment is calculated by Aa = E- As e . There are three possibilities for energy calculation: During the strain increment As, the bar element concerned goes through (1) elastic deformation, (2) elastic to plastic deformation, and (3) elastic deformation, plastic deformation, and breakage. The energy consumed is shown in Fig. 1(c) by the shadowed and hatched areas for the last two possibilities. By summing the energy increments for all the elements, the total energy increment of the lattice model is obtained. Powell's conjugate method (Rao, 1996) is used to perform the energy minimization. As
3 Numerical examples
The lattice in Fig. 2 models a reinforced panel of 2.70 m × 0.90 m. Three reinforcing bars are represented by the heavy lines. The cross-sectional areas are assumed to be unity for the elements representing concrete and 0.25 for the reinforcing bars. The concrete material properties are: E = 4×10⁴ MPa, f_t = 4 MPa, h = -0.05E. A linear softening law is used. For the reinforcement: E = 2×10⁵ MPa, yield stress σ_y = 500 MPa, h = 0.01E. The reinforcing bars are hinged at one end and subjected at the other end to prescribed displacements that model either uniaxial tension or pure bending.
The crack formation and propagation are shown in Fig. 3 for both tension and bending, in which the degree of damage is indicated by the brightness (black for intact, white for broken) of the elements. Distinct crack spacing is obtained in both cases. At the beginning the damage is distributed; with further loading, discrete cracks form, the damage localizes into several distinct cracks, and the bordering concrete unloads.

4 Conclusions
A new numerical methodology for the prediction of crack spacing in reinforced concrete has been proposed, based on the energy minimization principle. The numerical analyses confirm that crack spacing can be treated as a strain localization problem. Among all kinematically admissible deformations, the deformation pattern of crack spacing consumes the least energy. Both uniaxial tension and pure bending examples have been investigated. The important result is that distinct cracks are predicted within a continuum formulation, with uncracked, unloaded material between them. Hence the energy criterion is validated against the classical tangent-stiffness equilibrium approach.
References
1. Chowdhury, S.H. and Loo, Y.C. (2001). "A New Formula for Prediction of Crack Widths in Reinforced and Partially Prestressed Concrete Beams." Advances in Structural Engineering, 4, 101-110.
2. Creazza, G. and Russo, S. (1999). "A new model for predicting crack width with different percentages of reinforcement and concrete strength classes." Materials and Structures, 32(221), 520-524.
3. Rao, S.S. (1996). Engineering Optimization. Wiley: New York.
4. Rots, J.G. and Blaauwendraad, J. (1989). "Crack models for concrete: discrete or smeared? Fixed, multi-directional or rotating?" Heron, 34(1).
5. Van Mier, J.G.M., Schlangen, E. and Vervuurt, A. (1995). "Lattice type fracture models for concrete." In Continuum Models for Materials with Micro-structure, Muhlhaus, H.B. (ed.), Wiley: Chichester, 342-377.
EFFECT OF BOLT CONNECTIONS ON DYNAMIC RESPONSE OF CYLINDRICAL SHELL STRUCTURES

Q. H. CHENG, S. ZHANG AND Y. Y. WANG

Institute of High Performance Computing, 1 Sci. Park Rd., #01-01 The Capricorn, Singapore 117528
E-mail: [email protected]

No. 15 Institute, China Academy of Launch-Vehicle Technology, PO Box 9200-71, Beijing 100076
E-mail: [email protected]
A finite element modeling technique is presented to investigate the effect of bolt connections on the dynamic response of structures. The method deviates from conventional practice, in which bolt connections are represented by beam elements, rigid bar elements, or, even more simply, by a set of common nodes shared by the two connected parts. In this study, the bolt connections are modeled in detail: interaction of the connected flanges is considered by contact algorithms, and prestress in the bolts is also incorporated. Normal mode results obtained with the ABAQUS code are presented to show the effect of the bolt connections on the natural frequencies of the structure.
1 Introduction
Bolt connections are widely employed in various industries. In aviation and aerospace structures, two cylindrical shell sections are usually fastened, through two frame flanges, by a number of circumferentially distributed bolts. A number of investigations of bolt connections have been reported. The self-loosening behavior of a bolt subjected to harmonic excitation was predicted [1]. The structural stiffness and strength properties of a column flange-endplate connection were studied with 3D FE models using ABAQUS [2] and ANSYS [3]. While most of these works focus on the local behavior and phenomena of bolt connections, few authors have paid attention to the role that bolts play in the global behavior of structural assemblies. When applying the FEM to analyze such structures, the bolt connections are conventionally modeled as beam elements or rigid elements (usually called MPCs in commercial FEM codes), or, even more simply, a bolt is represented by a single node shared by the two connected parts. Because the effects of preloading in the bolts and of interaction between the connection flanges are ignored in this approach, the accuracy of the results may be a problem. However, very few papers can be found in the literature that study the feasibility and accuracy of this modeling technique. In this paper, a cylindrical shell structure is analyzed by a 3D FE model using the ABAQUS code. The bolt connections in the structure are modeled in detail, and pretension in the bolts and the contact interface are considered by a nonlinear analysis procedure. The effect of the connections on the structural normal modes is presented.

2 Methods and Finite Element Model
Illustrated in Figure 1 is a tapered cylindrical shell structure consisting of two sections. Both sections are 3 m long; the beginning and ending diameters of Section 1 are 1.2 m and 1.0 m, while they are 1.0 m and 0.8 m for Section 2, respectively. Each section has two
ending frames of 8 mm thickness, and is strengthened by 7 intermediate frames and 36 stringers evenly distributed along the circumference, as shown in Figure 2. The two inner ending frames serve as the flanges through which the two sections are fastened together by twelve M16 bolts (see Figure 3).
Figure 1. Exploded view of a cylindrical shell structure.
Figure 2. Intermediate frames and stringers for shell sections.
Figure 3. Connections.
The dynamic response of this construction is investigated by the finite element method (FEM) using the commercial code ABAQUS. Four-node shell elements with reduced integration (S4R) are used to model the skin (see Figure 4). The intermediate frames and stringers are also modeled by S4R elements, but the ending frames by 8-node linear brick elements (C3D8), as shown in Figure 5.
Figure 4. Shell elements for skins.
Figure 5. Mesh for frames and stringers.
The FE meshes in the regions around the bolt connections are tuned much finer than in other areas. A close-up of the mesh for one of the connections is shown in Figure 6. The bolts are also modeled by C3D8 elements. Note that the bolt nuts are assumed to be circular instead of the conventional hexagonal shape in order to facilitate mesh generation; the effect of this simplification is localized and insignificant for the global behavior studied in this paper. There are three interfaces in this figure: two, represented by the shorter thick lines, between the bolt nuts and the flanges, and one, represented by the longer thick line, between the two flanges. Interactions at the former two interfaces are reasonably simplified by a matching mesh between the bolt nut and the flange. The phenomenon at the third interface is the main target of the investigation; the interaction there is implemented with the contact algorithm.

3 Numerical Analysis and Results
Normal mode analysis of the structure is carried out using the described FE model. To incorporate the effect of preloading in the bolt connections, the analysis is conducted in two steps. The first step is a nonlinear static analysis to simulate the pretension of the bolts. It develops axial forces in the bolts and contact forces between the flanges; force balance is achieved at the three contact interfaces while the other parts of the construction remain in a strain-free state. Stress distributions in one flange as well as within one bolt are illustrated in Figure 7. It can be seen that the stress in the flange concentrates in a small region in the vicinity of the bolts. The total contact force is 1012.1 kN, which produces an average stress of 419.5 MPa in the bolt shank; this figure is about 60% of the ultimate strength of a high-strength bolt. The maximum von Mises stresses in the bolts and flanges are 531 MPa and 441 MPa respectively. The second step is a dynamic analysis to extract the normal modes of the structure. The strain state at the interfaces obtained from the first step is automatically incorporated in the second step. Only global vibration modes are of concern. Figure 8 shows the first four mode shapes.
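The second step amounts to a generalized eigenvalue problem K x = ω² M x for the stiffness and mass of the pre-stressed structure. The Python fragment below illustrates that step in isolation; it is not the ABAQUS solver, and the matrices here are toy placeholders rather than the shell model.

```python
import numpy as np
from scipy.linalg import eigh

def natural_frequencies(K, M, n_modes=4):
    """Solve K x = w^2 M x and return the n_modes lowest
    natural frequencies in Hz."""
    eigvals, _ = eigh(K, M)                       # generalized symmetric EVP
    omega = np.sqrt(np.clip(eigvals, 0.0, None))  # rad/s
    return omega[:n_modes] / (2.0 * np.pi)

# toy 2-DOF example (placeholder matrices, not the shell model)
K = np.array([[4.0e6, -2.0e6], [-2.0e6, 2.0e6]])
M = np.diag([10.0, 10.0])
print(natural_frequencies(K, M, n_modes=2))
```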
Figure 6. Local view of FE mesh for one bolt connection.
Figure 7. Stress distribution in flange and bolt.
Figure 8. Normal mode shapes of the first four orders (including the second bending mode, freq. = 256.33 Hz, and the first axial mode, freq. = 279.35 Hz).
To examine the preloading effect of the bolt connections, two more alternative FE analyses are examined. One analysis applies only the second step to the above model, meaning that no pretension of the bolt connections is included. For the other analysis, the model is simplified in that the ending frames are represented by S4R shell elements; neither are the bolts modeled in detail, nor is interface contact considered, the bolt connections being simply represented by beam elements. A linear normal mode analysis is conducted for this modified model. The normal mode frequencies from the three analyses are summarized in Table 1, with the percentage difference of the frequency results of the latter two analyses from the first. Due to structural symmetry, a pair of frequencies exists for each bending mode, but only one of them is listed. Significant differences in the frequency results are found for the lower-order symmetric modes, i.e. the 1st bending and 1st axial modes. Considering the contact interface and modeling the bolts without pretension gives higher frequency values; on the other hand, modeling the bolts by beam elements gives much lower results. The variation for the higher-order symmetric modes (3rd and 5th bending) is attenuated, but still larger than for the other, non-symmetric modes.

Table 1. Normal mode frequencies (Hz). Percentage differences relative to the pretension case are given in parentheses.
Mode No. | Mode description | Freq. with pretension | Freq. without pretension | Freq. by simplified model
1  | 1st bending | 83.93  | 107.25 (27.8%) | 51.03 (-39.2%)
2  | 1st torsion | 184.54 | 185.84 (0.7%)  | 179.36 (-2.8%)
3  | 2nd bending | 256.53 | 257.88 (0.5%)  | 252.15 (-1.7%)
4  | 1st axial   | 279.35 | 347.14 (24.3%) | 170.30 (-39.0%)
5  | 2nd torsion | 307.27 | 307.70 (0.1%)  | 309.39 (0.7%)
6  | 3rd bending | 340.06 | 357.66 (5.2%)  | 324.06 (-4.7%)
7  | 4th bending | 515.98 | 512.97 (-0.6%) | 509.41 (-1.3%)
8  | 3rd torsion | 566.21 | 570.82 (0.8%)  | 542.71 (-4.2%)
9  | 2nd axial   | 573.54 | 562.58 (-1.9%) | 556.05 (-3.0%)
10 | 5th bending | 579.49 | 595.12 (2.7%)  | 542.86 (-6.3%)

4 Discussion
Normal mode analysis of a cylindrical shell structure has been carried out with consideration of bolt connection pretension and the contact interface. It is found that the bolt connections have a significant effect on the natural frequencies of the symmetric vibration modes, even the lower ones. The conventional technique of modeling bolt connections by beam elements is therefore questionable. A further study will be conducted to suggest a reasonable yet simplified method of modeling bolt connections.

References
1. Zadoks, R.I. and Yu, X., An investigation of the self-loosening behavior of bolts under transverse vibration. J. of Sound and Vib. 208 (1997) pp. 189-209.
2. Bursi, O.S. and Jaspart, J.P., Calibration of a finite element model for isolated bolted end-plate steel connections. J. Construct. Steel Res. 44 (1997) pp. 225-262.
3. Bahaari, M.R. and Sherbourne, A.N., Behavior of eight-bolt large capacity endplate connections. Computers and Structures 77 (2000) pp. 315-325.
SIMULATION OF DUCTILE FRACTURE IN TUBULAR JOINTS THROUGH A VOID NUCLEATION MODEL

X. D. QIAN, Y. S. CHOO* AND J. Y. R. LIEW

Center for Offshore and Maritime Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260
E-mail: [email protected]

This paper presents a numerical approach to simulating global ductile fracture in Circular Hollow Section (CHS) joints. A void nucleation algorithm based on Gurson's model is employed in the global strength analysis. Three types of joint configuration are investigated, and comparison is made with the available test results. Owing to the lack of the material property data required by Gurson's model, a sensitivity study is carried out on these properties to observe their effect on tubular joint strength.
Introduction

Crack initiation and propagation is one of the common failure modes of tubular joints under tensile loading. A conventional FE approach based on continuum mechanics is not able to predict the occurrence of a crack, which violates material and geometric continuity, and detailed FE simulation of the cracking effect requires the geometry and path of the crack to be known a priori. Alternative approaches to tackling the effect of cracking include arbitrary strain criteria [1], continuum damage mechanics [2], the smeared crack model, the discrete crack model and fracture mechanics. Gurson's approach to simulating ductile fracture consists of two parts: the material plastic flow rule and the void nucleation process. The yield criterion as modified by Tvergaard [3] is

Φ(q, σ_y, f) = (q/σ_y)² + 2 q₁ f cosh(3 q₂ p / (2 σ_y)) - (1 + q₃ f²) = 0   (1)
In Eq. (1), q refers to the effective Mises stress, σ_y is the initial material yield stress, p stands for the hydrostatic pressure and f indicates the void volume fraction. The change in void volume comprises two parts, growth of existing voids and nucleation of new voids, as expressed in Eq. (2):

df/dt = (df/dt)_growth + (df/dt)_nucleation   (2)

where

(df/dt)_nucleation = A dε̄_p/dt,   A = f_N / (s_N √(2π)) · exp[ -(1/2) ((ε̄_p - ε_N)/s_N)² ]   (3)
In Eq. (3), ε_N refers to the mean plastic strain level at which void nucleation takes place, f_N represents the void volume fraction of the void-nucleating particles, and s_N is the standard deviation of the nucleation strain. The parameters that need to be defined for Gurson's model fall into two series: the q_i (i = 1, 2, 3) factors and the material parameters ε_N, s_N and f_N. The q_i factors were found by Tvergaard [3] to best simulate material behavior when equal to the values indicated in Fig. 2(c). On the other
hand, there is little data available in the literature addressing the values of the material parameters. The numerical model in the current study is generated using MSC Patran [4], and the analysis is carried out with ABAQUS [5]. Both nonlinear material and geometric properties are taken into account. A typical configuration of a tubular joint is shown in Fig. 1, together with the non-dimensional joint parameters.
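Eqs. (1) and (3) are closed-form expressions and can be evaluated directly. The Python sketch below mirrors the symbols used in the text; the default parameter values are merely the examples discussed in this paper (Tvergaard's q_i and one of the nucleation parameter sets), not recommendations.

```python
import math

def gurson_yield(q, p, f, sigma_y, q1=1.5, q2=1.0, q3=2.25):
    """Gurson-Tvergaard yield function, Eq. (1); yielding when phi >= 0."""
    return ((q / sigma_y) ** 2
            + 2.0 * q1 * f * math.cosh(3.0 * q2 * p / (2.0 * sigma_y))
            - (1.0 + q3 * f ** 2))

def nucleation_amplitude(eps_p, eps_N=0.10, s_N=0.05, f_N=0.04):
    """Amplitude A of the strain-controlled nucleation rate, Eq. (3):
    (df/dt)_nucleation = A * d(eps_p)/dt."""
    return (f_N / (s_N * math.sqrt(2.0 * math.pi))
            * math.exp(-0.5 * ((eps_p - eps_N) / s_N) ** 2))
```

As a quick sanity check, with f = 0 the criterion reduces to the von Mises condition: gurson_yield(q=sigma_y, p=0.0, f=0.0, sigma_y=sigma_y) returns 0.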
d₀: chord diameter; d₁: brace diameter; t₀: chord wall thickness; t₁: brace wall thickness; l₀: chord length; θ: brace-to-chord angle; g: gap between the two braces; β = d₁/d₀; γ = d₀/2t₀; τ = t₁/t₀; α = 2l₀/d₀
Figure 1 Typical configuration of a tubular joint
2 Benchmark Study
The conventional bar-necking problem has been studied by many researchers to verify different void nucleation models [6]. The FE verifications carried out in these studies were based on 2D axisymmetric elements (CAX8R in the ABAQUS element library). In order to ensure the applicability of Gurson's model with 3D continuum elements, the bar-necking problem is re-analyzed here using solid elements (C3D20R in the ABAQUS element library). The geometry and numerical results are shown in Fig. 2.
Figure 2 (a) 3D quarter model; (b) 2D axisymmetric half model; (c) load-deformation response for the tensile bar (Tvergaard model: q₁ = 1.5, q₂ = 1.0, q₃ = 2.25; 3D model ε_N = 0.30; 2D model s_N = 0.10, f_N = 0.04)
The 2D model is obtained from the ABAQUS benchmark manual [5]. Hardly any difference is observed between the 2D and 3D models.

3 Tubular Joint Behavior
Three types of joints, X-, T- and K-joints, are obtained from published experimental results [7, 8, 9]. Since Gurson's model involves the calculation of the plastic strain, a sufficiently fine mesh is required. Three meshing schemes are adopted, as shown in Fig. 3.
Figure 3 (a) Fine mesh; (b) medium mesh; and (c) coarse mesh for the X-joint model

Table 1 Geometry and strength of the three types of joints studied.

Joint | d₀ (mm) | β | γ | τ | α | Test (kN) | FE (Lu's) (kN) | FE/Test
X1 | 407.4 | 1.0 | 26 | 1.0 | 17.5 | 2248 | 2055 | 0.91
X2 | 407.4 | 0.4 | 26 | 0.8 | 4.9 | - | 1937 | -
T | 298.5 | 0.5 | 7.5 | 1.0 | 10.2 | 397 | 388 | 0.98
K (θ = 60°) | 217.4 | 0.7 | 25 | 0.8 | 13.9 | 225 | 210 | 0.93
The geometry of the three types of joints is shown in Table 1, which also incorporates the comparison of the joint ultimate strengths. There are two replicated tests for each X-joint shown in Table 1; the joint strength shown in Table 1 is the average of the two tests. The comparison of X- (X1) and T-joint behavior is illustrated in Fig. 4, where the effect of the different mesh schemes is also included. The FE mesh density does not show a significant effect on the joint ultimate strength. However, the void nucleation process simulating the ductile fracture effect is a strain-controlled criterion, which is directly dependent on the mesh scheme employed. This is demonstrated in Table 2, in which the displacement levels corresponding to 15% plastic strain are compared for the different mesh schemes.

Table 2 Comparison of the displacement at 15% plastic strain

Mesh density | X1 (mm) | X2 (mm) | T (mm) | K (mm)
Fine | 20 | 12 | 11 | 4
Medium | 22 | 38 | 14 | 7
Coarse | 26 | 61 | 36 | 9
Figure 4 (a) X-joint behavior; (b) T-joint behavior, with Gurson's model simulation

Large variations in the displacement levels are observed; the effect of the meshing scheme is apparent. The load reduction is most pronounced for the fine mesh, which results in the largest strain values, as shown in Fig. 4.

Sensitivity Study on ε_N, f_N and s_N

The accuracy of the void nucleation analysis relies on the material input. However, there is no rigorous formulation in the literature for computing the material parameters needed in Gurson's model, and the material properties may be affected by the manufacturing process [10]. For this study, three values of ε_N (0.0, 0.10 and 0.30), two
values of s_N (0.05 and 0.10), and three values of f_N (0.04, 0.10 and 0.20) are examined. Arndt and Dahl [11] reported that for high-strength steel the void nucleation process initiates as soon as yielding occurs; hence ε_N = 0.0 is selected. A relatively large value of ε_N = 0.30 is also incorporated for comparison. f_N is normally taken to be less than 0.10; the extremely large value f_N = 0.20 is selected to observe the amplified effect of the void volume fraction.

Figure 5 (a) Effect of ε_N; (b) effect of f_N, on X-joint (X1) behavior (β = 1.0, γ = 25.5, τ = 1.0, α = 17.5, d₀ = 407.4 mm)

Figure 5 illustrates the effect of the material parameters on the X-joint (X1) behavior. With ε_N = 0.0, the joint strength is slightly reduced compared to the other two cases once plasticity occurs; a large value of ε_N postpones the initiation of void nucleation, and the load reduction is only observed in the case of ε_N = 0.10. The void volume fraction of the nucleating particles plays a significant role in the joint behavior, as demonstrated in Fig. 5(b). A relatively small value of f_N does not initiate a strength reduction within the prescribed displacement; on the other hand, a very early reduction in the joint strength is observed if a rather large f_N is taken. The joint behavior does not show a strong dependence on the standard deviation s_N, and that comparison is therefore not shown here.

4 Conclusion
Gurson's model offers an alternative way of simulating the effect of ductile fracture in tubular joints. The load reduction due to ductile tearing is captured in the numerical analysis, and the ultimate strength obtained with Gurson's approach lies at a similar level to that of the tests. The tubular joint behavior shows a high dependency on the material properties, especially the ε_N and f_N values. A high f_N results in a conservative estimate of the joint strength, with a premature fracture effect being observed; on the other hand, a large void nucleation strain postpones the effect of ductile fracture.

References
1. Dexter, E.M. and Lee, M.M.K., Static strength of axially loaded tubular K-joints. I: Behavior. Journal of Structural Engineering (1999) pp. 194-201.
2. Jurban, J.S. and Cofer, W.F., Ultimate strength analysis of structural components using the continuum damage mechanics approach. Computers & Structures (1991) pp. 741-752.
3. Tvergaard, V., Influence of voids on shear band instabilities under plane strain conditions. International Journal of Fracture 17 (1981) pp. 389-406.
4. Qian, X.D., Romeijn, A., Wardenier, J. and Choo, Y.S., An automatic FE mesh generator for CHS joints. Proceedings of the 12th International Offshore and Polar Engineering Conference 4 (2002) pp. 11-18.
5. ABAQUS User Manual, Version 6.2.1. Hibbitt, Karlsson & Sorensen Inc. (2001).
6. Tvergaard, V. and Needleman, A., Analysis of the cup-cone fracture in a round tensile bar. Acta Metallurgica 32 (1984) pp. 157-169.
7. Sanders, D.H. and Yura, J.A., Strength of double-tee tubular joints in tension. Offshore Technology Conference, OTC 5437 (1987) pp. 139-150.
8. Zerbst, U., Heerens, J. and Schwalbe, K.H., The fracture behaviour of a welded tubular joint - an ESIS TC1.3 round robin on failure assessment methods. Part I: experimental data base and brief summary of the results. Engineering Fracture Mechanics 69 (2002) pp. 1093-1110.
9. Wang, B., Hu, N., Kurobane, Y., Makino, Y. and Lie, S.T., Damage criterion and safety assessment approach to tubular joints. Engineering Structures 22 (2000) pp. 424-434.
10. Thomason, P.F., Ductile Fracture of Metals (1990).
11. Arndt, J. and Dahl, W., Effect of void growth and shape on the initiation of ductile failure of steels. Computational Materials Science 9 (1997) pp. 1-6.
STRESS INTENSITY FACTORS FOR DOUBLER-PLATE REINFORCED TUBULAR JOINT SUBJECTED TO AXIAL LOADS

R. JIANG, Y. S. CHOO*

Department of Civil Engineering, National University of Singapore, Singapore 117576
E-mail: [email protected] (*corresponding author)

Doubler plates are used to reinforce tubular joints in offshore structures. As these are usually fillet-welded to the chord, the weld root is one of the key areas in considering the fatigue strength of the joint. This paper reports the results of an ongoing project whose objective is to improve the understanding of weld root failure through a systematic parametric study, so that proper proportioning of the doubler-plate reinforced joint may be achieved. It is found that the doubler plate size should be carefully chosen to reduce high stress intensities at the weld root.
1 Introduction
Tubular joints are widely used in offshore structures, and fatigue strength is a major concern in the design of such structures. Many researchers have studied the fatigue of tubular joints. Lee et al. (2000) developed a set of equations for estimating weld toe magnification factors (Mk) for semi-elliptical cracks in T-butt joints, from multiple regression analyses of the Mk factors obtained in a parametric study; their equations have been included in the new British Standard BS 7910. Lie et al. (2002) also developed their own method of evaluating the stress intensity factors for cracks in the weld toe area. In positions of structural weakness or areas of importance, a doubler plate is often used to strengthen a tubular joint. In considering the fatigue strength of a doubler-plate reinforced tubular joint, the weld root is one of the key areas, since lack of penetration usually exists there due to inaccessibility during the welding process. Few research results on fatigue failure of this area are found in the technical literature. Therefore, one of the aims of our project is to better understand weld root failure so that proper proportioning of the doubler-reinforced joint may be achieved.
2 Finite Element Analysis

Doubler-plate reinforced tubular X-joints subject to brace axial compression (Figure 1) or tension were studied by numerical analysis. SIF values were evaluated for the root of the doubler-plate-to-chord weld. A systematic parametric study was carried out on the brace diameter, doubler plate thickness and doubler plate size, which were considered to have significant influences (Table 1). The virtual crack extension technique embedded in the ABAQUS code was adopted to evaluate the SIF values. To verify the validity and accuracy of this method, comparisons with standard connections were carried out prior to the current study. The results for
method verification were reported elsewhere (Choo, 2002), and the SIF method was found to provide good correlation with the reference cases.

Table 1. Range for parametric study
Brace diameter / chord diameter: 0.25, 0.5, 0.64, 0.8
Doubler plate thickness / chord thickness: 1, 1.25, 1.6
Doubler plate length / brace diameter: 1.25, 1.5, 1.75, 2, 2.5, 3
Figure 1. Doubler-plate reinforced tubular joint subject to brace axial compression.
Numerical analyses were performed using ABAQUS, while the models were generated in PATRAN. Due to symmetry, only 1/8 of the joint was modeled. Twenty-node brick elements were used for the analyses, and the material was steel with an elastic modulus of 205000 MPa and a Poisson's ratio of 0.3. For accurate evaluation of the SIF, the slit tip area was modified as shown in Figure 2. The SIF values were evaluated at the root along both the transverse and longitudinal welds. It was found that under brace compression there was separation between the doubler plate and the chord at the weld root along the transverse weld (from the crown to the doubler plate corner). The position with the highest SIF value was found to be the crown position of the joint (Figure 3). At the corner where the transverse weld meets the longitudinal weld, the opening of the slit tip on each weld is restricted by the other, and the SIF value is expected to be smaller than at positions along the welds (circled in Figure 3). This can be observed from the deformation of the numerical model and agrees with a previous study [3]; the corner point is thus not a critical position in considering fatigue strength. The parametric study was therefore carried out at the joint crown position on the three geometric parameters that have major influence.
Figure 2. Tubular X-joint subject to axial compression (left) and tension (right), in typical 1/8 model.
Figure 3. Typical results for one 1/8 model subject to axial compression.
3 Results and Discussions
From the results obtained, it is observed that, in the consideration of doubler plate size for a particular joint configuration, there is a maximum root SIF value at the crown position. From this value, both an increase and a decrease in doubler plate size will lower the root SIF, as shown in Figure 4 (left). This holds within the range studied and for all the doubler plate thicknesses studied. It is also observed that an increase in the brace-to-chord diameter ratio (d₁/d₀) results in an increase of the root SIF at the crown. As shown in Figure 4 (right), the doubler plate thickness does not have a significant effect on the root SIF within the range studied.
Figure 4. Combined influence of brace diameter and doubler plate size (t = 10 mm).
4 Conclusion
The paper presents results from a parametric study on the stress intensity factors (SIF) of doubler-plate reinforced joints. The parametric study included systematic variation of the brace diameter, doubler plate thickness and doubler plate size. From the results obtained within the parametric ranges, it is observed that for a given joint configuration there is a doubler plate size giving a maximum root SIF value at the crown position. From this reference size, both an increase and a decrease in the doubler plate size will lower the root SIF. It is also observed that an increase in the brace-to-chord diameter ratio (d₁/d₀) results in an increase of the root SIF at the crown. The thickness of the doubler plate is found to have a marginal effect on the root SIF.
References
1. ABAQUS Standard Manual Vols. 1 & 2, Theory Manual, Version 6.2. Hibbitt, Karlsson and Sorenson Inc. (2001).
2. PCL and Customization for MSC.Patran. MSC Software Corporation (1999).
3. Choo, Y.S., Jiang, R. and Thevendran, V., Stress Intensity Factors for Doubler Plate Reinforced Connections, ISOPE (2002).
4. Bowness, D. and Lee, M.M.K., Prediction of Weld Toe Magnification Factors for Semi-Elliptical Cracks in T-butt Joints, Int. J. of Fatigue (2000).
5. Lie, S.T., Chiew, S.P. and Huang, Z.W., Finite Element Model for the Fatigue Analysis of Cracked Tubular T-Joints under Complex Loads, ISOPE (2002).
VIBRATION ANALYSIS OF POROELASTIC BAR

T. Z. CHEN, Z. ZONG AND K. C. HUNG*

Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528
*E-mail: [email protected], Tel: (+65) 6419-1564, Fax: (+65) 6778-0522

Vibration analyses of a poroelastic bar are carried out analytically and numerically in this paper. The numerical method is characterized by two steps, temporal and spatial discretization: a Runge-Kutta method is used for temporal discretization and a local interpolation scheme for spatial discretization, giving a truly meshless method. First, a free vibration is computed; the phase difference between displacement and fluid pressure is about π/2 in the time history. Then, a forced displacement vibration is simulated; the fluid flow acts like a damper in the dynamic response of the bar. The numerical method is validated, since its results coincide with the analytical solutions.
1 Introduction
Poroelasticity is a continuum theory for fluid-saturated porous media. It was originally motivated by problems in soil and geomechanics, which generally concern massive structures (consolidation problems, seismic wave propagation, etc.), and this application of poroelastic theory is relatively mature. However, relatively few papers have so far investigated light poroelastic structures. On the other hand, poroelastic theory has also been extensively applied in biomechanics, e.g. bone mechanics [1, 2]. In this field, many objects can be modeled as light structures; for example, many bones can be simplified as fluid-saturated poroelastic bars. Cederbaum et al. [3] investigated the poroelastic beam and plate, but their treatment of dynamic response is limited. Poroelastic dynamics is of paramount importance for a better understanding of some biological phenomena, particularly those related to impact injury, brain trauma and bone fracture. In this paper, analytical solutions for two special cases are obtained. However, an analytical solution for arbitrary conditions is difficult to work out, and for such problems numerical analyses are required. As a promising alternative to the finite element method, meshless methods use only a set of scattered nodes. In [4], the authors provided a truly meshless method, the local interpolation collocation method; here it is adopted to simulate the vibration of a poroelastic bar.

2 Equations for poroelastic bar
For a poroelastic bar subjected to axial load, diffusion in the longitudinal direction is viable, while the flow in the perpendicular directions can be neglected. The governing equations for the poroelastic bar are obtained as Eq. (1) [3], in non-dimensional form, within small-deflection theory and Biot's theory, with the relative motion between solid and fluid governed by Darcy's law:
γ² ∂²u/∂t² - ∂²u/∂x² + ∂f/∂x = 0
η ∂f/∂t - ∂²f/∂x² + λ ∂³u/∂x∂t = 0   (0 < x < 1, t > 0)   (1)
Here, u and f are the two unknown time-dependent functions: the non-dimensional axial displacement and the non-dimensional fluid pressure. η, γ and λ are material parameters. The boundary and initial conditions differ from case to case. The boundary conditions on the axial displacement are that u is given at certain points. The diffusion boundary condition for a permeable end is that f is given at the end, while for an impermeable end ∂f/∂x = 0 at the end.
3 Simulation results

The numerical method in this paper is characterized by two steps: temporal and spatial discretization. A Runge-Kutta method is used for temporal discretization and a local interpolation scheme [4] for spatial discretization; it is a truly meshless method. Two cases, one of free vibration and one of forced vibration, are simulated here, because analytical solutions can be worked out for these two cases. In the numerical simulations, 51 nodes are generated on the bar, and the time step is 10⁻.
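As a sketch of this two-step scheme, the Python fragment below advances the semi-discrete form of Eq. (1) with a classical fourth-order Runge-Kutta step. It is a minimal stand-in, not the paper's code: central finite differences replace the local interpolation scheme, the sign conventions of Eq. (1) as reconstructed above are assumed, and the free-vibration boundary conditions of the next subsection are built in.

```python
import numpy as np

def rhs(y, n, dx, eta=1.0, gam=1.0, lam=1.0):
    """Semi-discrete right-hand side of Eq. (1) on n interior nodes.
    Free-vibration BCs: u = 0 and df/dx = 0 at both ends; the zero-flux
    condition is approximated by mirroring the boundary-adjacent value.
    State y = [u, v, f] with v = du/dt."""
    u, v, f = y[:n], y[n:2*n], y[2*n:]
    ue = np.concatenate(([0.0], u, [0.0]))
    ve = np.concatenate(([0.0], v, [0.0]))
    fe = np.concatenate(([f[0]], f, [f[-1]]))
    u_xx = (ue[2:] - 2*u + ue[:-2]) / dx**2
    f_xx = (fe[2:] - 2*f + fe[:-2]) / dx**2
    f_x = (fe[2:] - fe[:-2]) / (2*dx)
    v_x = (ve[2:] - ve[:-2]) / (2*dx)
    return np.concatenate([v,
                           (u_xx - f_x) / gam**2,
                           (f_xx - lam * v_x) / eta])

def rk4_step(y, dt, *args):
    k1 = rhs(y, *args); k2 = rhs(y + 0.5*dt*k1, *args)
    k3 = rhs(y + 0.5*dt*k2, *args); k4 = rhs(y + dt*k3, *args)
    return y + dt/6.0 * (k1 + 2*k2 + 2*k3 + k4)

# usage: start from the static state of Eq. (3) and march in time
n = 49; dx = 1.0/(n + 1); x = dx*np.arange(1, n + 1)
y = np.concatenate([np.sin(np.pi*x), np.zeros(n), np.zeros(n)])
for _ in range(2000):
    y = rk4_step(y, 5e-4, n, dx)
```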
3.1 Free vibration
In this case the bar is fixed at both ends, and both ends are assumed impermeable, so the boundary conditions can be written as:

u(0,t) = 0,  u(1,t) = 0,  ∂f/∂x(0,t) = 0,  ∂f/∂x(1,t) = 0   (t > 0)   (2)

The initial condition is a static state:

u(x,0) = sin πx,  ∂u/∂t(x,0) = 0,  f(x,0) = 0   (0 < x < 1)   (3)

The analytical solution of this case is:

u(x,t) = R(t) sin πx   (0 < x < 1, t > 0)   (4)

where

R(t) = c₁ e^{βt} + (c₂ cos bt + c₃ sin bt) e^{at}   (t > 0)   (5)

With η = 1, γ = 1 and λ = 1:
β = -8.88196885479441,  a = -0.493817773147473,  b = 3.27463045961263,
c₁ = 0.01353475908632359,  c₂ = 0.986465240913676,  c₃ = 0.185471119476820.
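The constants quoted above can be checked independently: substituting u = R(t) sin πx and f proportional to cos πx into Eq. (1) with η = γ = λ = 1 leads to the characteristic cubic s³ + π²s² + 2π²s + π⁴ = 0 (Eq. (8) below with n = 1), whose roots are β and a ± ib. A short numerical check, assuming that cubic:

```python
import numpy as np

# characteristic cubic for the first mode (eta = gamma = lambda = 1)
pi2 = np.pi ** 2
roots = np.roots([1.0, pi2, 2.0 * pi2, pi2 ** 2])
print(roots)
# one real root near -8.8820 (beta) and a complex pair a +/- i*b with
# a ~ -0.4938 and b ~ 3.2746, matching the constants quoted above
```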
The numerical results for this case are shown in Figs. 1 and 2 and agree with the analytical solution. The displacement profile remains a sine curve over 0 to π, while the fluid pressure profile is always a cosine curve over 0 to π. The fluid flow acts like a damper. The phase difference between displacement and fluid pressure is about π/2 in the time history.
Figure 1. Shapes of u and f in free vibration
Figure 2. Time history of u and f in free vibration
3.2 Forced displacement vibration
The initial condition of this case is static:

u(x,0) = 0,  ∂u/∂t(x,0) = 0,  f(x,0) = 0   (0 < x < 1)   (6)
The bar is assumed impermeable at both ends; it is fixed at one end, while the displacement of the other end is forced as 1 - cos ωt. The boundary conditions can thus be written as:

∂f/∂x(0,t) = 0,  ∂f/∂x(1,t) = 0,  u(0,t) = 0,  u(1,t) = 1 - cos ωt   (t > 0)   (7)

Let s₁, s₂ and s₃ be the roots of the equation

s³ + ((nπ)²/η) s² + ((η + λ)(nπ)²/(ηγ²)) s + (nπ)⁴/(ηγ²) = 0   (n = 1, 2, 3, ...)   (8)
The analytical solution of this case is

f(x,t) = 2λω Σ_{n=1}^{∞} (-1)^{n+1} cos(nπx) Σ_{i=1}^{3} [ s_i² (s_i sin ωt + ω cos ωt - ω e^{s_i t}) ] / [ Π_{j≠i}(s_i - s_j) · (s_i² + ω²) ] + λ(1 - cos ωt)   (9)
u(x,t) = (2λω/π) Σ_{n=1}^{∞} (-1)^{n+1} (sin(nπx)/n) Σ_{i=1}^{3} [ s_i (s_i + n²π²)(s_i sin ωt + ω cos ωt - ω e^{s_i t}) ] / [ Π_{j≠i}(s_i - s_j) · (s_i² + ω²) ] + x(1 - cos ωt)   (10)
The numerical results for this case are shown in Figs. 3 and 4 (ω = 10) and agree with the analytical solution. Initially, the vibration attenuates because of fluid flow as the wave propagates from the forced end to the other end. Over a long duration, however, the vibration at other positions may become larger than at the forced end.
Figure 3. Shapes of u in forced vibration
Figure 4. Shapes of f in forced vibration

4 Discussion
Vibration of a poroelastic bar is simulated in this paper. The simulations show that the fluid flow acts like a damper on the bar vibration. The numerical results agree well with the analytical solutions, so the numerical method is validated. Only two cases are simulated here because analytical solutions are available for them. Further cases solved by the numerical method, and the essential character of such vibrations, will be discussed in later work. The research may then be extended to shock analysis of bone.

References
1. Cowin, S.C., Bone Mechanics Handbook. CRC Press (2001).
2. Cowin, S.C., Survey article: Bone poroelasticity. Journal of Biomechanics 32 (1999) pp. 217-238.
3. Cederbaum, G., Li, L.P. and Schulgasser, K., Poroelastic Structures. Elsevier (2000).
4. Chen, T.Z., Zong, Z. and Hung, K.C., A local interpolation collocation approach to the wave equation. Comp. Meth. Appl. Mech. Engng (submitted).
FINITE ELEMENT FAILURE MODELLING OF CORRUGATED PANEL SUBJECTED TO DYNAMIC BLAST LOADING

J. W. BOH
Centre for Offshore and Maritime Engineering, Faculty of Engineering, National University of Singapore, Singapore 117576

L. A. LOUCA
Department of Civil & Environmental Engineering, Imperial College of Science, Technology and Medicine, London, SW7 2BU, U.K.

Y. S. CHOO
Centre for Offshore and Maritime Engineering, Faculty of Engineering, National University of Singapore, Singapore 117576
By adopting the Abaqus/Explicit finite element code, the authors have investigated the use of a force-based failure criterion and a rupture-strain-based criterion to assess the integrity of a corrugated panel under dynamic loading. The responses obtained are found to describe the tearing of the panel with good accuracy, while the computed strain distribution is marginally conservative.
1 Introduction
Corrugated panels used as firewalls are commonly found in offshore installations. In the low-probability event of a hydrocarbon explosion, large plastic deformation is usually allowed, although extensive tearing of the panel must be prevented. One area of interest to engineers is the failure criterion for the panel. In addition, explicit finite element codes such as Abaqus/Explicit have been used successfully in the past to model highly transient dynamic stress wave propagation. Two numerical failure models, namely the spot weld (SW) model and the rupture strain (RS) model, are adopted in this study to investigate the integrity of the corrugated panel under blast loading. Past studies [1, 2] have indicated that both models are at least able to quantitatively describe the tearing of the panel with some success. This paper attempts to further calibrate the two models against available experimental results, highlighting their relative strengths and weaknesses in failure modelling.

2 Experimental Setup and Observations
The 2.5 mm thick stainless steel corrugated panel is a shallow-profiled AO blast wall approximately 2.5 m square (Figure 1). At the time of 64.2 ms,
the firewall ruptured, initially at the bottom centre (S3A) of the transverse weld, as shown in Figure 1. At the end of the test, the panel had almost completely dislodged from the angle frames, with the corrugations significantly flattened and the transverse angles substantially deformed.

3 Finite Element and Failure Modelling

The corrugated panel was modelled using first-order reduced-integration shell elements with an inbuilt hourglass viscosity, as shown in Figure 2. The equation of motion was solved by the central difference method, and numerical integration through the thickness of the shell was carried out using Simpson's rule. Both geometric and material non-linearities were included in the analysis.
Figure 1: Locations of strain and displacement gauges
Figure 2: Finite element model for the corrugated panel.
The peak blast loading of 2.45 bar was assumed to be uniformly distributed on the panel. An idealized bi-linear triangular pressure pulse, with equal rise and decay rates of 116 bar/s, was employed in the study. The nominal stress-strain curve of the stainless steel panel material was obtained from a quasi-static uniaxial tension test. The rupture strain (RS) model was implemented in the finite element model by letting the outer elements of the panel behave as the weld
material. Failure is assumed at an integration point when the accumulated incremental equivalent plastic strain Σ Δε_pl exceeds the rupture strain ε_crit:

Σ Δε_pl ≥ ε_crit   (1)
The spot weld (SW) failure model is essentially a force-based failure criterion; no rotational restraint was considered. Failure is assumed when

(FN_max / FN_ult)² + (FS_max / FS_ult)² ≥ 1.0   (2)

where FN_max and FN_ult are the maximum and ultimate tensile forces respectively, and FS_max and FS_ult are the maximum and ultimate shear forces respectively.
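Both criteria reduce to simple per-increment checks once the force and strain histories are available. The Python sketch below illustrates Eqs. (1) and (2) only; the function and variable names are ours and are not part of the Abaqus interface.

```python
def rs_failed(d_eps_pl_history, eps_crit):
    """Rupture strain (RS) criterion, Eq. (1): failure when the
    accumulated equivalent plastic strain exceeds eps_crit."""
    return sum(d_eps_pl_history) >= eps_crit

def sw_failed(fn_max, fn_ult, fs_max, fs_ult):
    """Spot weld (SW) criterion, Eq. (2): quadratic interaction of
    normal and shear weld forces."""
    return (fn_max / fn_ult) ** 2 + (fs_max / fs_ult) ** 2 >= 1.0
```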
4 Results and Discussions

A typical finite element model showing the tearing of the panel is shown in Figure 3. The results obtained from the two failure models are compared with the experimental results in Table 1, and typical nominal strain distributions are shown in Figure 4 for an outer element (S3A) and Figure 5 for an inner element (S1A).
Figure 3: Tearing of panel using RS 8% model.

Table 1: Comparison of failure models with test data (* time of 1st weld failure in ms; ^ compressive strain; # tensile strain).

 | SW | RS (8%) | Test
Failure time* | 63.6 | 63.6 | 64.2
Location | S3B | S3A | S3A
S3AY^ | 0.031 | 0.004 | 0.051
S3BY# | 0.054 | 0.017 | 0.138
S1AY^ | 0.014 | 0.006 | 0.006
S1AX# | 0.058 | 0.018 | 0.039
S1BY# | 0.010 | 0.007 | 0.015
S1BX^ | 0.008 | 0.008 | 0.010
S2X | 0.061 | 0.003 | 0.009
Figure 4: Longitudinal nominal strain for S3A
Figure 5: Longitudinal nominal strain for S1A
Both models closely predict the deformation and tearing behaviour of the panel observed experimentally, as well as the time of first weld failure. The two models, however, predicted different locations for the initial failure. One possible reason is the sensitivity of the rupture strain model to the steep strain gradients arising from the profile of the corrugations, on top of the fact that the spot weld model is not capable of predicting the through-thickness strain variations in these regions. Initial results have also shown that the force-based failure criterion gives better strain predictions for the inner elements than the rupture strain failure model; the reverse is, however, true for the outer elements.

5 Acknowledgements
The authors are grateful to British Gas for permission to publish their experimental data.

References
1. Louca, L.A., Harding, J.E. and White, G., Response of Corrugated Panels to Blast Loading, Offshore Mechanics and Arctic Engineering (1996), Florence, pp. 297-305.
2. Louca, L.A. and Friis, J., Modeling Failure of Welded Connections to Corrugated Panel Structures under Blast Loading, Offshore Technology Report, OTO 00088 (2000).
SIMULATION OF ACOUSTIC RADIATION AND SCATTERING USING BOUNDARY ELEMENT METHOD

Z. Y. YAN, K. C. HUNG, H. ZHENG

Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528, Singapore
E-mail: yanzy@ihpc.a-star.edu.sg

Acoustic radiation and scattering in the unbounded exterior domain are numerically investigated using the composite Helmholtz integral equation. The hyper-singular numerical integral involved in the normal derivative equation of the conventional Helmholtz integral equation is dealt with by applying a regularization formulation. The influence matrix corresponding to a composite integral operator is proved to be simply the product of the two influence matrices corresponding to the two integral operators that construct the composite operator. Consequently, a new approach for dealing with the hyper-singular numerical integral is obtained. To analyse the accuracy and efficiency of the new approach, several numerical examples are computed.
1 Introduction

It is well known that the classical boundary element method in acoustics fails to provide a unique solution at certain characteristic frequencies [1]. To overcome this non-uniqueness problem, Burton and Miller [2] developed the composite Helmholtz integral equation (CHIE), which consists of a linear combination of the Helmholtz integral equation and its normal derivative equation. However, the CHIE suffers from the main drawback of a hyper-singular integral. A double surface integral method was used by Burton and Miller [2] to reduce the order of the hyper-singularity, but it was computationally inefficient to implement. In this paper, a highly efficient approach based on this double integral method is developed to deal with the hyper-singularity.
i / + M,+oW, U = k , + a i / + M[ represents the surface acoustic pressure, OC takes the value — ilk and k is wave number. The integral operators Lk, Mk, Nk and Mk can be expressed as
Lkfl=\ln(q)Gk(p,q)dSq
, Mk^ = \\^(q)dGk^q)dSq
Nkli = p
P I
595
(2)
596
Gk(p,q) = e~'kr/47tr,
r = \p-q\
(4)
where p and q are respectively the source point and the field point on the surface. The main drawback of the CHIE method is the numerical treatment of the hyper-singular integral operator N_k. Burton and Miller [2] used the following regularization relationship to deal with the hyper-singularity:

L₀N₀ = M₀² - ¼I   (5)
where L₀, N₀ and M₀ are integral operators identical to L_k, N_k and M_k except that their kernels contain G₀(p,q) = 1/(4πr) rather than G_k. The composite integral operator L₀N₀ is defined as

L₀N₀ μ(p) = ∫_S G₀(p,q) [ ∂/∂n_q ∫_S ∂G₀(q,q′)/∂n_{q′} μ(q′) dS_{q′} ] dS_q   (6)
The following transformation was then used to remove the hyper-singularity:

L₀N_k = L₀[N_k - N₀] + M₀² - ¼I   (7)
The composite integral operators L₀[N_k - N₀] and M₀² involve double surface integrals; it is therefore very inefficient to implement such an approach numerically.

3 Discretization of the Integral Operators

The integral operators are discretized using eight-noded, quadrilateral isoparametric surface elements. The integral operator L_k can be discretized as

L_k φ = B_k {φ}   (8)
where the matrix B_k is defined as the discretized operator matrix of the integral operator L_k, with entries assembled from element integrals of the shape functions against the Green's function:

(B_k)_{n,m} = Σ_j ∫_{ΔS_j} N_m G_k(p_n, q) dS_q   (9)

where N_m denotes the interpolation (shape) function associated with node m and ΔS_j the surface elements.
A new idea is introduced to discretize the operator L₀N₀. Assume that

ψ(q) = ∂/∂n_q ∫_S ∂G₀(q,q′)/∂n_{q′} μ(q′) dS_{q′}   (10)

This can be discretized and expressed in discretized operator matrix form as

{ψ} = D₀ {μ}   (11)

where D₀ is the discretized operator matrix of the integral operator N₀. Substituting Eq. (10) into Eq. (6), we have
L₀N₀ μ(p) = ∫_S G₀(p,q) ψ(q) dS_q   (12)

The discretization of Eq. (12) can be expressed in discretized operator matrix form as

E₀ {μ} = B₀ {ψ}   (13)

where E₀ is the discretized operator matrix of the composite integral operator L₀N₀. Substituting Eq. (11) into Eq. (13), we have

E₀ {μ} = B₀ D₀ {μ}   (14)

Because Eq. (5) is an identity and μ is an arbitrary function, we have

E₀ = B₀ D₀   (15)
Similarly, the composite integral operator M₀² can be discretized as A₀², where A₀ is the discretized operator matrix of the integral operator M₀. Consequently, Eq. (5) can be expressed in terms of discretized operator matrices as

D₀ = B₀⁻¹ (A₀² - ¼I)   (16)
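Eqs. (15)-(16) are the computational payoff of this section: once B₀ (from L₀) and A₀ (from M₀) have been assembled, the discretized hyper-singular operator follows from ordinary dense linear algebra rather than from a double surface integral. A schematic Python fragment, with the assembled matrices taken as given:

```python
import numpy as np

def discretized_N0(B0, A0):
    """Compute D0 = B0^{-1} (A0^2 - I/4), Eq. (16): the discretized
    hyper-singular operator from the matrices of L0 and M0."""
    n = B0.shape[0]
    rhs = A0 @ A0 - 0.25 * np.eye(n)
    return np.linalg.solve(B0, rhs)  # avoids forming B0^{-1} explicitly
```

np.linalg.solve factorizes B₀ once; in practice the factorization would be reused if D₀ is applied repeatedly.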
Now the double surface integrals in Eq. (5) have been reduced to products of surface integrals. By applying the splitting N_k = (N_k - N₀) + N₀, the hyper-singular integral is reduced to weakly singular integrals, which can be evaluated using the integration scheme proposed by Lachat and Watson [4].

4 Numerical Examples

Several examples have been computed to validate the new approach. Because of the length limit, only one case, plane acoustic wave scattering from a rigid sphere, is presented here. Fig. 1 shows half of the sphere surface discretized using 416 elements. The dimensionless scattered acoustic pressures obtained using the CHIE and the HIE at r = 5a for ka = π are compared with the analytical solutions in Fig. 2. Clearly, the results obtained using the CHIE are unique and agree with the analytical solutions quite well.

5 Conclusions

It has been proved that a composite integral operator can be discretized and expressed as the product of the two discretized operator matrices corresponding to the two integral operators that construct it. Consequently, a highly efficient new approach is developed to deal with the hyper-singular numerical integral.
Fig. 1. Discretization of half of a sphere surface with 416 elements.
Fig. 2. The angular dependence of the dimensionless scattered acoustic pressures at r = 5a for ka = π.
References
1. Ciskowski, R.D. and Brebbia, C.A., Boundary Element Methods in Acoustics. Computational Mechanics Publications, Southampton/Boston, 1991.
2. Burton, A.J. and Miller, G.F., "The application of integral equation methods to the numerical solution of some exterior boundary value problems," Proc. R. Soc. London Ser. A 323, 201-210, 1971.
3. Mathews, I.C., "Numerical techniques for three dimensional steady-state fluid-structure interaction," J. Acoust. Soc. Am. 79, 1317-1325, 1986.
4. Lachat, J.C. and Watson, J.O., "Effective numerical treatment of boundary integral equations," Int. J. Num. Methods Eng. 10, 991-1005, 1976.
NUMERICAL CHARACTERIZATION OF RC PLATE RESPONSE AND FRAGMENTATION UNDER BLAST LOADING

K. XU AND Y. LU
PTRC, NTU, Singapore 639798
E-mail: [email protected]

H. S. LIM
Defence Science and Technology Agency, Singapore

The risk of accidental explosion is present wherever ammunition is stored. A major harmful effect of such an accident is the debris of the storage magazine, typically a box-type concrete structure. This paper presents part of a research programme aimed at investigating the response and break-up of a concrete box structure under high-explosive loading. An energy formulation and a cohesive failure model are proposed. Numerical simulation is performed on representative elastic plates subjected to simulated blast loading. Characteristic responses, such as the time histories of normal stress and shear stress, are examined in order to understand the possible governing failure modes. The numerical results are used to evaluate the loading strain rate and the dynamic material strength. Based on some simplifying assumptions, the energy dissipated and the nominal debris dimensions are estimated.
1 Introduction

Experimental work on square and rectangular plates under blast loading has been conducted extensively (Nurick and Shave [5]; Olson et al. [6]), and numerical results have been published on the failure of clamped, thin, unstiffened square plates (Rudrapatna et al. [7]) and stiffened plates (Rudrapatna et al. [8]) under blast loading. In this paper, a numerical investigation of the basic response characteristics of RC plates subjected to blast loading normal to the plate face is carried out, in order to gain some understanding of the dominant failure modes and the underlying mechanisms. To observe the possible variation of the dominant response with the loading parameters, the plate is subjected in the analysis to simulated blast loading of varying duration. For this purpose, some basic information on the blast loading characteristics is summarized first. Characteristic responses, such as the time histories of normal stress and shear stress, are examined to illustrate the possible governing failure modes and the change of such modes under different explosive loading conditions. The numerical results also allow an evaluation of the energy input and transformation, based on which an estimate of the nominal debris dimension can be established, assuming that the dominant fragmentation is formed during the composite shock stage. The strain rate effects on the concrete behavior are taken into account.
2 Blast shock wave and energy formulation for fragmentation

A typical pressure-time curve for an explosive blast wave is shown in Figure 1. The negative pressure phase is not considered here, since most of the structural damage is due to the positive phase (Baker [1]).

Figure 1. Shock wave approximation by a straight line
As proposed by Brode [2], an empirical exponential form can be used to describe the positive phase of the explosive blast wave:

Δp(t) = Δp₀ (1 - t/τ) e^{-αt/τ}   (1)

in which α = 1/2 + Δp₀ for Δp₀ < 1 (kg/cm²), and α = 1/2 + Δp₀[1.1 - (0.13 + 0.20Δp₀)(t/τ)] for 1 ≤ Δp₀ ≤ 3, where Δp(t) is the instantaneous overpressure at time t, Δp₀ is the peak overpressure at t = 0, and τ is the overpressure duration. The peak overpressure and the overpressure duration can be found in Henrych [3]. According to the principle of conservation of energy, the total internal energy U_m absorbed by the system comprises the work of deformation U_d, the fracture energy U_c and the kinetic energy U_k; hence

U_c = U_m - U_d - U_k   (2)

The stress waves are responsible for the development of a damage zone and the subsequent fragment size distribution, while the explosion gases are important in the separation of the crack pattern that has already formed after the passage of the stress wave, and in the subsequent throw of the fragments. It can be assumed that the formation of the fragment sizes is complete by the end of the blast shock wave; by ignoring the kinetic energy at this time, an upper bound on the energy available for fragment formation can be obtained.
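Eq. (1), with the piecewise fit for α, is straightforward to evaluate. A small Python helper (the function name is ours; Δp₀ is in kg/cm², the unit in which the fit for α is stated):

```python
import math

def overpressure(t, dp0, tau):
    """Brode blast-wave overpressure, Eq. (1); dp0 in kg/cm^2, with the
    piecewise fit for alpha valid for dp0 up to about 3 kg/cm^2."""
    if dp0 < 1.0:
        alpha = 0.5 + dp0
    else:  # 1 <= dp0 <= 3
        alpha = 0.5 + dp0 * (1.1 - (0.13 + 0.20 * dp0) * (t / tau))
    return dp0 * (1.0 - t / tau) * math.exp(-alpha * t / tau)
```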
The cohesive fracture crack model can be expressed as:

σ_n = σ (1 - Δu_n / w),  with softening slope h = -σ/w   (3)
where σ and w are the critical stress and critical displacement, respectively, and Δu_n is the crack opening displacement. The area under the tensile cohesive law is the fracture energy. For a concrete plate, if a linear variation of cohesive stress with crack opening displacement is considered, the energy required for the formation of an opening crack surface can be expressed as:

U_c = ∫_S ζ dS = ∫₀^L t_w ∫₀^w σ(w′) dw′ dl = ½ t_w L σ w   (4)
where ζ is the energy change per unit area from cohesive crack to opened crack, L is the crack length and t_w is the plate thickness. Considering strain-rate effects, the static tensile strength should be replaced by the dynamic tensile strength. The maximum amount of energy that can be taken up as strain energy is limited by the dynamic strength of the material and the corresponding fracture strain. Assuming that the total effective input energy is eventually transformed into fracture strain and crack opening energy, the total crack length follows as

L = 2 U_c / (t_w σ w)   (5)
If the plate length is b and the width is a (a ≤ b), the nominal fragment dimensions are

l₁ = b / (n - 1),  l₂ = a / n   for [L - n(a + b)] < a   (6)

where l₁ and l₂ are the nominal fragment length and width, and n is the integer part of L/(a + b).
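Taken together, Eqs. (4)-(6) give a simple recipe for a nominal debris size. The sketch below chains them; it uses the relations as reconstructed above, and all parameter names are ours (the critical opening w, in particular, is not tabulated in this excerpt).

```python
def nominal_fragment_size(U_c, t_w, sigma_dyn, w, a, b):
    """Estimate nominal fragment dimensions from Eqs. (4)-(6).

    U_c: energy available for crack opening (J); t_w: plate thickness (m);
    sigma_dyn: dynamic tensile strength (Pa); w: critical crack opening (m);
    a, b: plate width and length (m), with a <= b.
    """
    L = 2.0 * U_c / (t_w * sigma_dyn * w)   # total crack length, Eq. (5)
    n = max(int(L // (a + b)), 2)           # guard against n < 2
    return b / (n - 1), a / n               # (l1, l2), Eq. (6)
```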
3 Numerical investigation and estimation of fragment size
Figure 2. Geometry of the RC plate under investigation (not to scale)
The computer program LS-DYNA is used to perform the numerical computation. The pressure loading from the explosive charge is assumed to be a triangular shock wave (Figure 1) uniformly distributed on the surface of the plate. The material properties used in the calculation are as follows: elastic Young's modulus E = 20 GPa; quasi-static tensile strength σ = 3.86 MPa; material density ρ = 2427.5 kg/m³; Poisson's ratio ν = 0.3. For comparison purposes, the total impulse is kept constant (equal to 5 MPa·ms) for the different shock durations; hence three shock wave loadings are produced, with maximum overpressures of 20 MPa, 10 MPa and 5 MPa and corresponding durations of 0.5 ms, 1 ms and 2 ms, respectively. This roughly represents a 100 kg explosive charge at a scaled distance of 0.5-1.0. Figure 2 shows the geometry of the plate under investigation; the width of the plate is assumed to be 1 m. In total, about 10,000 solid elements are used to mesh the plate in a 3D model. From Figure 3, the maximum normal stress occurs at the top layer and the maximum shear stress near the middle of the section. Clearly, the peak normal stress is much higher than the shear stress throughout the elastic response. As the tensile strength of concrete is smaller than its compressive strength, the actual failure would occur during the shock period, at which time both the normal stress and the shear stress are of similar magnitude; hence the fracture will actually result from combined tension and shear. From the elastic analysis results, the total energies absorbed by the plate at the end of the shock wave loading are calculated to be 139 kJ, 127 kJ and 111 kJ, respectively.
Figure 3. Maximum x-normal stress and xy-shear stress for the 20 MPa-0.5 ms loading (maximum shear stress at the middle thickness of the support; maximum normal stress at the top surface of the support)

The average loading strain rates are found to be 19.65 s⁻¹, 10.90 s⁻¹ and 5.84 s⁻¹ for the three shock loading cases, respectively. According to the dynamic strength model presented in Lu and Xu [4], the ultimate dynamic tensile strengths are found to be 16.87 MPa, 14.83 MPa and 13.01 MPa, respectively. Using Eqs. (2), (5) and (6), the dimensions of the fragments can be obtained. The equivalent diameters of the fragments are about 0.065 m, 0.062 m and 0.062 m for the three loading cases, respectively.
4 Discussion

The energy formulation and cohesive failure model are proposed for the prediction of fracture and fragmentation of RC plates subjected to blast shock loading. Numerical simulation is performed on a representative RC plate under simplified blast shock pressure. Characteristic responses, such as stress time histories, are examined to illustrate the failure mode and the possible changes of such characteristics with the variation of the loading duration. The order of the strain rate and the magnitude of the stresses in the plate response are evaluated. The significance of the strain-rate dependence of the material strength for the fracture of RC plates is discussed. The analysis procedure can be applied to predict the nominal debris size of RC plate structures.

References
1. Baker, W. E., Explosions in Air (University of Texas Press, London and Austin, 1973).
2. Brode, H. L., Blast wave from a spherical charge. The Physics of Fluids 2 (1959).
3. Henrych, J., The Dynamics of Explosion and Its Use (Elsevier Scientific Publishing Company, 1979).
4. Lu, Y. and Xu, K., Numerical characterization of RC plate response and fragmentation under blast loading. Project Report (Protective Technology Research Center, Nanyang Technological University, 2002).
5. Nurick, G. N. and Shave, G. C., The deformation and tearing of thin square plates subjected to impulsive loads: an experimental study. International Journal of Impact Engineering 18 (1996) pp. 99-116.
6. Olson, M. D., Nurick, G. N. and Fagnan, J. R., Deformation and rupture of blast loaded square plates: predictions and experiments. International Journal of Impact Engineering 13 (1993) pp. 279-291.
7. Rudrapatna, N. S., Vaziri, R. and Olson, M. D., Deformation and failure of blast-loaded square plates. International Journal of Impact Engineering 22 (1999) pp. 449-467.
8. Rudrapatna, N. S., Vaziri, R. and Olson, M. D., Deformation and failure of blast-loaded stiffened plates. International Journal of Impact Engineering 24 (2000) pp. 457-474.
DYNAMIC ANALYSIS OF BRICK-CONCRETE STRUCTURE BY USING THE WILSON-θ METHOD

D. M. HOU, Y. B. WANG AND M. YIN
School of Civil Engineering and Mechanics, Xi'an Jiaotong University, Xi'an, 710049, P. R. China
E-mail: [email protected]

X. Y. MA
Mathematics Division, Xi'an Electrical Power College, Xi'an, 710032, P. R. China
In this paper, the Wilson-θ integration method is studied thoroughly. Using the Wilson-θ integration method and a single-particle shear model, the dynamic response of a brick-concrete structure is obtained. The simulated dynamic response of the brick-concrete structure under a sine wave load is compared with analytical results available in the literature. This study demonstrates that the present approach, combining the Wilson-θ integration method with the single-particle shear model, is a useful technique for the dynamic and vibration analysis of brick-concrete structures. Furthermore, by optimizing and selecting suitable parameters of the Wilson-θ method, this approach can also be used for the seismic dynamic response analysis of brick-concrete structures.
1 Introduction
There are many dynamical problems in engineering applications that cannot be solved by analytical methods of mathematics, such as the response of a building under complex earthquake wave loading. In such cases a numerical method, such as step integration, is needed; as a consequence, the problem of precision and stability emerges. In this paper the precision and stability of the Wilson-θ method, used for the analysis of a building under a sine wave loading, are studied.

2 Wilson-θ Method

2.1 Hypotheses and Mathematical Model
The dynamical shear model with a single degree of freedom is illustrated in Fig 1, where m is the mass of the particle, k is the lateral stiffness of the rod, ẍ_g is the horizontal acceleration of the earthquake and x is the horizontal displacement of the particle. The dynamical equation of the particle can be expressed as Eq. 1 (no damping case):

m ẍ + k x = −m ẍ_g    (1)

Fig 1. Shear model
where ẍ and x are the horizontal acceleration and displacement of the particle relative to the earth. According to the hypothesis of the linear step integration method, ẍ changes linearly over the period from t to t + Δt. The acceleration of the particle at the moment t + τ (0 < τ ≤ Δt) is therefore

ẍ_{t+τ} = ẍ_t + τ (ẍ_{t+Δt} − ẍ_t) / Δt    (2)
In the 1970s Wilson improved this method, hypothesizing that ẍ changes linearly over the period from t to t + θΔt. The acceleration of the particle at the moment t + τ (0 < τ ≤ θΔt) is then

ẍ_{t+τ} = ẍ_t + τ (ẍ_{t+θΔt} − ẍ_t) / (θΔt)    (3)
It is proved that this method is absolutely stable when θ ≥ 1.37. Obviously it reduces to the linear step integration method in the case θ = 1. This method, called the Wilson-θ method, is commonly used in engineering with θ = 1.4.

2.2 Integration Equations
If the state of the particle is known at the moment t, the state of the particle at t + Δt can be described by Eq. 4:

ẍ_{t+Δt} = ẍ_t + (ẍ_{t+θΔt} − ẍ_t)/θ
ẋ_{t+Δt} = ẋ_t + ẍ_t Δt + (ẍ_{t+θΔt} − ẍ_t) Δt/(2θ)    (4)
x_{t+Δt} = x_t + ẋ_t Δt + ẍ_t Δt²/2 + (ẍ_{t+θΔt} − ẍ_t) Δt²/(6θ)

where the acceleration at t + θΔt follows from the equation of motion as

ẍ_{t+θΔt} = [F̄ − k(x_t + θΔt ẋ_t + θ²Δt² ẍ_t/3)] / (m + θ²Δt² k/6)    (5)

F̄ = −m ẍ_g(t) + [−m ẍ_g(t + Δt) + m ẍ_g(t)] θ    (6)

where Δt represents the increment of integration time for one step.

2.3 Analytical Solution
If the acceleration of the ground is a sine wave, the dynamical equation of the particle can be expressed as Eq. 7:

m ẍ + k x = −m ẍ_g = A sin ωt    (7)

where ω is the frequency and A is the amplitude of the sine wave. The state of the particle at time t (zero initial conditions) is then given by Eq. 8:

x_t = A/(k − mω²) [sin ωt − ω √(m/k) sin(√(k/m) t)]
ẋ_t = Aω/(k − mω²) [cos ωt − cos(√(k/m) t)]    (8)
ẍ_t = A/(k − mω²) [ω √(k/m) sin(√(k/m) t) − ω² sin ωt]
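Under the reconstruction of Eqs. (4)-(6) above, the Wilson-θ recursion can be sketched in a few lines of Python. The function and variable names are illustrative; the model parameters follow Section 3.

```python
import math

def wilson_theta(m, k, xg, dt, n_steps, theta=1.4):
    """Integrate m*x'' + k*x = -m*xg(t) with the Wilson-theta method (Eqs. 4-6)."""
    x = v = 0.0
    a = -xg(0.0)                       # initial acceleration from Eq. (1)
    out = []
    for i in range(n_steps):
        t = i * dt
        # Load extrapolated to t + theta*dt, Eq. (6)
        F = -m * xg(t) + theta * (-m * xg(t + dt) + m * xg(t))
        # Acceleration at t + theta*dt, Eq. (5)
        a_th = (F - k * (x + theta * dt * v + theta**2 * dt**2 * a / 3.0)) \
               / (m + theta**2 * dt**2 * k / 6.0)
        # State at t + dt, Eq. (4)
        a_new = a + (a_th - a) / theta
        v_new = v + a * dt + (a_th - a) * dt / (2.0 * theta)
        x_new = x + v * dt + a * dt**2 / 2.0 + (a_th - a) * dt**2 / (6.0 * theta)
        x, v, a = x_new, v_new, a_new
        out.append((t + dt, x, v, a))
    return out

# Parameters from Section 3: m = 20 kg, k = 1000 N/m, xg = B sin(wt)
B, f = 0.25, 0.2
w = 2.0 * math.pi * f
resp = wilson_theta(20.0, 1000.0, lambda t: B * math.sin(w * t), 0.01, 1500, theta=0.79)
```

Comparing `resp` against the closed-form solution (8) reproduces the kind of error study carried out below.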
3 Simulations

Numerical results for the single-degree-of-freedom system are shown in Fig 2, using the Wilson-θ method (Eq. 4) and the analytical solution (Eq. 8); the errors of the two methods are also compared. The parameters of the model were selected as: m = 20 kg, k = 1000 N/m, ẍ_g = B sin ωt, B = 0.25 m/s², ω = 2πf, f = 0.2 Hz.

Fig 2. Response with θ = 2.0 (acceleration vs. time, Wilson-θ and analytical solution)
3.1 Error due to θ Value

In order to examine the error caused by the choice of θ, calculations were carried out with θ varied from 0.5 to 2.0. All the calculations had a total integration time t = 15 s and an integration increment Δt = 0.01 s. The acceleration response of the particle with θ = 2.0 is illustrated in Fig 2. In the discussion, the following subscripts are used: 'w' for the Wilson-θ method, 'p' for the analytical method, 'i' for the number of the integration step and 'n' for the total number of integration steps.
|x_{w,i} − x_{p,i}| is the absolute error of displacement at step i, √(Σ_{i=1}^{n} (x_{w,i} − x_{p,i})²/n) is the mean square root (MSR) error of displacement, and MAX|x_{w,i} − x_{p,i}| is the absolute maximum (MAX) error of displacement. The statistical error of the simulation is illustrated in Fig 3.

Fig 3. Absolute error (displacement and acceleration)

Furthermore, |(x_{w,i} − x_{p,i})/x_{p,i}| × 100% represents the relative error of displacement at step i, and Σ_{i=1}^{n} |(x_{w,i} − x_{p,i})/x_{p,i}| / n represents the average relative error of displacement. Zero values exist in the analytical solution, so the percent error is only accumulated if ABS(x_p) > MAX(x_p) × 5%.

Fig 4. Average relative error (displacement, velocity and acceleration vs. θ)

The average relative error is shown in Fig 4. As shown in Fig 3 and Fig 4, the error is minimal when θ is about 0.79. To show this clearly, the error of acceleration alone is illustrated in Fig 5.

Fig 5. Error of acceleration (a: average relative error; b: absolute error)
3.2 Errors about the Increment

In order to examine the influence of the integration increment, calculations were made at θ = 1.4 with Δt changed from 0.001 to 0.05. Only the error of acceleration is considered below.
Fig 6. Absolute error of acceleration vs. increment

Fig 7. Average relative error vs. increment
The statistical absolute error of the simulation is illustrated in Fig 6 and the average relative error in Fig 7.

4 Stability of Wilson-θ

In most cases an earthquake persists for about 60 seconds, so the stability of the Wilson-θ method only needs to be checked over this period. As shown above, the optimized parameters of the Wilson-θ method are θ = 0.79 and Δt ≤ 0.004. The simulation result for θ = 0.79, Δt = 0.004 and a total integration time of 60 s is given in Table 1.

Table 1. θ = 0.79; Δt = 0.004; integration time = 60 s
              | MAX VALUE | MSR ERROR | MAX ERROR | AVERAGE RELATIVE ERROR
DISPLACEMENT  | 6.078E-3  | 3.780E-8  | 9.927E-8  | 0.0013%
VELOCITY      | 1.296E-2  | 2.611E-7  | 6.528E-7  | 0.0055%
ACCELERATION  | 5.400E-2  | 3.632E-6  | 7.135E-6  | 0.0120%

5 Conclusions
(1) When the Wilson-θ method is adopted, θ is best chosen at about 0.79, for which the simulation result is most accurate.
(2) The integration step of the Wilson-θ method should be less than 0.004 second.
(3) For seismic analysis, the Wilson-θ method is stable with θ = 0.79 and Δt ≤ 0.004.

References
1. Mostaghel, N. and Khodaverdian, M., Seismic response of structures supported on R-FBI system, Earthquake Eng. Struct. Dyn., 16 (1983) pp. 33-56.
2. Xing, L.P., Effective location of active control devices for building vibrations caused by periodic excitation acting on intermediate storey, Earthquake Eng. Struct. Dyn., 2 (2000) pp. 177-193.
3. Yu, M.H., Ma, G.W., Wang, Y.B., et al., Seismic analysis of the fundamental isolation of brick masonry building, Learned Journal of Construction, 4 (1996) pp. 52-59.
A NEW COMPUTATIONAL MATHEMATICAL MODEL OF HYDRAULIC DAMPER

Y. B. WANG AND D. M. HOU
School of Civil Engineering and Mechanics, Xi'an Jiaotong University, Xi'an, 710049, P. R. China
Email: [email protected]
This paper presents a new computational mathematical model of hydraulic dampers for cars. Through simulation and measurement of the dampers, the velocity-force curve of the damper was derived. According to the curves, a new four-linear damper model was established. As an example, the car was simulated using both this new four-linear damper model and an equivalent linear model. The simulation results show that the present damper model is more accurate than the equivalent linear damper model for simulating the dynamic characteristics of the car damper. As the four-linear model is asymmetric about velocity, it can more accurately represent the actual performance of the damper in the car system. This study shows that the new computational mathematical model of the damper can provide useful information for car damper design.
1. Introduction
The suspension system is one of the most important systems of a vehicle, and its dynamic characteristics are important for the riding comfort and running stability of the vehicle [4]. The system must have the capability to reduce and absorb vibration coming from the road, a mission carried out by the damper. Many studies of dampers have been made in recent years and many mathematical damping models have been presented, such as the equivalent linearization damping model (ELDM), the nonlinear hysteresis loop damping model, and the Wen and Bingham models of electro-rheological and magneto-rheological fluid dampers [1]. All these models are symmetrical, i.e. the damper force is symmetrical about positive and negative velocity. But in some cases an unsymmetrical damper is better than a symmetrical one. In this paper an unsymmetrical damping model, called the four-linear damping model (FLDM), is proposed. After numerical simulation, a comparison of the symmetrical and unsymmetrical damping models is given [2].

2. Dynamical Model of 1/2 Vehicle System
The dynamical model of the 1/2 vehicle suspension system is illustrated in Fig 1 [3]. m₁, k_w1, F_c1 and m₂, k_w2, F_c2 are the masses, stiffnesses and damper forces of wheel 1 and wheel 2, respectively. m₃ is the mass of the vehicle body and I is the moment of inertia about the center of the vehicle body.

Figure 1. 1/2 Vehicle Model
f₁ and f₂ are the road displacements input to the suspension system from wheel 1 and wheel 2, respectively; a and b are the horizontal distances of wheel 1 and wheel 2 to the mass center of the vehicle body. x₁, x₂ and x₃ are the vertical freedoms of the particles and θ is the slope of the vehicle body. From the model in Fig 1, the relative displacements and velocities of damper 1 and damper 2 are expressed as follows:

D_x1 = aθ + x₃ − x₁,   D_x2 = −bθ + x₃ − x₂
V_x1 = aθ̇ + ẋ₃ − ẋ₁,   V_x2 = −bθ̇ + ẋ₃ − ẋ₂
where D_x1, V_x1 and D_x2, V_x2 are the relative displacements and velocities of damper 1 and damper 2. The dynamical equations of the suspension system are expressed as follows:

m₁ẍ₁ = −(x₁ − f₁)k_w1 + D_x1 k₁ + F_c1
m₂ẍ₂ = −(x₂ − f₂)k_w2 + D_x2 k₂ + F_c2
m₃ẍ₃ = −D_x1 k₁ − F_c1 − D_x2 k₂ − F_c2
Iθ̈ = −a D_x1 k₁ − a F_c1 + b D_x2 k₂ + b F_c2

where ẍ, ẋ and x are the accelerations, velocities and displacements of the particles and θ̈ is the angular acceleration of the vehicle body. In matrix form:

M ẍ + C(ẋ) + K x = F

M = diag(m₁, m₂, m₃, I),   F = (k_w1 f₁, k_w2 f₂, 0, 0)ᵀ

K = | k_w1 + k₁   0           −k₁           −a k₁        |
    | 0           k_w2 + k₂   −k₂            b k₂        |
    | −k₁         −k₂          k₁ + k₂       a k₁ − b k₂ |
    | −a k₁       b k₂         a k₁ − b k₂   a²k₁ + b²k₂ |
C(ẋ) is the vector of damper forces.

3. Mathematical Model of Damper

3.1 Experimentation of hydraulic damper

Hydraulic dampers used in the front and the back of a car (called the front damper and back damper) were investigated on an MTS machine. Fig 2 shows the experimental results for sine-wave input displacements. It is obvious that the force is unsymmetrical about velocity: the maximum positive damping force is about 1 kN while the minimum negative damping force is about −0.2 kN. (In Fig 2a the amplitudes and frequencies are 4 mm and 5 Hz, and 40 mm and 0.5 Hz; in Fig 2b, 20 mm and 1 Hz, and 4 mm and 0.5 Hz, respectively.)
3.2 Four-Linear Damping Model

The results of Fig 2 indicate that the damping force is nonlinear in velocity and, in particular, unsymmetrical about positive and negative velocity, so the ELDM is not suitable for this case. Based on this, the FLDM is presented in Fig 3, where DB and AC are horizontal lines and OB and OA are diagonal lines. Points A and B are the saturated points of positive and negative velocity; they can be determined from the experimental curves of the damper. A saturated point means that the damping force does not increase, or increases very little, as the velocity increases beyond this point. Points A and B are determined by the positive and negative saturated velocities V₁ and V₂ and the maximum and minimum damping forces F_cr1 and F_cr2.

Figure 2. Velocity and Damping Force ((a) front damper; (b) back damper)

Figure 3. Four Linear Model

In order to compare the ELDM with the FLDM, the ELDM is assumed to have equivalent viscous damping coefficients c₁ and c₂ for damper 1 and damper 2, with damper forces F_c1 = V_x1 × c₁ and F_c2 = V_x2 × c₂. For the FLDM, the damper forces F_c1 and F_c2 are, respectively:

F_c1 = F_cr2                  (V_x1 ≤ V₂)
F_c1 = V_x1 × F_cr2 / V₂      (V₂ < V_x1 ≤ 0)
F_c1 = V_x1 × F_cr1 / V₁      (0 < V_x1 ≤ V₁)      (*)
F_c1 = F_cr1                  (V₁ < V_x1)

and similarly F_c2 in terms of V_x2.
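The piecewise law (*) is easy to state in code. The sketch below uses the saturation values quoted later in Section 4.1 (V₁ = 0.05 m/s, F_cr1 = 1000 N, V₂ = −0.05 m/s, F_cr2 = −200 N); the function name and defaults are illustrative only.

```python
def fldm_force(v, v1=0.05, v2=-0.05, f_cr1=1000.0, f_cr2=-200.0):
    """Four-linear damping model, Eq. (*): linear up to the saturated
    points A (v1, f_cr1) and B (v2, f_cr2), constant beyond them."""
    if v <= v2:
        return f_cr2                  # negative saturation (segment DB)
    if v <= 0.0:
        return v * f_cr2 / v2         # diagonal segment OB
    if v <= v1:
        return v * f_cr1 / v1         # diagonal segment OA
    return f_cr1                      # positive saturation (segment AC)

# The asymmetry about v = 0 is what the ELDM (F = c*v) cannot capture:
print(fldm_force(0.02), fldm_force(-0.02))   # -> 400.0, -80.0
```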
4. Numerical Simulation of Suspension System

According to the 1/2 vehicle dynamical model (Fig 1), two simulation systems (ELDM and FLDM) were established in MATLAB SIMULINK. Two kinds of road displacement functions (RDF) were input to the system: a sine wave and a pulse wave. The parameters of the vehicle suspension system are shown in Table 1.
Table 1. Parameters of the Vehicle Suspension System

m₁/kg | m₂/kg | m₃/kg | I/kg·m² | k_w1/N·m⁻¹ | k_w2/N·m⁻¹ | k₁/N·m⁻¹ | k₂/N·m⁻¹ | a/m | b/m
40    | 40    | 730   | 1230    | 175500     | 175500     | 17500    | 17500    | 1.0 | 1.0
The maximum acceleration (MA) and mean-square-root acceleration (MSRA) of the vehicle body are indices of the comfort and stability of a vehicle suspension system [5]. A good vehicle suspension system reduces the MA and MSRA of the vehicle body, so the study focuses on these quantities.

4.1 Results of Sine Wave

When the RDF input to the simulation system is a sine wave, the functions for wheel 1 and wheel 2 are:

f₁ = 0.02 sin(2πΩt),   f₂ = 0.02 sin(2πΩt)

where Ω is the frequency, which changes from 1 Hz to 9 Hz, and the amplitude is 0.02 m. For the ELDM, the equivalent viscous damping coefficients are c₁ = c₂ = 1290 N·s·m⁻¹ and the damper forces are F_c1 = V_x1 × c₁, F_c2 = V_x2 × c₂. For the FLDM, the saturated points A and B are defined by V₁ = 0.05 m·s⁻¹, F_cr1 = 1000 N, and V₂ = −0.05 m·s⁻¹, F_cr2 = −200 N; the damper forces are determined by Eq. (*). For an input frequency of 8 Hz, the acceleration of the vehicle body is shown in Fig 4, and the MA and MSRA in Fig 5. From the simulation results, the FLDM is better than the ELDM; the FLDM is also not sensitive to the frequency of the RDF and can be used over a wider frequency band of the RDF.

Figure 4. Response of acceleration

4.2 Results of Pulse Wave

The RDF is a pulse wave. With a pulse amplitude of 0.08 m, the acceleration of the vehicle body is shown in Fig 6, and the MA and MSRA in Fig 7.
Figure 5. Response of vehicle body (MA and MSRA vs. frequency)
Figure 6. Acceleration of vehicle body at amplitude = 0.08 m (ELDM vs. FLDM)

Figure 7. Response of vehicle body (MA and MSRA vs. pulse amplitude)
Obviously the FLDM's MA and MSRA of the vehicle body are smaller than the ELDM's, especially as the amplitude of the RDF increases.
Conclusions

• Adopting the FLDM in the suspension system of a vehicle is effective over a wide frequency band of road displacement, and the FLDM is more stable than the ELDM with respect to the frequency of road displacement.
• The new FLDM can markedly reduce the MA and MSRA of the vehicle body.
• The new FLDM can provide useful information for car damper design.
References
1. Choi, S.B., Choi, Y.T. and Park, D.W., A sliding mode control of a full-car electrorheological suspension system via hardware-in-the-loop simulation, J. of Dynamic Systems, Measurement and Control, 122 (2000) pp. 114-121.
2. Kim, K. and Jeon, D., Vibration suppression in an MR fluid damper suspension system, J. Intell. Mater. Systems and Struct. 10 (1999) pp. 779-786.
3. Lei, Y.C., Dynamics and Simulation of Vehicle Systems (National Defence Industry Press, Beijing, 1997).
4. Weng, J.S., The Semi-Active Control of Vehicle Suspension Systems Based on Magnetorheological Damper (PhD Thesis, Nanjing University of Aeronautics and Astronautics, 2001).
5. Yu, Z.S., Theory of Vehicle (Mechanical Industry Press, Beijing, 1985).
BROADBAND ECHOES FROM UNDERWATER TARGETS

HENRY LEW
School of Electrical and Electronic Engineering, Nanyang Technological University, Block S2, Nanyang Avenue, Singapore, 639798. E-mail: [email protected]

BINH NGUYEN
Defence Science and Technology Organisation, PO Box 1500, Edinburgh, SA 5111, Australia. E-mail: [email protected]
Keywords: scattering, underwater acoustics, broadband, active sonar

Models of acoustic scattering that predict the echo time history of targets are essential tools for the development of signal processing algorithms in underwater active detection systems. Realistic simulation of echoes requires a multi-frequency evaluation of acoustic scattering. This can be achieved by numerically modelling the surface of the object of interest as a collection of facets and calculating the scattered field using the Helmholtz-Kirchhoff approximation. Time and frequency domain analyses of a simple object (e.g. a sphere) and a complex structure using this technique in monostatic and bistatic configurations are given as examples.
1 Introduction
The development and evaluation of signal processing algorithms for underwater active detection systems can be greatly enhanced in terms of robustness and accuracy if realistic signal/target models are used. In the past, very simple models were used for algorithm development because realistic models were computationally costly and of little benefit to low-resolution systems operating in benign environments. For example, the target echo of an echolocation system was modeled as an attenuated, time-delayed and Doppler-shifted replica of the transmitted signal. However, measurements of actual target echoes have shown this to be a first-order approximation at best, even for very simple targets such as spheres. With recent advances in computing software and hardware, more realistic modeling has become feasible and cost effective. In this paper we show some results of high fidelity modeling of echoes from underwater targets that are broadband in nature, to match what is possible in actual systems. Many previous results in this area have concentrated on single frequencies, narrowband approximations, or averaged/integrated quantities such as Target Strength [1], rather than the echo time series. The rest of the paper is organized as follows. We first review the modeling methodology, and then present the case of a sphere
followed by some results from the scattering of a target complex. The results are presented in both the time and frequency domains. Finally, the paper concludes with some comments and observations.

2 Model of Acoustic Scattering
The model for acoustic scattering [2] is based on the evaluation of the Helmholtz-Kirchhoff integral over the surface of the object under consideration. The surface of the object is modelled by a mesh of triangular facets. The Helmholtz-Kirchhoff integral can be evaluated analytically for each triangular facet, which helps reduce the amount of computation needed. The total scattered field from the target of interest is then obtained by a coherent summation of the scattered field from all the individual facets. The material properties of the target are encapsulated in the local reflection and transmission coefficients of the target. Additional computational complications such as multiple transmission layers, scattering from several layers, hidden surface removal, and first-order multiple scattering can be included, if necessary. Note that, at lower frequencies, diffraction of sound around components of the object can be significant, and is only crudely approximated. However, the diffraction that gives rise to the forward scattering lobe for bistatic calculations is fairly well approximated. The model has been well tested against known results for the monostatic scattering of basic shapes. However, no rigorous tests of a target with a high level of complexity are available. Currently, for large complex targets, the model is expected to give less accurate results at the lower frequencies, where structural resonance and diffractive effects become important. At the other end of the spectrum, there is no high-frequency limit of validity since the technique is intrinsically a high-frequency one. In practice, however, the upper frequency limit is determined by the need to accurately represent the target surface by plane facets to some fractional-wavelength accuracy. Therefore, surfaces with a large degree of curvature need more facets to achieve a given accuracy, and hence a longer computation time is required. Target scattering over a band of frequencies can be characterized by the target's impulse response or transfer function. The two different, but equivalent, representations are related as follows:
h_T(t) = ∫ H_T(f) e^{i2πft} df   and   H_T(f) = p_s(f) / p_i(f)    (1)

where p_i(f) and p_s(f) are the Fourier transforms of the incident and scattered pressures over the frequency band of interest, respectively¹. In the following, examples of both representations of the target response will be shown.
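In discrete form, Eq. (1) is a ratio of FFTs followed by an inverse FFT. The sketch below (illustrative names, numpy assumed) estimates h_T(t) from simulated incident and scattered pressure time series.

```python
import numpy as np

def impulse_response(p_inc, p_scat):
    """Estimate the target impulse response h_T(t) from sampled incident and
    scattered pressures, following Eq. (1)."""
    P_i = np.fft.rfft(p_inc)                  # Fourier transform of incident pressure
    P_s = np.fft.rfft(p_scat)                 # Fourier transform of scattered pressure
    H_T = P_s / P_i                           # transfer function H_T(f) = p_s(f)/p_i(f)
    return np.fft.irfft(H_T, n=len(p_inc))    # h_T(t) by inverse FFT

# Toy check: a delay-and-attenuate "target" gives a single shifted spike.
fs, delay, gain = 10_000.0, 0.01, 0.3
t = np.arange(2048) / fs
p_inc = np.sin(2 * np.pi * 500 * t) * np.exp(-((t - 0.05) / 0.01) ** 2)
p_scat = gain * np.roll(p_inc, int(delay * fs))
h = impulse_response(p_inc, p_scat)
print(np.argmax(np.abs(h)) / fs)              # ~0.01 s, the assumed delay
```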
3 Rigid Sphere
The sphere is very useful for verifying the accuracy of numerical models because it is one of the few shapes that have analytical (closed-form) solutions [3] for wave scattering. In order to illustrate the accuracy and the limitations of the numerical model, the scattering from a rigid sphere of radius a was calculated and compared to the analytical result. A mesh of over 84,000 triangular facets represented the surface of the sphere. In this calculation, the source and the receiver were placed at a large distance from the sphere to achieve far-field conditions. The receiver was stepped in 1-degree increments counter-clockwise around the target. The sphere was assumed to be in seawater with a sound speed of 1500 m/s and density of 1026 kg/m³. The scattered field was then computed as a function of bistatic angle over a band of frequencies such that the wavenumber product² ka varies from 17 to 42. The results, given in Fig. 1, show that the numerical model is fairly accurate except when the bistatic angle approaches the forward direction (θ = 180°), where diffraction effects become important.
Figure 1. The bistatic target strength and the magnitudes of the transfer function and the impulse response of a rigid sphere calculated using numerical and analytical methods.
¹ Note that the transfer function and the impulse response are also functions of the distance between the target and the receiver measuring its scattered field.
² k is the acoustic wavenumber, k = 2π/λ.
4 Target Complex
The main reason for the development and use of numerical models is that analytical solutions do not exist for complex structures. By the same token, this makes it difficult to directly verify the correctness and accuracy of these models. However, the following examples will show how confidence in the results can be gained by considering both the time and frequency domain representations of the target response. The target under investigation is a submarine-like structure built from a combination of simple shapes (e.g., cylindrical hull, spherical end, conical tail and airfoil sail/fins). For simplicity, the structure is assumed to be rigid. The geometry of the target complex is shown in Fig. 2. The major dimensions of this structure are the length, L, and the width, 2a. Over 24,000 facets were used to model all the parts of the structure.
Figure 2. Different views of the target complex.
The transfer function and the impulse response for monostatic (backscattering) and bistatic scattering, as a function of target aspect and bistatic angle, are shown in Figs. 3 and 4, respectively. Note that the transfer function confirms that the hull specular at broadside is independent of frequency. The impulse response, on the other hand, reveals the highlight structure of various components that make up the target complex. For sufficiently high frequencies, the relative time delays of these highlights are found to be consistent with what is expected from simple geometrical considerations. All these are indications that the model is doing what it should, and thereby providing the user with some confidence in the model.
Figure 3. Monostatic frequency and time domain responses of the target complex.
Figure 4. Bistatic frequency and time domain responses of the target complex.
5 Conclusions
The transfer function or impulse response contains all the information that can be revealed by a scattering experiment over a frequency band of interest. Under certain conditions, the features of the impulse response can be related to the physical attributes of the target, such as its size and construction. Once either of these functions is known, it is relatively straightforward (at least in principle) to calculate the target response for any arbitrary input waveform. This is particularly useful when time series simulation of the scattered field is required. Note that even though the transfer function and the impulse response contain the same information, certain features of the target response are sometimes better revealed by one representation than the other. This paper has shown an example of a target complex in which the time domain target highlights gave a more physically intuitive interpretation of the results.

References
1. Urick, R.J., Principles of Underwater Sound, 3rd Ed., Peninsula, 1983.
2. MacGillivray, I.R., Model (V.3) Software, DSTO.
3. Anderson, V.C., Sound Scattering from a Fluid Sphere, J. Acoust. Soc. Am. Vol. 22, No. 4, pp. 426-431 (1950).
Competing Risks For Reliability Analysis Using Cox's Model

F. A. M. Elfaki, I. Daud, N. A. Ibrahim, M. Y. Abdullah and I. Lukman
Department of Mathematics, Faculty of Science and Environmental Studies, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia
Abstract Weibull distribution as the basis of reliability function is generalized by introducing an additional shape parameter. The use of an algorithm based on Cox's proportional hazard specifically developed for this model, is illustrated. The usefulness and flexibility of the distributions are also illustrated by analyzing the multiple stress data sets from Crowder et al (1991). In addition simulation data are generated to further illustrate the idea. The parametric Cox's model with Weibull distribution shows similar results as Cox's with exponential distribution, especially for a sample size greater than 40 based on EM algorithm. The modification of the model of both distributions is considered.
1. Introduction

The theory of competing risks is applied in the analysis of reliability and survival data involving several different failure types or risks. In industry, for instance, one might distinguish between mechanical device failures attributable to a failed component and those due to unrelated causes; these constitute the different risks under consideration. Typically, the data include the time of failure or censoring of each individual, as well as an indicator of the type of failure. To assess the effects of covariates on cause-specific hazards, one can fit a parametric Cox proportional hazards model, treating failure types which are not of interest as censored observations ([5], [2]). A general model is adopted in this paper which incorporates most of the widely used life stresses. The model can be used for single or multiple stresses. Under this formulation, the model can be solved either as a Proportional Hazards Weibull model (PHW) or as a Proportional Hazards Exponential model (PHE).

2. Methodology

The proportional hazards (PH) regression model is commonly used in the analysis of survival data and, recently, there has been increasing interest in its application in reliability engineering. Following [1], we focus on the model

h(t|z) = h₀(t) exp(zβ)    (1)

where β = (β₁,...,β_p)ᵀ is a vector of regression coefficients, t is a continuous random variable representing an individual's lifetime, and z = (z₁,...,z_p) is the vector of regressor variables associated with the individual. Model (1) is flexible enough for many purposes. The modification of model (1) can be represented as

h(t|z) = h₀(t · T_s exp(zβ))    (2)
where T_s is the censored time to failure. Model (2) is not limited to nonnegative β's or categorical covariates and has a very interesting contrast with model (1). The full likelihood based on the data (tᵢ, δᵢ, zᵢ), i = 1, 2, ..., n, is given by [6] and [4] as follows:

L(θ) = ∏_{i=1}^{n} f(tᵢ; θ, zᵢ)^{δᵢ} R(tᵢ; θ, zᵢ)^{1−δᵢ}    (3)
where the δᵢ are the event indicator variables (δᵢ = 1 if the ith subject fails; δᵢ = 0 if the ith subject is censored), θ is a parameter that indexes the density function and zᵢ is the vector of covariates for the ith subject.

2.1 The PH Weibull Model

The Weibull distribution is commonly used for analyzing lifetime data. In other words, it is assumed that the baseline failure rate in equation (1) is parametric and given by the Weibull distribution. In this case, the baseline failure rate is given by:

h₀(t) = η α⁻¹ (t/α)^{η−1} exp[−(t/α)^η]    (4)

where α is the scale parameter, depending on z, and η is the shape parameter. The reliability function can be derived as

R(t) = exp[−(t/α)^η]    (5)

The log-likelihood function for the PHW model then follows as:
ℓ = Σ_{i=1}^{n} { δᵢ [ln η − η ln α + (η − 1) ln tᵢ + zᵢβ] − (tᵢ/α)^η exp(zᵢβ) }    (6)
\{i) = a 'exp
^ie'z'
•t
(7)
a Log-likelihood function for PHE model can be written as: f
l = Yjn
( (
-2L
V\
m
exp
exp V*=o
(8)
-T:e>'° k=\
620
Note that, if we substitute (3 = 1 in the likelihood function for the Proportional hazards Weibull (PHW), it will become similar to the likelihood function for the Proportional hazards Exponential (PHE) model. Table 1: Results from simulations study comparing model (1) and (2) with Weibull distribution, based on EM algorithm
Sample Size 15
Cen %
25
Method
Parameter
Mean
Bias
RMSE
(2)
4 e2 fi
-26.000 -5.8060 1.1336 -25.922 -25.585 6.8047 -32.052 -0.2858 1.0029 -32.052 -0.2858 1.0008
-25.922 -5.8059 0.1336 -25.923 -25.585 5.8047 -32.053 -0.2858 0.1119 -32.053 -0.2858 2.1011
31.591 30.854 0.5867 31.591 31.369 6.98096 3.2683 0.3006 0.1132 3.2683 0.3006 7.2416
%
15
25
(1)
100
25
(2)
100
25
(1)
e2 fi 4 e2 fi 4 &2
fi
3. Simulation Data

The objective of this simulation study is to compare the mean, bias, and root mean square error (RMSE) obtained from fitting models (1) and (2) based on the EM algorithm. The simulation data are generated from the Kevlar 49 failure data [3] with two covariates (stress and spool). All data generation is carried out with a SAS program. The generated data are run 1000 times for every sample size and the corresponding percentages of censoring. The results of this study are shown in Tables 1 and 2 for the PHW and PHE, respectively. As mentioned in Section 2.2, to obtain the PHE we substitute η = 1 into the PHW likelihood (equation (6)). From the simulation study we conclude that both models give similar results for sample sizes greater than 40, as can be clearly seen in Table 1.

Table 2: Results from the simulation study comparing models (1) and (2) with the exponential distribution, based on the EM algorithm
Sample Size | Cen % | Method | Parameter | Mean    | Bias    | RMSE
15          | 25    | (2)    | β̂         | -4.8554 | -5.7554 | 1.8423
            |       |        | θ̂₂        | -10.000 | -10.000 | 8.6101
15          | 25    | (1)    | β̂         | -4.8554 | -5.7554 | 5.8259
            |       |        | θ̂₂        | -10.000 | -10.000 | 8.7101
100         | 25    | (2)    | β̂         | -32.052 | -32.053 | 3.2683
            |       |        | θ̂₂        | -0.2858 | -0.2858 | 0.3006
100         | 25    | (1)    | β̂         | -32.052 | -32.053 | 3.2683
            |       |        | θ̂₂        | -0.2858 | -0.2858 | 0.3006
4. Conclusions

Two lifetime distributions for the competing risks model via Cox's model, namely the Weibull and the exponential, with censored data, are presented. The modification of the models for both distributions is considered. The EM algorithm is used to estimate the parameters. It is observed that the Weibull distribution describes the nature of the model well compared to the exponential distribution. It is also observed that the EM algorithm behaves reasonably well in the estimation of the parameters concerned and provides consistent estimates for both formulations. For sample sizes greater than 40, both PHW and PHE give similar results. However, when the sample size is less than 40, we cannot draw the same conclusion. More work is needed to determine the efficiency of both models for smaller sample sizes.

References
[1] Cox, D. R., Regression models and life tables (with discussion), J. R. Statist. Soc. B 34 (1972) pp. 187-220.
[2] Cox, D. R. and Oakes, D., Analysis of Survival Data, London: Chapman and Hall (1984).
[3] Crowder, M. J., Kimber, A. C., Smith, R. L. and Sweeting, T. J., Statistical Analysis of Reliability Data, London: Chapman and Hall (1991).
[4] Kalbfleisch, J. D. and Lawless, J. F., Estimation of reliability in field-performance studies, Technometrics, 30 (1988) pp. 365-388.
[5] Kalbfleisch, J. D. and Prentice, R. L., The Statistical Analysis of Failure Time Data, New York: Wiley (1980).
[6] Lawless, J. F., Statistical methods in reliability, Technometrics, 25 (1983) pp. 305-335.
[7] Mann, N. R., Schafer, R. E. and Singpurwalla, N. D., Methods for Statistical Analysis of Reliability and Life Data, John Wiley and Sons, New York (1974).
PARALLEL MULTIBODY DYNAMICS USING THE MESSAGE PASSING INTERFACE

B. FOX AND F. J. WELNA
Parallel Computing Research Group, School of Electrical, Electronic and Computer Engineering, The University of Western Australia, Crawley WA 6907, Australia. Email: {budfox, welna-fj}@ee.uwa.edu.au

D. J. LILJA
Department of Electrical and Computer Engineering, Institute of Technology, University of Minnesota - Twin Cities Campus, 4-174 EE/CSci Building, 200 Union Street S.E., Minneapolis, MN 55455, USA. Email: [email protected]

L. S. JENNINGS
School of Mathematics and Statistics, The University of Western Australia, Crawley WA 6907, Australia. Email: [email protected]

The multibody modelling and computation of an arbitrary-length pendulum system is investigated. The equations of motion are cast as either Differential Algebraic Equations (DAEs) or the underlying Ordinary Differential Equations (ODEs) with augmented constraint equations, and are computed using the Differential Algebraic Equation System Package (DASPK) [1] and the Livermore Solver of Ordinary Differential Equations with Automatic method switching and Root finding (LSODAR) [2], respectively. Coarse-grain parallelism is implemented through the use of the Message Passing Interface (MPI) [3] library and two different architecture types are compared.
1 Equations of Motion
From Kibble's [4] treatment of variational calculus, one may recall variational changes of the function f(t, qᵢ, q̇ᵢ) of the independent variable t and the generalized coordinates qᵢ, for i = 1,...,n. The stationary integral of this function leads to the Euler-Lagrange equations

d/dt (∂f/∂q̇ᵢ) − ∂f/∂qᵢ = 0    (1)

and if the kinetic energy Tᵢ = ½ m q̇(t)² of an arbitrary body is substituted for f(t, qᵢ, q̇ᵢ) and the stationary integral of this function is sought, one obtains the Lagrange equations in terms of kinetic energy, that is,

d/dt (∂T/∂q̇ᵢ) − ∂T/∂qᵢ = Qᵢ    (2)

In the study of planar multibody dynamics, an expression for the kinetic energy of an arbitrary body may be written, according to Shabana [5], as

Tᵢ = ½ Ṙᵢᵀ m_RR,i Ṙᵢ + ½ m_θθ,i θ̇ᵢ²    (3)
# « M , = j piIdVt = mt I, wee,- = Jp,«, 7 M, dVt , mt, p, and Vt
are the body mass, density, and volume respectively, and ^ is a local position vector on body i. On substitution of (3) in (2) yields Mq\
mRR,i 0
0
Qe.
"88 i
(4)
and augmenting the constraints C(q,t) = 0 and the constraint forces 2c -~CTqX to (4) yields q=v Mv + C^=Qe.
(5)
C{q,t) = 0 This is regarded by Ascher [6] as an index-3 DAE, since three differentiations (two differentiations of C[q,t) = 0 and the replacement of jl = A) are required to allow the DAE (5) to be written as the ODE q=v T
Mv + C qii = Qe CT,v =
(6) -[Ctq)iq-2Cq,q-Ctt
The DAE (5) may be computed directly using DASPK, which solves a semi-explicit system of the form f(t, y, ẏ) = 0 [1], where y = (q, v, λ)ᵀ. The ODE (6) may be computed by LSODAR, which solves the explicit system ẏ = f(t, y), where y = (q, v, μ)ᵀ. The results of the following section concern the use of LSODAR to compute (6), and preliminary investigations are made using DASPK on (5), with the option of allowing DASPK to solve the initialization problem: given the differential variables Y_d = (q, v), calculate their derivatives Y'_d = (q̇, v̇) and the algebraic variables Y_a = λ, the Lagrange multipliers [1].

2 Implementation
Both sequential and parallel computation of the system equations was performed using LSODAR; however, due to the inherently sequential nature of numerical integration, little coarse-grain parallelism between time steps was exploited, although non-blocking MPI_Send() and blocking MPI_Recv() function calls were employed through MPI to allow for potential concurrency between the time steps of integration. DASPK does in fact allow for the use of multiple processes in the integration process [7], but this was not employed here due to convergence failures in single-process computation using DASPK.
It is of particular interest to identify, through profiling, which routines perform at least 75% of the computation. The linear algebra package LAPACK [1], in particular the routine dgesv_() used in the Gaussian elimination of the system of equations, consumed most of the execution time. The user-provided routines used for the construction of the system of equations were therefore each allocated a separate processor in a parallel master-slave computational approach, as shown in Figure [1].

Figure 1. MPI master/slave computational flow structure: at each time step the master (process 0) receives, via MPI_Recv, the blocks M, C_q, Q_e and Q_d constructed by slave processes 1-4 from x = [q, v, μ] and t.
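A hedged sketch of the Figure 1 pattern using mpi4py follows (the paper's implementation uses the C/Fortran MPI bindings with non-blocking sends; the block constructor below is a stand-in, not the authors' routine).

```python
# Illustrative mpi4py rendering of the master/slave structure in Figure 1.
# Run with, e.g.: mpiexec -n 5 python master_slave.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
n = 4                                   # number of pendulum bodies (assumed)

def build_block(tag, x, t):
    """Stand-in for the user routines that construct M, C_q, Q_e or Q_d."""
    return np.full((n, n), tag * t) + x[:n]

x, t, dt = np.zeros(3 * n), 0.0, 1e-3
for step in range(10):
    if rank == 0:                       # master: gather blocks, advance state
        M, Cq, Qe, Qd = [comm.recv(source=src, tag=step) for src in (1, 2, 3, 4)]
        # ... assemble and hand the system to the integrator here ...
    else:                               # slaves 1-4: each builds one block
        comm.send(build_block(rank, x, t), dest=0, tag=step)
    x = comm.bcast(x, root=0)           # share the updated state with all ranks
    t += dt
```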
Figure [2] shows that the parallel implementation of the code takes longer to compute the system of equations than the sequential implementation, for both the dual Intel 1GHz Pentium III and the SGI 38000, 500 MHz R14000 Origin machines [8]: this is due to a high communication-overhead/computation ratio.
3 Conclusions and Future Research
The numerical integration of a classical n-bodied pendulum was performed using LSODAR and DASPK on a distributed network of two different architecture types. The parallel implementation suffers from a high communication-overhead/computation ratio; however, for a greater work-load between integration steps, the parallel run-times are expected to be faster. For example, in a planetary system containing n bodies, where each body has a gravitational effect on the others, n(n−1)/2 gravitational force computations are required. Although there is a quadratic relationship between the number of bodies and the number of force computations, the increase in communication overhead is expected to be linear; preliminary investigations indicate that n would need to be greater than 1500 bodies.

Figure 2. Sequential (S)/Parallel (P) implementations for n < 300 bodies (SGI 38000 Origin and Intel Pentium III run times)
4 Acknowledgements

This work was supported in part by the Minnesota Supercomputing Institute: http://www.msi.umn.edu/, under the supervision of Prof. D. J. Lilja.

References
1. http://www.netlib.org/ - DASPK2.0, LAPACK.
2. Petzold, L. R. and Hindmarsh, A. C., "Livermore Solver of Ordinary Differential Equations with Automatic Method Switching and Rootfinding", Computing and Mathematics Research Division, 1-316 Lawrence Livermore National Laboratory, Livermore CA 94550, (1987).
3. Gropp, W., Lusk, E. and Skjellum, A., Using MPI: Portable Parallel Programming with the Message-Passing Interface, The MIT Press, Cambridge, Mass., (1994).
4. Kibble, T. W. B., Classical Mechanics, 3rd Ed., Longman Inc., New York, (1985).
5. Shabana, A. A., Computational Dynamics, John Wiley & Sons, New York, (1994).
6. Ascher, U. M. and Petzold, L. R., Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations, SIAM, Philadelphia, (1998).
7. http://www.engineering.ucsb.edu/~cse - DASPK3.0
8. http://www.msi.umn.edu/
SOME COMPUTATION ASPECTS IN MODEL-ORDER REDUCTION OF FLEXIBLE STRUCTURES

ROBERD SARAGIH
Department of Mathematics, Institut Teknologi Bandung, Jln. Ganesha No. 10, Bandung, 40132. Telp. 062-22-2502545, Fax. 062-22-2506450. Email: [email protected]

Model reduction is part of the dynamic analysis of flexible structures. Typically, a model with a large number of degrees of freedom, such as one developed for the static analysis of structures, causes numerical difficulties in the dynamic analysis, to say nothing of the high computational cost. Additionally, if one takes into account that the complexity of a controller depends on the plant order, it is not difficult to see that a full-order controller for a high-order plant is hardly implementable. Thus the reduction of the system order solves the problem, provided that the reduced model acquires the essential properties of the full-order model. This paper is concerned with the computational problem of reducing a high-order model of flexible structures to a low-order one without significant errors. Several methods are discussed and compared computationally: modal truncation, balanced truncation and the singular perturbation approach.
1 Introduction
A major difficulty in the control of flexible structures, or any other large-scale system, is, in the words of Bellman, the curse of dimensionality. A flexible structure is by nature a distributed-parameter system and hence has infinitely many degrees of freedom. Even approximate structural models obtained by discretization are generally still too large for use in control design applications. Moreover, many controller design methods, such as H∞ or μ-synthesis, yield a controller of order at least equal to the plant order. Such high-order controllers are designed to optimize performance objectives, but often cannot be used in practical applications. As mentioned in [1], a controller with a large number of degrees of freedom can cause numerical difficulties, uncertainties, and high computational cost. Thus it is desirable to have methods available for designing low-order controllers that guarantee closed-loop stability and performance. One approach to obtaining a low-order controller is to first reduce the order of the plant and then design the low-order controller for the reduced-order plant. This paper is concerned with the computational problem of reducing a high-order model of flexible structures to a low-order one without significant errors. Several methods are discussed and compared computationally: modal truncation, balanced truncation and the singular perturbation approach. Firstly, the method based on modal truncation is presented; it is conceptually simple and computationally cheap.
In frequency-domain terms, where a stable transfer function matrix is in partial fraction form, the low-order system is obtained by discarding the terms with the smallest maximum imaginary magnitude. Secondly, the balanced truncation is reviewed. The balanced truncation method tends to have smaller errors at high frequencies and larger errors at low frequencies, which is undesirable in some applications. In contrast, the singular perturbation approach displays the opposite character: the stable state variables of the system are divided into slow and fast modes, and the low-order model is obtained by setting the velocity of the fast modes equal to zero.

2 Model of Structure
The structure has four stories and is tower-like in shape. To simplify the modeling process, some assumptions are made. Each story is modeled such that it has a single degree of freedom in the transverse direction (the same direction as the excitation) and one more degree of freedom in the angle of torsion around the centroid of the story, so that the whole structure has 8 degrees of freedom. The structure has long and short spans symmetric with respect to the central axis, but has a deviation on the right, long side of the third story due to an auxiliary mass, which thereby creates a coupling between the transverse and torsional vibrations. The mass distribution of each story is homogeneous and the stiffnesses of the four columns are assumed to be the same in the direction of the excitation at all stories. Under this condition, the distance from the centroid to the spring on the right side of the i-th story equals the distance from the centroid to the spring on the left side, and all the cross terms vanish. On the third story, however, there is a lumped load at the right side; therefore the cross terms have certain values and the structure possesses transverse-torsional coupled vibration modes. Using the Lagrange equation, we can obtain the dynamic model of the structure as a second-order differential equation, i.e.
M_p ẍ(t) + C_p ẋ(t) + K_p x(t) + d_p w(t) + b_p u(t) = 0

For model analysis, the model of the structure is transformed into state-space form and can be written as:

ẋ(t) = A x(t) + B u(t)
y(t) = C x(t)
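To make the pipeline concrete, here is a hedged Python sketch: the second-order structural model is first put in state-space form and then reduced by plain (unweighted) balanced truncation, i.e. with W_i = W_o = I; the weighted variants of Section 3 replace the controllability Gramian with the solution of the augmented Lyapunov equation. All matrix and function names are illustrative, and A is assumed stable with P positive definite.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, cholesky, svd

def to_state_space(Mp, Cp, Kp, bp):
    """Second-order model  Mp x'' + Cp x' + Kp x + bp u = 0  ->  x' = A x + B u."""
    nd = Mp.shape[0]
    Minv = np.linalg.inv(Mp)
    A = np.block([[np.zeros((nd, nd)), np.eye(nd)],
                  [-Minv @ Kp, -Minv @ Cp]])
    B = np.vstack([np.zeros((nd, 1)), -Minv @ bp])
    return A, B

def balanced_truncation(A, B, C, r):
    """Keep the r states with the largest Hankel singular values."""
    P = solve_continuous_lyapunov(A, -B @ B.T)    # controllability Gramian
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)  # observability Gramian
    S = cholesky(P, lower=True)                   # P = S S^T
    U, s2, _ = svd(S.T @ Q @ S)                   # s2 = (Hankel SVs)**2
    T = S @ U * s2 ** -0.25                       # balancing transformation
    Ti = np.linalg.inv(T)
    Ab, Bb, Cb = Ti @ A @ T, Ti @ B, C @ T
    return Ab[:r, :r], Bb[:r, :], Cb[:, :r], np.sqrt(s2)
```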
3 Reduced-order Model
The model reduction problem has quite a long history, and many reduction techniques have been published. In this paper we use modal truncation, weighted balanced truncation, and weighted balanced singular perturbation (modified singular perturbation).

3.1 Modal Truncation

The truncation of modal realizations is common in engineering practice, because it is often the case that high-frequency modes may be neglected on physical grounds, or because the phenomena resulting in such modes only play a secondary role in determining the model's essential characteristics. The truncation method of model-order reduction seeks to remove, or truncate, unimportant states of the model. If the A-matrix of a state-space model is in Jordan canonical form, state-space truncation amounts to classical modal truncation. The modal truncation is conceptually simple and computationally cheap. In this method, the model is transformed into modal coordinates; the contribution of each eigenvalue is identified and the low-order model is obtained by truncating the eigenvalues having the smallest contribution.

3.2 Weighted Balanced Realization Truncation

The recent control literature shows that balanced realization truncation techniques are widely used in model reduction procedures. Some model reduction methods are also based on approximation via balanced realization and the closely related Hankel-norm optimal approximation procedure. Moore (1981) [4] first introduced the internally balanced realization and showed its application to the model reduction problem. The controllability and observability Gramians are used to define measures of controllability and observability in certain directions of the state space. The Gramians are not invariant under coordinate transformations, and there exists a coordinate system in which the Gramians are equal and diagonal; the corresponding system representation is called balanced. A low-order model can be obtained from the balanced representation by deleting the least controllable and observable part. In this paper we adopt the model reduction developed by Enns [2]. The standard optimal model reduction problem is expressed as follows. Consider the nth-order model P(s) = C(sI − A)⁻¹B. Find an rth-order (r < n) model P_r(s) = C_r(sI − A_r)⁻¹B_r which minimizes

J = ‖ W_i(s) [P(s) − P_r(s)] W_o(s) ‖_∞

where W_i(s) and W_o(s) are input and output weighting matrices, respectively. Let W_i(s) = H_i(sI − F_i)⁻¹G_i + D_i be an asymptotically stable frequency weighting used as an input weight to the asymptotically stable system P(s). Define the associated system matrices by

A_new = [ A  B H_i ; 0  F_i ],   B_new = [ B D_i ; G_i ]

Suppose that μ = [ μ₁₁ μ₁₂ ; μ₁₂ᵀ μ₂₂ ] is the nonnegative definite solution of the Lyapunov equation A_new μ + μ A_newᵀ + B_new B_newᵀ = 0. Define Y as the positive definite solution of the Lyapunov equation Y A + Aᵀ Y + Cᵀ C = 0. Consider a transformation of the realization (A, B, C, 0) which makes μ₁₁ = Y = Σ = diag{σ₁, σ₂, ..., σ_n} with σᵢ ≥ σᵢ₊₁. The frequency-weighted approximation is then achieved by eliminating the rows and columns of the new realization (A, B, C, 0) of P(s) corresponding to the smallest singular values, so that the low-order model is (A₁₁, B₁, C₁, 0), where A₁₁ is the top-left r × r block of the new A. In case σ_r > σ_{r+1}, the approximation is guaranteed to be stable.

3.3 Weighted Singular Perturbation Approach

The weighted balanced truncation method tends to have smaller errors at high frequencies and larger errors at low frequencies, which is undesirable in some applications. In contrast, the weighted singular perturbation approach displays the opposite character. The concept of the weighted singular perturbation is that the stable state variables of the weighted balanced system are divided into slow and fast modes [3]. The low-order system is approximated by setting the velocity of the fast modes equal to zero. In this paper we modify the singular perturbation by using the weighting function. Consider the weighted balanced realization (A, B, C, 0) with the state variable divided into slow and fast modes:

[ ẋ₁(t) ; ẋ₂(t) ] = [ A₁₁ A₁₂ ; A₂₁ A₂₂ ] [ x₁(t) ; x₂(t) ] + [ B₁ ; B₂ ] u(t)
y(t) = [ C₁ C₂ ] [ x₁(t) ; x₂(t) ]

If x₂ is defined as the fast mode, the low-order system (A_r, B_r, C_r, 0) is given by

A_r = A₁₁ − A₁₂ A₂₂⁻¹ A₂₁,   B_r = B₁ − A₁₂ A₂₂⁻¹ B₂,   C_r = C₁ − C₂ A₂₂⁻¹ A₂₁

4 Simulation Results
Freq uency Response 40
•
i
i
i
20
0 CQ
2, CO
"o -20 'E O) CO
5
-40
'—'—T~^~^^
/
-60
.Rn
I
10
I
I
20 30 Frequency [Hz]
I
40
50
Figure 1. Frequency response of the full order and reduced order model
References
1. Anderson, B. D. O., Controller Design: Moving from Theory to Practice, IEEE Control Systems 13 (1992) pp. 16-25.
2. Enns, D. F., Model Reduction with Balanced Realizations: An Error Bound and a Frequency Weighted Generalization, Proc. 23rd IEEE Conference on Decision and Control (1984) pp. 127-132.
3. Liu, Y. and Anderson, B. D. O., Singular Perturbation Approximation of Balanced Systems, International Journal of Control, Vol. 50, No. 4 (1989) pp. 1379-1405.
4. Moore, B. C., Principal Component Analysis in Linear Systems: Controllability, Observability, and Model Reduction, IEEE Transactions on Automatic Control, Vol. AC-26, No. 1 (1981) pp. 17-31.
MESHLESS ANALYSIS OF THE OBSTACLE PROBLEM FOR TIMOSHENKO BEAMS BASED ON A LOCKING-FREE FORMULATION

J. R. XIAO
Department of Mechanical and Aeronautical Engineering, University of Limerick, Ireland. E-mail: [email protected]

F. WANG, Q. H. CHENG
Division of Computational Mechanics, IHPC, Singapore 118261. E-mail: [email protected], [email protected]

A meshless method is developed, based on the meshless local Petrov-Galerkin (MLPG) approach and the local point interpolation method (LPIM), along with a locking-free formulation, for the obstacle problem for thick beams by means of variational inequalities and the corresponding linear complementarity equation. The meshless method is based only on a number of randomly located nodes. No global background integration mesh is needed, no element matrix assembly is required and no special treatment is needed to impose the essential boundary conditions. An obstacle problem for a thick beam is analysed by the proposed method and the numerical results are compared with analytical solutions.
1. Introduction

In the present study, the solution of an obstacle problem for a Timoshenko beam is investigated by means of variational inequalities and a local Petrov-Galerkin approximation along with the locking-free formulation [1]. The LPIM [2] is employed for constructing both trial and test functions, and the test function is deliberately selected to simplify the construction of the global stiffness matrix by eliminating the need for element matrix assembly [3]. Implementation details and a numerical example are presented.

2. Problem Formulations
In this paper, M and κ represent the moment and curvature, Q and γ represent the shear force and shear strain, EI is the bending stiffness, kGA is the shear rigidity, and w is the displacement. It was shown in [1] that the shear-locking phenomenon in the thin beam limit can be removed by simply changing the dependent variables in the governing equations: the transverse displacement w and the transverse shear strain γ are used as dependent variables, instead of the total rotation φ and the displacement w, as long as the γ field is one order lower than that of w. The corresponding locking-free governing equations can be written in terms of w and γ as follows:

EI(w′ − γ)‴ − q = 0,    EI(w′ − γ)″ + kGA γ = 0    (1)

The essential and natural boundary conditions are written using w and γ as

w = w̄ on Γ_w,  w′ − γ = 0 on Γ_θ;  M = M̄ on Γ_M;  Q = Q̄ on Γ_Q.
In a contact problem, the non-penetration condition leads to a boundary condition in the form of inequalities:
$g(x) \ge 0, \qquad g(x)F(x) = 0, \qquad F(x) \le 0$  (2)
where F(x) is the normal contact force density on the interface and g(x) is the gap function along the contact interface $\Gamma_c$. Consider a thick beam unilaterally supported by a frictionless rigid body, with an initial gap $\delta_0(x)$ between the beam and the rigid body. The equilibrium equations are given by

$Q_{,x} = q + F, \qquad M_{,x} - Q = 0$  (3)

In this case, the solution spaces are defined as

$C(\Omega) = \{ V \in H^2(\Omega) \mid V = 0 \text{ on } \Gamma_w,\ g(x) = \delta_0(x) - V \ge 0 \text{ on } \Gamma_c \}$  (4a)

$G(\Omega) = \{ \gamma_a \in H^1(\Omega) \mid V' - \gamma_a = 0 \text{ on } \Gamma_\theta,\ V \in C(\Omega) \}$  (4b)

Introduce the following continuous forms:
$a(w,\gamma;\, V,\gamma_a) = \int_\Omega M(w,\gamma)\,\kappa(V,\gamma_a)\,dx, \qquad b(\gamma,\gamma_a) = \int_\Omega Q(\gamma)\,\gamma_a\,dx$  (5a)

$(q, V) = \int_\Omega q\,V\,dx$  (5b)
The following variational inequality can be obtained: find $(w, \gamma) \in C(\Omega) \times G(\Omega)$ such that

$a(w,\gamma;\, V - w,\, \gamma_a - \gamma) + b(\gamma,\, \gamma_a - \gamma) \ge (q,\, V - w) \qquad \forall\, (V, \gamma_a) \in C(\Omega) \times G(\Omega)$  (6)
The variational formulation (6) can be approximated by either the finite element technique or a meshless method.

3. Local Point Interpolation Approximation
The LPIM [2] interpolates w(x) and the slope θ for the thin beam from the nodes surrounding a point $x_Q$ using polynomials:

$w(x, x_Q) = \sum_{i=1}^{2n} p_i(x)\,a_i(x_Q) = \mathbf{P}^T(x)\,\mathbf{a}(x_Q)$  (7)

$\theta(x, x_Q) = \frac{dw(x, x_Q)}{dx} = \sum_{i=1}^{2n} p_{i,x}(x)\,a_i(x_Q) = \mathbf{P}_{,x}^T(x)\,\mathbf{a}(x_Q)$  (8)
where $\mathbf{P}(x)$ is a complete monomial basis of order 2n, n is the number of nodes in the neighbourhood of $x_Q$, and $a_i(x_Q)$ are the coefficients. The γ field is one order lower than the w field; in this case no derivative of the variable is needed, and the basis size is taken as the number n of nodes in the influence domain:

$\gamma(x, x_Q) = \sum_{i=1}^{n} p_i(x)\,\tilde{a}_i(x_Q) = \mathbf{P}^T(x)\,\tilde{\mathbf{a}}(x_Q)$  (9)

The LPIM determines the coefficients by enforcing Eqs. (7)-(9) to be satisfied at the n nodes surrounding the point $x_Q$ and writing the result in terms of (w and θ) and γ:

$w(x) = \mathbf{\Phi}^T(x)\,\mathbf{w}_e$  (10)

$\gamma(x) = \mathbf{\Phi}_\gamma^T(x)\,\boldsymbol{\gamma}_e$  (11)
where $\mathbf{w}_e^T = [w_1, \theta_1, w_2, \theta_2, \ldots, w_n, \theta_n]$ and $\boldsymbol{\gamma}_e^T = [\gamma_1, \gamma_2, \ldots, \gamma_n]$; here $w_i$, $\theta_i$ and $\gamma_i$ are the nodal values of w, θ and γ at $x = x_i$, respectively. The shape functions in Eqs. (10) and (11) possess the delta function property, so the essential boundary conditions can be easily imposed.
4. MLPG and LPIM Discretisation
We define subdomains $\Omega_s$ with boundary $\Gamma_s$, each assumed to be the support of the nodal test function $v_i$ centred at the nodal point $x_i$. In the present study, the Petrov-Galerkin approximation procedure is adopted. The variational formulation (6) can be rewritten in the following local weak form, where the test functions $(V - w)$ and $(\gamma_a - \gamma)$ are represented by $v_w$ and $v_\gamma$:
$EI \int_{\Omega_s} (v_w'' - v_\gamma')(w'' - \gamma')\,dx + kGA \int_{\Omega_s} v_\gamma\,\gamma\,dx \ge \int_{\Omega_s} v_w\,q\,dx$  (12)
The test functions are approximated by linear combinations of the nodal shape functions for nodal point $x_i$, obtained from the procedure in section 3:

$v_{wi}(x) = \psi_{wi}(x)\,\alpha_i + \psi_{\theta i}(x)\,\beta_i$  (no summation)  (13a)

$v_{\gamma i}(x) = \psi_{\gamma i}(x)\,\xi_i$  (no summation)  (13b)
where $\alpha_i$, $\beta_i$ and $\xi_i$ are the fictitious nodal displacement, slope and shear strain, respectively. The test functions are constructed using Eqs. (10)-(11) based on three points, i.e. the two boundary points of the sub-domain of the node $x_i$ and the node $x_i$ itself. Only the nodal shape function for the nodal point $x_i$ is used (no summation). Substituting Eqs. (10), (11) and (13) into the local weak form (12) leads to the following discrete equation:

$\mathbf{K}_i\,\mathbf{w}_e \ge \mathbf{f}_i$  (14)

Because relation (14) should hold for every local sub-domain $\Omega_s^i$, we can finally obtain the following matrix equation for the whole discrete system by collecting the equations obtained from each local sub-domain $\Omega_s^i$, without any element assembly:

$\mathbf{K}\,\mathbf{w}_e \ge \mathbf{f}$  (15)
Numerical integration is needed to evaluate Eq. (14); Gauss quadrature is employed in each local sub-domain. For each Gauss quadrature point $x_Q$, point interpolation is performed to obtain the integrand. Therefore, for a node $x_i$ there are two local domains: the test function domain $\Omega_s$ where $v_i \ne 0$ (size $r_s$) and the interpolation domain $\Omega_i$ for $x_Q$ (size $r_i$). These two domains are independent and are defined as $r_s = \alpha_s d_i$ and $r_i = \alpha_i d_i$, respectively, where $\alpha_s$ and $\alpha_i$ are coefficients and $d_i$ is the distance from node i to its closest neighbouring node. It should be noted that it is sufficient to integrate in each local sub-domain by a conventional numerical integration scheme without any numerical difficulties. In this study, the variational inequality problem is transformed into a linear complementary problem following a procedure similar to that given in [3]. The corresponding linear complementary equation is then solved using mathematical programming solvers.
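As one hedged illustration of what such a solver can look like, the following Python sketch applies projected SOR to a generic linear complementarity system; the nonnegativity form of the constraints and the variable names are assumptions, not the paper's exact formulation, which is detailed in [3]:

```python
import numpy as np

def psor_lcp(K, f, omega=1.2, tol=1e-10, max_iter=5000):
    """Projected SOR for the LCP: w >= 0, K w - f >= 0, w^T (K w - f) = 0.
    K is assumed to have a positive diagonal."""
    n = len(f)
    w = np.zeros(n)
    for _ in range(max_iter):
        w_old = w.copy()
        for i in range(n):
            # Residual with the contribution of w[i] removed
            r = f[i] - K[i] @ w + K[i, i] * w[i]
            w[i] = max(0.0, (1 - omega) * w[i] + omega * r / K[i, i])
        if np.linalg.norm(w - w_old, np.inf) < tol:
            break
    return w
```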
5. Numerical examples

A cantilever beam gradually coming into contact with a rigid cylindrical supporting surface of constant curvature 1/R is analysed. This problem has been studied using thin beam theory in [3]. For a certain load the beam begins to come into contact with the cylindrical supporting surface over a contact region AD of length $l_c$, shown as the dashed line in Fig. 1. In thin beam theory [5] there is no contact reaction force within the contact region AD; instead, there is a concentrated contact force at the transition point D. This is not true in the thick beam limit [6]. Taking EI = 1, kGA = 1, L = 3, R = 5 and P = 0.1, the analysis is performed using 81 uniform nodes. The calculated contact region is 2.025, which agrees well with the exact value of 2.0348, an error of 0.48%. Fig. 2 gives the calculated reaction force along the contact interface for an intermediately thick beam with kGA = 100, which shows good agreement with the analytical solution. Finally, the calculated contact regions under different values of kGA are given in Fig. 3 and compared with the analytical results. All the results in the graph show excellent agreement between the numerical and analytical results, and indicate that the proposed method gives high accuracy and no shear locking in the thin beam limit.
Figure 1. Geometry of cantilever beam and cylindrical supporting surface.
6. Conclusions

A meshless method based on the meshless local Petrov-Galerkin (MLPG) approach and the local point interpolation method (LPIM) has been presented to solve fourth-order boundary problems of thick beams involving unilateral contact conditions, based on a locking-free formulation. In this meshless method, polynomial interpolation functions with the delta function property were constructed by the LPIM technique. The problem of beams involving unilateral contact conditions is described by a variational inequality. The corresponding linear complementary equation for this highly non-linear problem was derived by using the developed meshless method and solved by mathematical programming. A contact problem for beams was examined to verify the presented approach. The present method is completely locking-free in the thin beam limit.

References
1. Cho J. Y. and Atluri S. N., Analysis of shear flexible beams, using the meshless local Petrov-Galerkin method, based on a locking-free formulation, Eng. Comput. 18 (2001) pp. 215-240.
2. Gu Y. T. and Liu G. R., A local point interpolation method for static and dynamic analysis of thin beams, Comput. Meth. Appl. Mech. Eng. 190 (2001) pp. 5515-5528.
3. Xiao J. R., McCarthy M. A. and Liu G. R., Local form of variational inequality and meshless analysis of a beam involving unilateral contact conditions, submitted to Comput. Model. Eng. Sci. (2002).
4. Timoshenko S., Strength of Materials, Part II: Advanced Theory and Problems, 3rd edition (Robert E. Krieger, New York, 1983).
5. Hu H. C., Variational Principles of Elasticity and Their Applications (Science Press, Beijing, 1981). (In Chinese)
Figure 2. Contact force along the contact region (kGA = 100): exact solution and numerical result.
Figure 3. Contact regions with different values of kGA (from 1.E+00 to 1.E+10): exact solution and this study.
EFFICIENT PARALLEL ALGORITHM FOR LARGE-SCALE MOLECULAR DYNAMICS SIMULATION IN MICROSCALE THERMOPHYSICS

BING WANG, JIWU SHU, WEIMIN ZHENG
Department of Computer Science and Technology, Tsinghua University, Beijing, China, 100084
E-mail: [email protected]

JINZHAO WANG
Department of Engineering Mechanics, Tsinghua University, Beijing, China, 100084
Molecular dynamics (MD) simulation is an important research method in thermophysics, but it is difficult to carry out with traditional serial algorithms because of the complexity of the numerical calculation. In this paper, we propose algorithms based on a new force decomposition approach called Half Force-Block Decomposition (HFBD). The HFBD approach greatly reduces the memory usage and the communication cost, making it easier to simulate a large-scale particle system. Furthermore, we propose two new strategies to maintain load balance, which is the main problem when parallel algorithms based on force decomposition are applied to short-range MD simulation. The first, the Random Redistribution strategy (RRD), randomly permutes the particle ordering when the load is imbalanced; the other, the Optimal Redistribution strategy (ORD), makes a simple load-balance calculation based on the computing times of all processors and achieves the optimal particle ordering. The parallel algorithm based on the above approaches was implemented on an SMP cluster, tested on a system of 4,000,000 particles, and achieved an efficiency of 67.2% on 120 processors. The numerical results show that the proposed parallel algorithm can efficiently simulate thermophysical systems with more particles than before.
1 Introduction

Molecular Dynamics (MD) is a numerical simulation method for studying the dynamic behavior of multi-particle systems and is widely used in microscale thermophysics. MD simulation involves a great amount of computation, due to the numerous particles and simulation time steps, so it cannot be handled satisfactorily with serial algorithms. The availability of high performance computing resources provides a new way to solve the multi-particle molecular dynamics simulation. Computational scientists have developed three types of parallel algorithms: the atom decomposition algorithm (AD), the force decomposition algorithm (FD) and the spatial decomposition algorithm (SD). In the AD algorithm, particles are randomly distributed among processors irrespective of their spatial positions [1]. In the FD algorithm, particle-pairs are evenly assigned to each processor [2, 3]. In the SD algorithm, the simulation domain is divided into sub-domains and each sub-domain is assigned to a processor [4]. Of the three, the AD algorithm scales poorly and cannot be applied efficiently on more than 10 processors, while the SD algorithm suffers from a load imbalance problem. The FD algorithm is therefore the most widely used in microscale thermophysics. Taylor proposed an efficient force decomposition algorithm in [3]. In this paper, a new force decomposition technique called Half Force-Block Decomposition (HFBD) is proposed, offering a new decomposition strategy for the force matrix. Our new algorithm and Taylor's algorithm were both implemented on a cluster system with 144 processors, and the numerical results show that the former
can give a more efficient solution to the parallel MD simulation. We also propose two strategies to maintain load balance, namely the Random Redistribution strategy (RRD) and the Optimal Redistribution strategy (ORD). When used to simulate a system of 4,000,000 particles, the new decomposition method, together with the new load balance strategy, achieves an efficiency of 67.2%. The rest of this paper is organized as follows. Sections 2 and 3 describe the HFBD method and the load balance strategies, respectively. The benchmark and numerical results are given in section 4.
2 New parallel algorithm

2.1 Force decomposition

Figure 1. Taylor's algorithm
Figure 2. HFBD algorithm
A force matrix is often used to describe the FD algorithm, as shown in Figure 1, which illustrates Taylor's algorithm. The element (i, j) of the force matrix stands for the force of particle j on particle i. In Taylor's method, the force matrix is divided into P blocks, so that each processor is assigned one block of the force matrix. The position vector x is divided into $\sqrt{P}$ sub-vectors, each containing $N/\sqrt{P}$ particles. We use $P_{ij}$ to denote the processor corresponding to sub-block (i, j), which calculates all of the forces between particles from sub-vector i and sub-vector j. By Newton's third law, $\mathbf{f}_{ij}$ and $\mathbf{f}_{ji}$ are equal in magnitude and opposite in direction. That is,

$\mathbf{f}_{ji} = -\mathbf{f}_{ij}$  (1)

So the tasks of $P_{ij}$ and $P_{ji}$ in fact duplicate each other, and Taylor distributed the task between $P_{ij}$ and $P_{ji}$. For example, with N = 16 and P = 16 (N is the number of particles and P the number of processors), as illustrated in Figure 1, $P_{12}$ is responsible for the force calculation between particles (1,2,3,4) and particles (5,6), while $P_{21}$ is responsible for the force calculation between particles (1,2,3,4) and particles (7,8). When both $P_{12}$ and $P_{21}$ complete the calculation, the two processors exchange the force results, so that each processor has all of the forces between particles (1,2,3,4) and particles (5,6,7,8). Taylor's algorithm contains four types of communication: (1) the exchange between $P_{ij}$ and $P_{ji}$ described above, with a cost of $3N/(2\sqrt{P})$; (2) the gathering of forces on sub-vector i among processors in the ith row, with a cost of $N/\sqrt{P}$; (3) the scatter of position information of sub-vector i to all of the processors in the ith row, with a cost of $N/\sqrt{P}$; (4) the scatter of position information of sub-vector j to all of the processors in the jth column, with a cost of $N/\sqrt{P}$. So the total communication cost of Taylor's algorithm is

$9N/(2\sqrt{P})$  (2)

2.2 HFBD force decomposition algorithm

This section presents an algorithm based on a new force matrix decomposition technique, called the Half Force-Block Decomposition algorithm. From equation (1) we can conclude that the force matrix is skew-symmetric:

$\mathbf{F}^T = -\mathbf{F}$  (3)
The forces in the upper (or lower) part of the force matrix can be obtained easily once the forces in the lower (or upper) part are available, so only the lower (or upper) part of the force matrix needs to be calculated. Our decomposition technique deals only with the lower force matrix, as shown in Figure 2. $P_1$ is responsible for block (1,1); $P_2$ and $P_3$ are responsible for blocks (2,1) and (2,2), respectively. In general, blocks (i,1) to (i,i) are assigned to $P_{i(i-1)/2+1}$ through $P_{i(i+1)/2}$, and the processor responsible for block (i, j) is $P_{i(i-1)/2+j}$, which calculates the forces between particles in sub-vector i and those in sub-vector j. Suppose the lower part of a 16 x 16 force matrix has been divided into 10 blocks, each assigned to one of 10 processors. Then blocks (4,1), (4,2), (4,3) and (4,4) are assigned to $P_7$, $P_8$, $P_9$ and $P_{10}$, respectively. $P_5$ is responsible not only for the force calculation of particles (5,6,7,8) on particles (9,10,11,12) but also for the forces of particles (9,10,11,12) on particles (5,6,7,8). Compared to Taylor's algorithm, the communication cost of the HFBD is reduced. Only two kinds of communication are involved in the new algorithm: (1) the gathering of forces on sub-vector i among processors in the ith row, with a cost of $N/\sqrt{P'}$; (2) the scatter of position information of sub-vector i to all of the processors in the ith row, with a cost of $N/\sqrt{P'}$. The first and fourth communication types in Taylor's algorithm are not necessary in HFBD. So the total communication cost is only
$2N/\sqrt{P'}$, where $P'$ is the number of processors Taylor's algorithm would need for the force matrix to be divided into blocks of the same size as in the HFBD. We have

$\sqrt{P'} = (\sqrt{8P + 1} - 1)/2$  (4)

For example, when P in Figure 2 is 10, the corresponding $P'$ in Figure 1 is 16. So the total communication cost of HFBD is

$2N/\sqrt{P'} = 4N/(\sqrt{8P + 1} - 1) \approx \sqrt{2}\,N/\sqrt{P}$  (5)
which is less than one third of the cost of Taylor's algorithm. The HFBD reduces the communication cost of traditional FD algorithms, so it is expected to offer higher efficiency and better scalability. The comparison between HFBD and Taylor's algorithm is given in section 4.
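As a small illustration of the block-to-processor bookkeeping described above, the following Python sketch (ours; the function names are not from the paper) computes the owner of a lower-triangle block and the number of sub-vectors of Eq. (4):

```python
import math

def hfbd_owner(i, j):
    """Processor index (1-based) owning lower-triangle block (i, j), j <= i.
    Blocks (i,1)..(i,i) go to P_{i(i-1)/2+1}..P_{i(i+1)/2}."""
    return i * (i - 1) // 2 + j

def subvector_count(P):
    """Number of sub-vectors M for P processors, M(M+1)/2 = P, as in eq. (4)."""
    return (math.isqrt(8 * P + 1) - 1) // 2

# Example from the paper: 10 processors cover the lower half of a force
# matrix split into M = 4 sub-vectors; block (4, 3) belongs to P9.
assert subvector_count(10) == 4
assert hfbd_owner(4, 3) == 9
```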
3 Load balance strategy
Murty in [5] proposed a static load balance (LB) strategy, which assigns an equal number of particle-pairs to each processor before the simulation begins. This kind of strategy is useful in long-range MD simulation, in which each particle-pair stands for a unit of force calculation; because processors have equal numbers of particle-pairs, they also have equal amounts of calculation. In short-range MD simulation, however, only pairs of particles close enough to each other interact, so even processors with equal numbers of particle-pairs generally carry different calculation loads. Generally speaking, Murty's load balance strategy cannot be successfully applied to short-range MD simulation. We present two dynamic strategies to maintain run-time load balance. The first is the Random Redistribution strategy, which randomly permutes the particle ordering when load imbalance occurs, so that a uniformly sparse force matrix is expected. But this strategy has an obvious shortcoming: random redistribution only leads to a random result. Often we get a satisfactory result, but in some cases we get a bad one. We therefore present a second strategy, the Optimal Redistribution strategy, to solve this problem. This strategy also permutes the particle ordering when load imbalance occurs, but the permutation is based on the spatial distribution of particles rather than being random. The ORD strategy consists of two steps: first, divide the simulated domain into cubes as Link-Cell methods do, order the cubes in a spatial index, determine which cube each particle belongs to, and build an index of particles in which particles from the same cube have successive index numbers; second, reorder the particles as

$1, M+1, 2M+1, \ldots;\ 2, M+2, 2M+2, \ldots;\ \ldots;\ M, 2M, \ldots$

where

$M = \sqrt{P'} = (\sqrt{8P + 1} - 1)/2$  (6)
After the above two steps, the particles in the same cube are distributed as evenly as possible among all processors, so the force matrix is uniformly sparse and the loads of the processors are equal. In the next section, numerical results are presented to compare the load balance strategies.
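A hedged Python sketch of the second (reordering) step, assuming the particles have already been re-indexed so that particles from the same cube are contiguous, which is the paper's first step:

```python
def ord_permutation(n_particles, M):
    """ORD reordering: deal particles out round-robin across M sub-vectors,
    i.e. 1, M+1, 2M+1, ...; 2, M+2, ...; ...; M, 2M, ... (1-based in the
    paper; 0-based here)."""
    order = []
    for start in range(M):
        order.extend(range(start, n_particles, M))
    return order

# With 8 particles and M = 4 sub-vectors this yields [0, 4, 1, 5, 2, 6, 3, 7],
# i.e. 1, 5; 2, 6; 3, 7; 4, 8 in the paper's 1-based notation.
print(ord_permutation(8, 4))
```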
4 Numerical results
Firstly, as illustrated in Figure 3, we compare the speedups of the HFBD and Taylor's algorithms. The newly proposed HFBD algorithm is more efficient than Taylor's at almost all processor counts (with N = 108,000). Secondly, the comparison of load balance strategies is shown in Figure 4. From that figure we can see that both the RRD and ORD strategies are more effective than Murty's static strategy. Furthermore, the ORD has somewhat higher efficiency than the RRD, in agreement with our theoretical analysis. Thirdly, we used the HFBD method to simulate a multi-particle system with 4,000,000 particles and achieved an efficiency of 67.2% on 120 processors. The result shows that the HFBD algorithm has high scalability.

Figure 3. Speedup of algorithms
Figure 4. Comparison of LB strategies (Murty's static strategy, RRD strategy, ORD strategy)

5 Conclusions
In this paper, a new force decomposition algorithm called Half Force-Block Decomposition is presented for molecular dynamics simulation in microscale thermophysics. This decomposition technique divides the lower part of the force matrix into blocks and reduces the communication cost of traditional FD algorithms. We also propose two load balance strategies, namely the Random Redistribution strategy and the Optimal Redistribution strategy, which maintain load balance at run time.
References
[1] W. Smith, A replicated data molecular dynamics strategy for the parallel Ewald sum, Comp. Phys. Comm. 62 (3) (1992) 392-406.
[2] S. Plimpton, Fast parallel algorithms for short-range molecular dynamics, J. Comput. Phys. 117 (1) (1995) 1-19.
[3] V. E. Taylor, R. L. Stevens and K. E. Arnold, Parallel molecular dynamics: Communication requirements for massively parallel machines, in: Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (1994) 156-163.
[4] D. Brown, J. H. R. Clarke, M. Okuda and T. Yamazaki, A domain decomposition strategy for molecular dynamics simulations on distributed memory machines, Comp. Phys. Comm. 74 (1993) 67-80.
[5] R. Murty and D. Okunbor, Efficient parallel algorithms for molecular dynamics simulations, Parallel Computing 25 (1999) 217-230.
IMPROVING THE CELL MAPPING METHOD AND DETERMINING DOMAINS OF ATTRACTION OF A NONLINEAR STRUCTURAL SYSTEM

Q. DING
Department of Mechanics, Tianjin University, Tianjin, P. R. China, 300072
E-mail: [email protected]

Z. S. LIU
Computational Mechanics Division, Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528
E-mail: [email protected]

J. J. LI
Department of Mechanics, Tongji University, Shanghai, P. R. China, 200092
E-mail: [email protected]

A process defined as "mapping trajectory pursuit" (MTP) is introduced into cell mapping techniques based on spatial Poincare sections. This improvement brings the exact determination of the properties of all cells in the analysed sequences and a further reduction in memory and computational time. For the prediction of the stability boundary as a function of initial conditions (domains of attraction), an initial condition region is defined. The proposed method is then applied to analyze the aeroelastic behavior of a system with bilinear structural nonlinearity. Different types of periodic motions are determined through the presentation of domains of attraction.
1 Introduction

Nonlinear dynamic systems can have several distinct steady-state solutions depending on the particular initial conditions. However, determining domains of attraction using direct numerical integration is often extremely time-consuming. In 1980 the "simple cell mapping" (SCM) method was proposed by Hsu [1, 2] as an advanced computational technique for global analysis. Based on this concept, "generalized cell mapping" (GCM) [3] and "interpolated cell mapping" (ICM) [4] were developed thereafter. These methods nevertheless remain time- and memory-consuming when applied to high-order systems. A further development to reduce the number of cells in the calculation led to the "Poincare-like simple cell mapping" (PLSCM) and "Poincare linear interpolated cell mapping" (PLICM) [5, 6], which combine the use of spatial Poincare sections with SCM and ICM, respectively. Because either the mid-points of cells or interpolated points of cell vertices, rather than the actual mapped positions, are used as the initial values of each iteration, the solutions obtained are unavoidably approximate, and the resulting restriction on cell sizes limits the application of these methods to high-order systems. Besides, for the interpolated-type methods, cells generated in a procedure may wrongly be determined as "sink cells" even if the trajectory leaves the domain of interest only temporarily. In this paper, a process defined as "mapping trajectory pursuit" (MTP) is introduced into cell mapping techniques based on spatial Poincare sections. The initial condition region is also defined for the special purpose of predicting the domains of attraction. Using the improved method, the complicated flutter of a binary aeroelastic system with bilinear structural nonlinearity in torsion is analyzed.
2 Improvement on cell mapping method

In SCM and ICM, an N-dimensional dynamical system is transformed into a point-to-point (p-p) mapping by numerical integration over a time interval τ such that

$x(j+1) = P(x(j)), \qquad P: R^N \to R^N$  (1)

which means that x(j), a point in state space, is mapped by P after a period of time τ into the point x(j+1). Cells in the state space are then defined, according to the procedures described in [1, 2, 3], on the basis of a series of points obtained by (1). Instead of time sections, we obtain the p-p mapping (1) on a spatial Poincare section Σ, an (N-1)-dimensional hyperplane in the $R^N$ state space transversal to the trajectories of the system. Such a procedure results in P: Σ → Σ and reduces the dimension of the analysis space by keeping one coordinate constant. The cell mapping unravelling is then applied to the intersection points obtained on Σ. In addition, we record $x^j$ (j ≥ 1) as the representing point of cell $z^j$, denoted $R(z^j)$, and use it to determine the state of the trajectory. We also define an initial condition region Ω, a subspace of $R^N$ of dimension one to N, which covers all initial conditions to be investigated. Ω is different from the domain of interest S ⊂ Σ [5], both in size and/or order of dimension. There are four cases one may encounter at each step while constructing a processing sequence:
1. The newly generated cell $z^j$ is virgin. In this case, $x^j$ is recorded as $R(z^j)$ and the integration of the present sequence is continued.
2. $z^j$ has appeared before in the present sequence. A new periodic motion is found only when the distance between the newly obtained point $x^j$ and the representing point of the cell, $|R(z^j) - x^j|$, is less than a given small value $d_1$.
3. $z^j$ has appeared in one of the previous sequences. The current processing sequence is deemed to be attracted to an attractor only when $|R(z^j) - x^j|$ is less than a given small value $d_2$ (usually $d_2$ can reasonably be set much larger than $d_1$).
4. A cell is mapped outside S. We continue the numerical integration until the mapped points either return into S or are confirmed to be divergent.

The process in which the actual positions of mapped points on Σ are recorded to represent the cells and are followed until the final determination (even while they leave S) is defined as "mapping trajectory pursuit" (MTP). With MTP, the size of S can be much smaller, because it need not contain the whole steady-state orbits in the $R^N$ state space, but only part of their intersection points on Σ. Conversely, the cells can reasonably be larger, because the criterion applied in exact numerical integration procedures is used to determine whether a newly mapped point is the representing point of the cell. These two aspects lead to an extensive reduction in the number of cells required in the calculation; consequently, the computing time is also reduced considerably. The proposed approach is therefore more appropriate for the global study of high-order systems.
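The four cases translate almost directly into code. The following Python sketch is our illustration, not the authors' implementation: the `poincare_map`, `cell_of` and `in_S` callbacks and the dictionary `rep` of representing points (accumulated over previous sequences) are assumed interfaces.

```python
import numpy as np

def mtp_sequence(x0, poincare_map, cell_of, in_S, rep,
                 d1=1e-6, d2=1e-3, max_steps=10_000):
    """One MTP processing sequence starting from the initial point x0."""
    seen = set()                    # cells hit in the present sequence
    x = np.asarray(x0, dtype=float)
    for _ in range(max_steps):
        x = poincare_map(x)         # next intersection with the section
        if not in_S(x):             # case 4: pursue the trajectory outside S
            continue
        z = cell_of(x)
        if z not in rep:            # case 1: virgin cell
            rep[z] = x              # record the actual mapped point as R(z)
            seen.add(z)
        elif z in seen:             # case 2: cell repeated in this sequence
            if np.linalg.norm(rep[z] - x) < d1:
                return "new periodic motion"
        else:                       # case 3: cell from a previous sequence
            if np.linalg.norm(rep[z] - x) < d2:
                return "attracted to a known attractor"
    return "divergent or undecided"
```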
3 Analysis of a nonlinear aeroelastic system

Consider a rigid wing of constant chord, pivoted at its root in bending and torsion such that there is no stiffness coupling between the motions. The equations of motion of the system are derived using quasi-steady aerodynamics [7] in dimensionless form as

$\mathbf{A}\ddot{\mathbf{q}} + (\rho V \mathbf{B} + \mathbf{D})\dot{\mathbf{q}} + (\rho V^2 \mathbf{C} + \mathbf{E})\mathbf{q} = 0$  (2)

where $\mathbf{q} = (\gamma, \theta)^T$, γ is the bending angle, θ the torsional angle, ρ the air density, and V the air speed. A, B, D, C and E are the mass, aerodynamic damping, structural damping, aerodynamic stiffness and structural stiffness matrices, respectively. A bilinear stiffness in the torsional direction is considered, as shown in Figure 1.

Figure 1. Bilinear stiffness in the torsional direction

In the following analysis we take θ = 0 as Σ, to cope with the stable equilibrium points as well as the limit cycles. The one-sided intersections of the trajectory with Σ from negative θ to positive θ are taken as the p-p maps (1). For simplicity, Ω includes only initial conditions in the θ direction, with γ(0) = 0 and $\dot{\gamma}(0) = 0$. Letting $k = K'_\theta/K_\theta = 0.1$, the domains of attraction determined using the proposed CM method are shown in Figure 2, which presents the occurrences of different motions as functions of V-θ(0), V-$\dot{\theta}(0)$ and θ(0)-$\dot{\theta}(0)$. Motions are classified as: damped stable motion (to the trivial equilibrium position), limit cycle oscillation (LCO), complicated periodic motion (with period > 2), chaotic motion and divergent flutter. The results demonstrate that small initial conditions, say |θ(0)| < 1.5 or $|\dot{\theta}(0)|$ < 0.6, result in damped motions for V < 26.2 m/s and unsymmetrical LCOs over the velocity range above 26.2 m/s.
4 Conclusion

With the introduction of the MTP technique into the cell mapping method based on spatial Poincare sections, both the number of cells and the computation time can be greatly reduced. The global dynamic properties of all cells in the analysed sequences can be determined exactly. The definition of the initial condition analysis region makes the method especially appropriate for predicting the stability boundary as a function of initial conditions.
The proposed CM method has proven to be efficient in revealing the global behaviors of high-order nonlinear dynamic systems.
Figure 2. Domains of attraction for V = 15 m/s and V = 45 m/s: '+' damped stable motion; 'x' LCO; '*' period-2 motion; '•' periodic motion with period greater than 2; 'V' chaotic motion; blank: divergent motion.
References
1. Hsu C. S., A theory of cell-to-cell mapping dynamical systems, Journal of Applied Mechanics 47 (1980) pp. 931-939.
2. Hsu C. S. and Guttalu R. S., An unravelling algorithm for global analysis of dynamical systems: an application of cell-to-cell mappings, Journal of Applied Mechanics 47 (1980) pp. 940-948.
3. Hsu C. S., Cell-to-Cell Mapping: A Method of Global Analysis for Nonlinear Systems (Springer-Verlag, New York, 1987).
4. Tongue B. H. and Gu K., Interpolated cell mapping of dynamical systems, Journal of Applied Mechanics 55 (1988) pp. 461-466.
5. Levitas J., Weller T. and Singer J., Poincare-like simple cell mapping for non-linear dynamical systems, Journal of Sound and Vibration 176 (1994) pp. 641-662.
6. Levitas J. and Weller T., Poincare linear interpolated cell mapping: method for global analysis of oscillating systems, Journal of Applied Mechanics 62 (1995) pp. 489-495.
7. Hancock G. J., Wright J. R. and Simpson S., On the teaching of the principles of wing flexure-torsion flutter, Aeronautical Journal 89 (1985) pp. 285-305.
HIGH RATE DYNAMIC RESPONSE OF STRUCTURE USING SPH METHOD

Z. S. LIU
Computational Mechanics Division, Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528
E-mail: [email protected]

S. SWADDIWUDHIPONG AND C. G. KOH
Department of Civil Engineering, The National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260
E-mails: [email protected], [email protected]

The dynamic responses of structures under high rate loading are studied using the Smooth Particle Hydrodynamics (SPH) approach. The SPH equations governing the elastic and elasto-plastic large deformation dynamic response of solid structures are derived. Additional stress points are introduced in the formulation to mitigate the tensile instability inherent in the SPH approach. An incremental rate approach is introduced and the solution algorithm is developed. Examples of high velocity normal impact of solids are presented, and the results from the proposed SPH approach are compared with finite element solutions, illustrating that high rate dynamic response problems such as high velocity impact can be effectively solved by the proposed SPH approach.
1 Introduction

In many engineering problems, the transient response of solids involving high rate deformation and loading is often encountered. If the rate of onset of the load is high compared to the time needed to reach the steady state, wave propagation phenomena have to be considered. The dynamic response due to high velocity impact is a special case, in which inertial effects must be included in the governing equations and stress wave propagation plays an important role in the analysis; for this reason, the high rate dynamic response becomes quite complex. With the development of high performance computing, the most popular and cost-effective approaches for solving high rate problems (such as high velocity impact) are discretization methods, such as finite element or meshless methods. In the past few decades the finite element method has been developed to simulate the dynamic response of structures subjected to high rate loading (e.g. high velocity impact) and has been widely used. However, one of the main drawbacks of a mesh-based method like the FEM for treating high velocity impact is the need to remesh when severe element distortion occurs, especially when the solid continuum undergoes high rate large deformation. Unfortunately, the remeshing procedure introduces projection errors and reduces the accuracy of the numerical solutions. In order to
remove this inaccurate remeshing procedure, meshless (or particle) methods such as Smooth Particle Hydrodynamics (SPH) have been developed to solve large deformation and high rate dynamic problems in solid mechanics. SPH is a meshless Lagrangian method that offers considerable promise as a numerical tool for modelling problems involving large deformations and large distortions, whereby the motion of a discrete number of particles of a solid is followed in time. SPH was first introduced and developed for treating astrophysics problems, and was applied successfully to high velocity impact problems [4]. Since then, the credibility of the SPH method for modelling solid media has been verified against numerous experimental impact results. As SPH uses a Lagrangian formulation of the equations of motion, it does not involve a distortion-limited grid and is therefore very attractive for high velocity impact simulation. In this paper, the dynamic responses of structures under high rate loading are studied using the SPH approach. Additional stress points are introduced in the formulation to mitigate the tensile instability inherent in the SPH approach. The incremental rate approach is introduced and the solution algorithm is developed and implemented.

2 Governing equations of SPH method for solid mechanics
The foundation of SPH is interpolation theory. In solid mechanics, the SPH form of the conservation equations can be expressed as [4, 5]

$\frac{d\rho_i}{dt} = \rho_i \sum_j \frac{m_j}{\rho_j}\,(v_i^\beta - v_j^\beta)\,\frac{\partial W_{ij}}{\partial x_i^\beta}$

$\frac{dv_i^\alpha}{dt} = \sum_j m_j \left(\frac{\sigma_i^{\alpha\beta}}{\rho_i^2} + \frac{\sigma_j^{\alpha\beta}}{\rho_j^2}\right)\frac{\partial W_{ij}}{\partial x_i^\beta}$

$\frac{dE_i}{dt} = \frac{\sigma_i^{\alpha\beta}}{\rho_i^2} \sum_j m_j\,(v_i^\alpha - v_j^\alpha)\,\frac{\partial W_{ij}}{\partial x_i^\beta}$  (1)

where i and j are particle numbers; $m_j$ and $\rho_j$ are the mass and density of particle j; $\sigma_j^{\alpha\beta}$ and $v_j^\alpha$ are the stress tensor and velocity of particle j, respectively; $E_i$ is the energy of particle i; and $W_{ij}$ is a kernel function satisfying some special properties. Although several kernel functions are possible, the most widely used cubic B-spline kernel is adopted in this study.
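For reference, a minimal Python sketch of the cubic B-spline kernel mentioned above; the 1-D normalisation constant is shown (the 2-D and 3-D constants differ), and a support radius of 2h is the common convention:

```python
def cubic_bspline_w(r, h):
    """Cubic B-spline SPH kernel W(r, h) in 1-D."""
    q = r / h
    sigma = 2.0 / (3.0 * h)                        # 1-D normalisation
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q**2 + 0.75 * q**3)
    elif q < 2.0:
        return sigma * 0.25 * (2.0 - q)**3
    return 0.0                                     # compact support: W = 0 beyond 2h
```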
3 Constitutive equation

In classical plasticity, the hydrostatic pressure p is usually calculated using linear Hooke's law when p is small. For severe hydrostatic pressure, the pressure should be evaluated with an Equation of State (EOS) of the functional form $p = p(\rho, E)$. The EOS employed in this study is the well-known Mie-Gruneisen EOS for solids [2]. In the elastic regime, the deviatoric stress rate can be determined through Hooke's law, $\dot{S} = 2G\dot{\varepsilon}'$. For finite rotation, the deviatoric stress should be determined through incremental plasticity theory. To account for the large rotation effect, the elastic deviatoric stress rate $\dot{S}^{\alpha\beta}$ is computed using the Jaumann rate definition:

$\dot{S}^{\alpha\beta} = 2G\left(\dot{\varepsilon}^{\alpha\beta} - \tfrac{1}{3}\delta^{\alpha\beta}\dot{\varepsilon}^{\gamma\gamma}\right) + S^{\alpha\gamma}\Omega^{\beta\gamma} + S^{\gamma\beta}\Omega^{\alpha\gamma}$  (2)
where $\dot{\varepsilon}^{\alpha\beta}$ and $\Omega^{\alpha\beta}$ are the strain rate and rotation rate tensors, respectively. The SPH forms for evaluating the strain and rotation rates are

$\dot{\varepsilon}_i^{\alpha\beta} = \frac{1}{2}\sum_j \frac{m_j}{\rho_j}\left[(v_j^\alpha - v_i^\alpha)W_{ij,\beta} + (v_j^\beta - v_i^\beta)W_{ij,\alpha}\right]$

$\Omega_i^{\alpha\beta} = \frac{1}{2}\sum_j \frac{m_j}{\rho_j}\left[(v_j^\alpha - v_i^\alpha)W_{ij,\beta} - (v_j^\beta - v_i^\beta)W_{ij,\alpha}\right]$  (3)
As large deformation elasto-plastic transient dynamic analysis is path-dependent, an incremental procedure is adopted in the present study.

3.1 Elastic case

The incremental stresses can be expressed in terms of incremental strains. In the hydrodynamics analysis, the stress is split into the hydrostatic pressure and the traceless symmetric deviatoric stress:

${}^{t+\Delta t}\sigma_i^{\alpha\beta} = {}^{t+\Delta t}S_i^{\alpha\beta} - {}^{t+\Delta t}p\,\delta^{\alpha\beta}, \qquad p = -\tfrac{1}{3}\,\sigma^{\gamma\gamma}$  (4)

in which ${}^{t+\Delta t}\sigma_i^{\alpha\beta}$ is the stress of particle i accumulated incrementally from time t to t + Δt. The incremental deviatoric stress ${}^{t+\Delta t}S_i^{\alpha\beta}$ is computed using the Jaumann rate definition as stated in equation (2).

3.2 Elasto-plastic case

For the elasto-plastic case, the incremental stress can be expressed as a function of the incremental strain in an average sense [1]. If ${}^{t+\Delta t}C^{\alpha\beta\gamma\delta}$ are the elastic-plastic stiffness coefficients during the time interval (t, t + Δt), the constitutive equation can be given by

${}^{t+\Delta t}\sigma^{\alpha\beta} = {}^{t+\Delta t}C^{\alpha\beta\gamma\delta}\,{}^{t+\Delta t}\varepsilon_{\gamma\delta}$  (5)
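As a hedged illustration of the rate form in Eq. (2), the following Python sketch integrates the Jaumann deviatoric stress rate with a simple forward Euler step; the explicit time integrator and the 3x3 array interface are our assumptions, not details given in the paper:

```python
import numpy as np

def jaumann_deviatoric_update(S, eps_dot, omega, G, dt):
    """One explicit step of the Jaumann deviatoric stress update, eq. (2).
    S, eps_dot, omega: 3x3 stress, strain-rate and rotation-rate tensors."""
    dev = eps_dot - np.trace(eps_dot) / 3.0 * np.eye(3)   # deviatoric strain rate
    # S^{ag} Omega^{bg} = (S Omega^T)^{ab};  S^{gb} Omega^{ag} = (Omega S)^{ab}
    S_dot = 2.0 * G * dev + S @ omega.T + omega @ S
    return S + dt * S_dot
```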
4 Tension instability treatment

Standard SPH methods have been plagued by a serious problem referred to as tension instability. In 1-D problems, tension instability causes a simple elastic bar to break apart in tension; in 2-D and 3-D problems it produces a clustering of particles which may lead to premature fracture. Dyka et al. [3] proposed a stress-point method to treat this problem. The basic principle is to calculate the values of stress at points other than the SPH centroids in order to remove the instability. This approach completely eliminates tension instability for a 1-D bar, producing accurate solutions for several SPH formulations. The concept is adopted here and expanded to cover 2-D and 3-D problems. The stress-point method for SPH is analogous to full integration in the FEM; in standard SPH, the stress components are calculated at the centroid of the SPH particle, analogous to a reduced integration form of the FEM. In this approach, stress, internal energy and density are calculated and tracked at the stress points, while displacement, velocity and acceleration are calculated and monitored at the centroid of each particle [2]. As the stress tensors at the stress points are included in the linear momentum equation of the same particle, the tension instability is eliminated.
5 Numerical examples
In order to validate the performance of the proposed SPH approach, two examples are presented. The first analyzes the dynamic response of two aluminium bars in high speed impact, as shown in Figure 1; an FE simulation with the commercial software ABAQUS was also conducted. Figure 2 shows the stress profiles along the two impacting aluminium bars at time 0.35 micro-sec. Comparison of the results shows that the effective stress from SPH agrees well with the FE solutions. In order to overcome the numerical instability caused by the shock wave, artificial viscosity is adopted in the SPH momentum equation. The second example is a square aluminium plate subjected to a high velocity impact by a steel cylinder, as shown in Figure 3. The deformed particle position contour at time 2.0 micro-sec is shown in Figure 4.
Figure 1. Problem description and SPH model
Figure 2. Stress profiles along the two impacting aluminium bars
Figure 3. SPH model for impact of plate and cylinder
Figure 4. Deformed particle position contour

6 Concluding remarks
In this paper, the dynamic responses of structures under high rate loading using the SPH approach are presented. The SPH equations governing the elastic large deformation dynamic response of solid structures are derived. Two examples of high velocity impact of structures are presented. The results illustrate that the high velocity impact problem can be effectively solved by the proposed approach and that SPH is a reliable method for dealing with high rate dynamic response.

References
1. Chen J. K., Beraun J. E. and Jih C. J., A corrective smoothed particle method for elastoplastic dynamics, Computational Mechanics 27 (2001) pp. 177-187.
2. Drumheller D. S., Introduction to Wave Propagation in Nonlinear Fluids and Solids (Cambridge University Press, Cambridge, 1998).
3. Dyka C. T., Randles P. W. and Ingel R. P., Stress points for tension instability in SPH, Int. J. Numer. Meth. Engrg. 40 (1997) pp. 2325-2341.
4. Libersky L. D., Petschek A. G., Carney T. C., Hipp J. R. and Allahdadi F. A., High strain Lagrangian hydrodynamics, a three dimensional SPH code for dynamic material response, J. Comp. Phys. 109 (1993) pp. 67-75.
5. Liu Z. S., Swaddiwudhipong S. and Koh C. G., Stress wave propagation in 1-D and 2-D media using the Smooth Particle Hydrodynamics method, Structural Engineering and Mechanics 14 (2002) pp. 455-472.
THE GENERALIZED DIFFERENTIAL QUADRATURE RULE

T. Y. WU AND Y. Y. WANG
Computational Mechanics Division, Institute of High Performance Computing, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528
E-mail: [email protected], [email protected]

G. R. LIU
Department of Mechanical Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260
E-mail: [email protected]

The basic idea of the differential quadrature (DQ) method is to approximate a derivative of a function at a point as a weighted linear sum of the function values at all the discrete points. The present authors have advanced the generalized differential quadrature rule (GDQR), which expresses the DQ as a weighted linear sum of both the function values at all the discrete points and the function derivatives at points wherever necessary. The conventional DQ method is usually applied to solve differential equations constrained by one condition at one point. The GDQR aims at solving high order differential equations which may have more than one boundary/initial condition at a discrete point; it enforces the same number of independent variables as the number of constraint conditions at any discrete point. Since the GDQR reduces to the DQM when the number of conditions at any discrete point equals one, the GDQR is naturally a generalization of the DQM. The authors have extended the DQ technique to cases where the DQM has never been used, and the conventional 5-point technique proposed by Bert and associates has been completely eliminated. The GDQR is a general method to solve differential equations in a global form, as opposed to the finite difference (FD) method in a local form. This paper reviews its recent applications and points out some restrictions and further potential applications.
1 Differential quadrature method (DQM)

Bellman and Casti [1] proposed the DQM in 1971 to solve nonlinear partial differential equations. The DQM has since been applied to diverse areas and has gradually established itself as a numerical method for solving initial and boundary value problems [2, 3]. The DQM approximates the rth-order derivative of a function ψ(x) at a discrete point $x_i$ (i = 1, 2, ..., N) as

$\frac{d^r\psi(x)}{dx^r}\bigg|_{x_i} = \sum_{j=1}^{N} A_{ij}^{(r)}\,\psi_j \qquad (i = 1, 2, \ldots, N;\ r \ge 1)$

where $\psi_j = \psi(x_j)$ and $A_{ij}^{(r)}$ are the weighting coefficients for the rth-order derivative at point $x_i$, and N is the total number of discrete sampling points in the domain. The review papers [2, 3] presented both the state of the art of the DQM and a survey of its application fields. It should be emphasized that the conventional DQM can only cope with differential equations that have one condition at one point, since the DQM chooses only one independent variable (the function value) at each point. As reviewed in [2], the DQM is also called the generalized collocation method. In order to apply the DQM to high-order differential equations with multiple conditions at a point, a 5-point technique [3] was proposed and applied to structural beams, plates and shells in the last decade. The 5-point technique forces an adjacent domain point to act as a boundary point, so that one condition still corresponds to one point.
An apparent and natural choice of trial functions for the DQM is the Lagrange interpolation polynomials, and their general weighting coefficients have been said to be first found by Shu and Richards [11], under the name generalized differential quadrature (GDQ). As pointed out in [3], the Lagrange interpolation polynomials are only one choice of trial functions. Shu later referred to the coefficients as the "Shu method" in his monograph [10]. In fact, Michelsen and Villadsen [9] derived these coefficients under the name of the collocation method as early as 1972, at about the same time the DQM was proposed. Shu and Richards [11] found an alternative way to obtain only the diagonal terms in the differentiation matrices.
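To make the Lagrange-polynomial route concrete, here is a short Python sketch (ours, not from the paper) of the classical explicit formulas for the first-order DQ weighting coefficients; higher-order weighting matrices can then be obtained, for example, by matrix products or recurrence:

```python
import numpy as np

def dq_weights_first_order(x):
    """First-order DQ weighting coefficients A_ij = L'_j(x_i) from
    Lagrange polynomials on an arbitrary 1-D grid x."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # M1[i] = product over k != i of (x_i - x_k)
    M1 = np.array([np.prod(np.delete(x[i] - x, i)) for i in range(N)])
    A = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i != j:
                A[i, j] = M1[i] / ((x[i] - x[j]) * M1[j])
        A[i, i] = -A[i].sum()   # each row of the matrix sums to zero
    return A
```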
2 Generalized differential quadrature rule (GDQR)

The GDQR aims at solving high order differential equations without using the 5-point technique. As opposed to the DQM, the GDQR considers a more general situation, where the field function ψ(x) is governed by a differential equation and constrained by a set of given conditions at any point. The solution domain is divided into points $x_i$ (i = 1, 2, ..., N) that include all the points with given conditions. If the function ψ(x) has to satisfy $n_i$ conditions (equations) at $x_i$, the GDQR expresses its differential quadrature as [12, 15-18]

$\frac{d^r\psi(x_i)}{dx^r} = \sum_{j=1}^{N}\sum_{l=0}^{n_j - 1} E_{ijl}^{(r)}\,\psi_j^{(l)} = \sum_{k=1}^{M} E_{ik}^{(r)}\,G_k$

where $E_{ik}^{(r)}$ (a convenient expression of $E_{ijl}^{(r)}$) are the GDQR's weighting coefficients and M is the total number of independent variables $G_k$:

$\{G_1, G_2, \ldots, G_k, \ldots, G_M\} = \{\psi_1, \psi_1^{(1)}, \ldots, \psi_1^{(n_1-1)}, \ldots, \psi_N, \psi_N^{(1)}, \ldots, \psi_N^{(n_N-1)}\}$

where $\psi_i^{(k)} = \psi^{(k)}(x_i)$ (k = 0, 1, 2, ..., $n_i$-1) are the kth-order derivatives of ψ at $x_i$. The GDQR thus enforces the same number of independent variables $\psi^{(k)}(x_i)$ (k = 0, 1, 2, ..., $n_i$-1) as the number of equations at a point, and its independent variables are chosen as the function value and its derivatives of the lowest possible orders wherever necessary. One of the most important parts of the DQ technique is the determination of the weighting coefficients. The authors have derived the GDQR's explicit coefficients using the Hermite interpolation functions for third-, fourth-, sixth- and eighth-order boundary-value problems and for initial-value differential equations of second to fourth orders [12-19]. It is apparent that the GDQR's applications to high-dimensional problems are quite different from the corresponding DQM applications [18, 22]. The notation for Hermite functions should be distinguished clearly: the Hermite orthogonal function has the domain [-∞, +∞], while the often-discussed Hermite interpolation functions define only the function values and their first derivatives at all the discrete points. The Hermite interpolation functions can be generalized to use function values and the corresponding lowest order derivatives at any discrete point. In interpolation theory,
the Hermite interpolation functions with various lowest order derivatives at any discrete point are also called generalized Lagrange interpolation functions or Hermite-Fejer interpolation functions. In their differentiation forms, it is clear that the GDQR is a generalization of the DQM. The present authors have not only generalized the DQ method itself but also obtained various explicit weighting coefficients.
3 GDQR's applications and discussions

The following applications show that the GDQR is a general method for solving high order differential equations.
1. Third-order boundary-value ordinary differential equations (ODEs): Blasius and Falkner-Skan equations [5].
2. Fourth-order boundary-value ODEs: beam, circular plate, and shells of revolution equations [12, 13, 14, 18, 19, 20, 21].
3. Sixth-order boundary-value ODEs: Onsager equations in fluid mechanics and circular arch equations in solid mechanics [5, 6, 16].
4. Eighth-order boundary-value ODEs: cylindrical barrel roof equations [8, 17].
5. Second- to fourth-order initial-value ODEs: Duffing equations [4, 12, 15].
6. Domain decomposition applications for structural beams, circular plates and circular arches [6, 13, 14, 19, 21].
7. Partial differential equations: rectangular plate and beam vibration problems [18, 22].

As compared with the 5-point technique, the GDQR offers a straightforward treatment of multiple conditions. The FEM and FDM are suitable for geometrically complex and discontinuous problems due to their locality, while the DQ methods have corresponding difficulties. In essence, this means that the DQ methods may primarily be a complementary approach, to be used efficiently for nonlinear problems with simple geometry and high smoothness, rather than a real alternative to the FEM or FDM.
References
1. Bellman R. and Casti J., Differential quadrature and long term integration. Journal of Mathematical Analysis and Applications 34 (1971) pp. 235-238.
2. Bellomo N., Nonlinear models and problems in applied sciences from differential quadrature to generalized collocation methods. Mathematical and Computer Modelling 26 (1997) pp. 13-34.
3. Bert C. W. and Malik M., Differential quadrature method in computational mechanics: a review. Applied Mechanics Review 49 (1996) pp. 1-27.
4. Liu G. R. and Wu T. Y., Numerical solution for differential equations of Duffing-type non-linearity using the generalized differential quadrature rule. Journal of Sound and Vibration 237 (2000) pp. 805-817.
5. Liu G. R. and Wu T. Y., Application of generalized differential quadrature rule in Blasius and Onsager equations. International Journal for Numerical Methods in Engineering 52 (2001) pp. 1013-1027.
6. Liu G. R. and Wu T. Y., In-plane vibration analyses of circular arches by the generalized differential quadrature rule. International Journal of Mechanical Sciences 43 (2001) pp. 2597-2611.
7. Liu G. R. and Wu T. Y., Multipoint boundary value problems by differential quadrature method. Mathematical and Computer Modelling 35 (2002) pp. 215-227.
8. Liu G. R. and Wu T. Y., Differential quadrature solutions of eighth-order boundary-value differential equations. Journal of Computational and Applied Mathematics 145 (2002) pp. 223-235.
9. Michelsen M. L. and Villadsen J., A convenient computational procedure for collocation constants. The Chemical Engineering Journal 4 (1972) pp. 64-68.
10. Shu C., Differential Quadrature and its Application in Engineering (Springer-Verlag, London, 2000).
11. Shu C. and Richards B. E., Application of generalized differential quadrature to solve two-dimensional incompressible Navier-Stokes equations. International Journal for Numerical Methods in Fluids 15 (1992) pp. 791-798.
12. Wu T. Y. and Liu G. R., A differential quadrature as a numerical method to solve differential equations. Computational Mechanics 24 (1999) pp. 197-205.
13. Wu T. Y. and Liu G. R., Axisymmetric bending solution of shells of revolution by the generalized differential quadrature rule. International Journal of Pressure Vessels and Piping 77 (2000) pp. 149-157.
14. Wu T. Y. and Liu G. R., A generalized differential quadrature rule for analysis of thin cylindrical shells. In Computational Mechanics for the Next Millennium, Vol. 1 (Proc. of the Fourth Asia-Pacific Conference on Computational Mechanics, Singapore, December 1999), ed. by C. M. Wang, K. H. Lee and K. K. Ang (Elsevier Science, The Netherlands, 1999) pp. 223-228.
15. Wu T. Y. and Liu G. R., The generalized differential quadrature rule for initial-value differential equations. Journal of Sound and Vibration 233 (2000) pp. 195-213.
16. Wu T. Y. and Liu G. R., Application of generalized differential quadrature rule to sixth-order differential equations. Communications in Numerical Methods in Engineering 16 (2000) pp. 777-784.
17. Wu T. Y. and Liu G. R., Application of the generalized differential quadrature rule to eighth-order differential equations. Communications in Numerical Methods in Engineering 17 (2001) pp. 355-364.
18. Wu T. Y. and Liu G. R., The generalized differential quadrature rule for fourth-order differential equations. International Journal for Numerical Methods in Engineering 50 (2001) pp. 1907-1929.
19. Wu T. Y. and Liu G. R., Vibration analysis of beams using the generalized differential quadrature rule and domain decomposition. Journal of Sound and Vibration 246 (2001) pp. 461-481.
20. Wu T. Y. and Liu G. R., Free vibration analysis of circular plates with variable thickness by the generalized differential quadrature rule. International Journal of Solids and Structures 38 (2001) pp. 7967-7980.
21. Wu T. Y., Wang Y. Y. and Liu G. R., Free vibration analysis of circular plates using generalized differential quadrature rule. Computer Methods in Applied Mechanics and Engineering 191 (2002) pp. 5365-5380.
22. Wu T. Y. and Liu G. R., Application of the generalized differential quadrature rule to initial-boundary-value problems. Journal of Sound and Vibration (in press).
RECOVERY BASED SUBMODELING FINITE ELEMENT ANALYSIS

HAI GU AND ZHI ZONG
Institute of High Performance Computing, 1 Science Park Road, #01-01, The Capricorn, Singapore Science Park II, Singapore 117528
E-mail: [email protected], [email protected]

Submodeling analysis is a technique to achieve efficiency when detailed analysis is required in a local region of a large structure. Traditionally, results of the global finite element analysis, internal nodal forces or displacements, are directly applied as submodeling boundary conditions. In the present paper a new approach is developed, which uses the stresses obtained by the superconvergent patch recovery procedure to determine the forces on the submodel boundary. The proposed method is convenient to implement because recovered stresses are represented by polynomial expansions; it also has higher accuracy because recovered stresses are generally more accurate than raw finite element results.
1 Introduction

Submodeling finite element analysis is a technique intended to obtain accurate information effectively in local critical regions of a large structure: the local region of interest is broken out as a submodel after the initial global analysis and analyzed separately using refined meshes, with boundary conditions derived from the initial global results. Obviously, the accuracy of the analysis depends on the quality of the boundary conditions. Traditionally, the raw results of the initial global analysis, displacements or internal nodal forces, are directly applied. Several techniques are available in the literature to enhance the accuracy of displacement boundary conditions [3], but few can be found for forces, although using forces as boundary conditions is more likely to yield accurate results. In the present paper, recovered stresses, which are generally more accurate than raw finite element results, are employed to determine the boundary forces. By this recovery based submodeling procedure, both accuracy and applicability can be remarkably improved. Two-dimensional large deformation problems using bilinear elements are mainly considered, but the concept of the procedure can be extended to more general cases.
2 Recovery based submodeling finite element analysis

2.1 Stress recovery along the submodel boundary by the Superconvergent Patch Recovery method
After the initial global analysis, the stress along the submodel boundary is recovered. An effective recovery procedure discussed in [2], a modified version of the Superconvergent Patch Recovery (SPR) procedure [1], is adopted here. Its outline is as follows. After the finite element analysis, a patch is defined for each vertex node inside the domain by the union of the elements sharing the node. Over the patch, a continuous field represented by a polynomial expansion $\sigma_j^*$ is assumed for each stress component $\sigma_j$ as

$\sigma_j^* = \mathbf{P}\,\mathbf{a}_j = [1\ \ x\ \ y\ \ xy\ \ x^2\ \ y^2]\,[a_j^1\ \ a_j^2\ \ a_j^3\ \ a_j^4\ \ a_j^5\ \ a_j^6]^T$  (1)
The unknown parameters $\mathbf{a}_j$ are determined by solving the least squares problem

$\min_{\mathbf{a}_j} F(\mathbf{a}_j)$  (2)

with

$F(\mathbf{a}_j) = \sum_{m=1}^{n_s}\left[\sigma_j^h(x_m, y_m) - \sigma_j^*(x_m, y_m)\right]^2 = \sum_{m=1}^{n_s}\left[\sigma_j^h(x_m, y_m) - \mathbf{P}(x_m, y_m)\,\mathbf{a}_j\right]^2$  (3)

where $n_s$ is the number of integration points inside the patch, $(x_m, y_m)$ denotes their coordinates in the deformed configuration, and $\sigma_j^h(x_m, y_m)$ represents the raw FEA result. Then the stresses at the assembly vertex node, the central points of the elements and the mid-points of the element edges sharing the assembly node are computed by substituting their coordinates into Eq. (1). For some of those points, different recovered values may be computed from the patches overlapping at them; in this case, the final recovered value is determined by a weighted average scheme, that is,

$\sigma_j(x_{in}) = \sum_{k} w_k\,\sigma_{j,k}^*(x_{in})$  (4)

where $\sigma_j(x_{in})$ is the final recovered value of stress component j at node in, the sum runs over the patches overlapping at the node, and $w_k$ are the averaging weights. The recovered stress field is then interpolated as

$\sigma_j(\mathbf{x}) = \sum_{in=1}^{nn} N_{in}(\mathbf{x})\,\sigma_j(x_{in})$  (5)

where $N_{in}(\mathbf{x})$ denotes the value of the shape function related to the in-th node. It is worth noting that not only the element nodes but also the central point of the element and the mid-points of the element edges are recovered in the previous stage, so the number of nodes is nn = 9.
Evaluate forces at boundary nodes using recovered stresses
The recovered stresses are used to compute nodal forces on submodel boundary. Precisely, the recovered, continuous stresses, 6P, are substituted into Cauchy equation, Eq.(6), to compute tractions f on submodel boundary. f = 6"n
(6)
where, n is the outward unit normal vector of the boundary. After that, equivalent nodal forces f on submodel boundary r are evaluated in a standard way of FEM formulations as follows. f = J r N r frrfr
(7)
where, t is thickness in deformed configuration and N the shape function matrix for displacements interpolation. Because recovered stresses are represented explicitly by polynomials, it is easy for the proposed procedure to compute forces at any nodes newly introduced by mesh
657
refinement which is the primary difficulty for traditional method. Better accuracy can also be expected by the new procedure since recovered stresses are generally more accurate than raw finite element results. 3
Numerical investigations
A classical model of hyperelasitic large deformation problem as shown in Fig. 1(a) is studied for numerical investigation. Mooney Rivlin hyperelastic material is adopted which is defined by the strain energy potential function U =Cm(ll-3)+Cm(i2-3) with material parameters clo = 0.l863A/pa and c01 = 0.00979Mpa . /, and l2 are the first and the second strain invariants respectively. Four cases are analyzed. Their precise definitions are shown in Table 1, for instance, case 1 is a plane strain problem, mesh Sub-1 is used in submodeling analysis and its accuracy is evaluated by comparing it with global analysis using mesh G-l. In plane stress cases, original thickness is 2mm. L
82.5 mm
j
n n o n o n o r w
(a): Model of the problem
(b): Original global mesh: driving mesh
D "C (c): Refined global mesh: G-l
D C (e): Refined global mesh: G-2
(d): Mesh used in submodeling anlysis: Sub-1
(f): Mesh used in submodeling anlysis: Sub-2
Figure 1. Model of example and meshes used in global and submodeling analysis
Global analysis is run with mesh shown in Fig. 1(b). This initial global mesh is not fine enough to capture details at adjacency of the hole, precisely region FGHDE (12,), where stress concentration occurs. Therefore submodeling techniques are applied. Submodeling analysis is run with two refined meshes Sub-1 (Fig. 1(d)) and Sub-2 (Fig. 1(f)) obtained by halving and quartering edges of elements of the initial mesh respectively. An error factor T) defined in Eq.(8) is used to indicate the accuracy of submodeling results by comparing them with appropriate global solutions, that is, submodeling solutions using Sub-1 and Sub-2 are respectively compared with global solutions of G-l (Fig. 1(c)) and G-2 (Fig. 1(e)).
658 Aie'ng
"n
l\ni°£g'Ak/ng
(8)
xlOO%
In Eq.(8), n is the region for which the error factor is computed; ne is the number of elements in the region and ng the number of integration points of each element; Ajt is the area of element before deformation; a"f4g denotes the Mises stress at integration points of reference solution. a'eJs indicates the difference in Mises stress between submodeling solution and correlative reference solution. Both the proposed method (RSM) and the traditional method using displacement boundary conditions (DM) are applied. Error factor r\ is computed for both £ls and n , . These results are listed in Table 1. From this table, first of all, it can be seen that both the two methods give satisfied solution with error less than 5%. Moreover the proposed RSM is obviously much better than DM in accuracy. Due to the effect of boundary conditions, the error for ils is greater than the error for a,. This is the reason for requiring that the submodel boundary should be far away enough from the area of interest. A tendency is observed from Table 1 that the superiority of RSM is more significant at area of interest (£i,) which is far from submodel boundary. In plane strain cases, error of RSM is half of that of DM for as, almost one third for ii,, while in plane stress cases error of RSM is about 38% of that of DM for n s , but less than one third for £2,. This indicates that the RSM is more capable to capture the information at area of most interest if the submodel boundary is located at proper distance from that area. Table 1: Case definition and error factors
Case 1: Plane strain G-1 Sub-1 DM RSM 1.64 0.67
va, Vcis
4
1.97
4.05
Case 2: Plane strain Sub-2 G-2 DM RSM 0.56 0.19 0.63
1.22
Case 3: Plane stress Sub-1 G-1 RSM DM 0.78 0.22 0.51
1.38
Case 4: Plane stress G-2 Sub-2 DM RSM 0.25 0.08 0.16
0.42
Conclusion
A new submodeling procedure is developed, by which the drawback of traditional procedure using forces boundary conditions is overcome and the result accuracy is remarkably improved. References 1. O.C. Zienkiewicz, J.Z. Zhu. The superconvergence patch recovery and a posteriori error estimates, part I: the recovery techniques, Int. J. Numer. Meth. Engng., 33(1992) pp. 1331-1364. 2. H. Gu, M. Kitamura. A modified recovery procedure to improve the accuracy of stress at central area of bilinear quadrilateral element, J. of The Society of Naval Architects of Japan, 188(2000) pp. 489-496. 3. N.G. Cormier, B.S. Smallwood, G.B. Sinclair, G. Meda. Aggressive submodelling of stress concentrations, Int. J. Numer. Meth. Engng. , 46(1999) pp. 889-909.
A HIERARCHICAL APPROACH TO SURFACE PARTITION OF POLYGONAL MESHES J. SHEN AND D. YOON Dept. of Computer & Information Science, University of Michigan, Dearborn, MI 48128, USA E-mail: [email protected] Given a surface polygonal mesh in three dimensions, an algorithm is proposed to find a partition of the mesh into k subregions on the basis of discontinuity of surface normal and curvature. The algorithm consists of three main steps in a hierarchical manner. First, an input polygonal mesh is decomposed w.r.t. discontinuity of surface normal. Secondly, flat regions are identified on the results of step 1. In the third step, the polygon mesh is further decomposed w.r.t. discontinuity of surface curvature. The resulting surface partition can be used in shape optimization or other surface manipulations based on geometric characteristics of the mesh. The execution time of the algorithm is linear, but pre-computation of some data structures takes 0(nlog n)» where n = m ax(N N ) > N
and N
are
the numbers of elements and nodes in the mesh, respectively. Numerical
experiments have been conducted to show the effectiveness of the algorithm.
1
Introduction
Surface partition of unstructured meshes is important to many problems in engineering and science. The partitioning problem may arise from the requirement of computation on multiprocessor architectures. If a surface mesh is involved in a computation, mapping such a surface onto a multiprocessor machine generally requires partitioning the surface mesh into a number of subregions and assigning these subregions to different processors. Existing algorithms include 1) simulated annealing [1] motivated by physics, 2) schemes based on geometry like straight coordinate bisection [2], bisection direction by principal axes of inertia [3], and stereoscopic projection [4], and 3) graph based schemes such as graph bisection methods [5], spectral partitioning methods [6] and min-max method [7]. In addition, the surface partition may come from the requirement of shape optimization or surface manipulations based on geometric characteristics of polygonal meshes. In contrast to the case of parallel computation, the objective of the decomposition herein is not to generate equally-sized subregions with minimized boundaries. Instead, the decomposition is controlled by the geometric characteristics such as discontinuity of surface normal and curvature. In this paper, we focus solely on this type of decomposition, and limit our attention on polygonal meshes with discontinuity of surface normal or curvature. As to a mesh without any discontinuity of surface normal or curvature, there is no need to break it down into several subregions from the perspective of shape optimization or surface manipulations. If there is a requirement from other perspectives, existing algorithms for parallel computation can be used to handle this special case. The main contribution of this paper is to propose a new algorithm for surface partition of polygon meshes w.r.t. geometric characteristics, i.e., the discontinuities of surface normal and curvature. The outline of the algorithm is introduced in Section 2, and numerical experiments are presented in Section 3.
659
660
2
Methods
Since the surface partition may be dependent upon several factors such as surface normal and curvature, in order to make things simple, we propose to divide the entire task into three main steps in a hierarchical manner as follows: Step 1: surface partition by G discontinuity Step 2: identification of flat regions Step 3: surface partition by discontinuity of curvatures Step 2 is conducted on the result of Step 1, while Step 3 is carried out on the result of Step 2. To find out G discontinuity, the angle formed by the surface normal of adjacent elements is used as an index that is compared with a predetermined threshold. Whenever the index is greater than the threshold, we consider that the edge between these two adjacent elements is a sharp edge or feature edge. In order to find each surface partition enclosed by this kind of sharp edges, a breath-first search is proposed to traverse over the polygonal mesh as follows: (1) set up element neighbor list for each surface element (2) calculate normal angle change between adjacent elements (3) perform a breath-first search (3.1) initiate from an arbitrary surface element (3.2)
(3.3)
propagate over the surface until a G discontinuity line is encountered, which is identified by a condition: normal angle change > a user-specified angular threshold. go back to (3.1) and repeat this breath-first search over unprocessed regions until all surface elements are covered by a partition.
In order to simplify things, we propose an idea of identifying flat regions as early as possible such that the task of surface partition by curvature in Step 3 could be reduced. To identify a flat region, the angle formed by the surface normal of adjacent elements is used. If this angle is smaller than a very small angular threshold, we consider that these two elements form a small portion of a flat region. Similar to the idea of finding sharp edges, another pass of breath-first search over the polygonal mesh is conducted as follows: (1) loop over each surface patch generated by Step 1 (1.1) perform a breath-first search (1.1.1) initiate from a surface element that has, w.r.t. each neighboring element, a normal angle change < a user-specified angular threshold for flat planes (1.1.2) propagate over the surface until a boundary line of a flat plane is encountered, which is identified by a condition: normal angle change > a user-specified angular threshold for flat planes. (1.1.3) go back to (1.1.1) and repeat this breath-first search over unprocessed regions until all surface elements are processed. (1.2) group the remain elements that do not belong to flat regions into one or more different surface patches by means of their connectivity.
661 With the surface partition produced by Steps 1 and 2, a third pass of partition on the basis of curvature discontinuity is conducted. This pass is extremely important to curved surfaces, especially when we want to separate features like fillets from others. Our basic strategy is to let the breath-first search find out one subregion with low curvature and group the remaining elements into one or more subregions. The search is controlled to start from an element with all its nodal curvatures smaller than the nodal curvature threshold. In the search propagation, if normal curvature w.r.t. an element edge is smaller than the nodal curvature threshold, the propagation continues. Otherwise, it terminates at that element edge. Overall, the following procedures are proposed to conduct surface partition by discontinuity of curvatures: (1) loop over each surface patch generated by Steps 1 and 2, which is not a flat region (1.1) calculate nodal curvatures (1.2) calculate average nodal curvature (1.3) perform breath-first searches (1.3.1) initiate only from an element with a curvature < average curvature (1.3.2) propagate over the surface until a termination condition is satisfied: curvature > curvature threshold. (1.3.3) If there are still some elements unprocessed, go back to (1.3.1) to initiate another breath-first search. (1.4) group the remaining elements, which do not belong to regions formed by breath-first searches, into one or more different surface patches by means of their connectivity. 3
Numerical Experiments
The algorithms introduced in this paper are implemented in VC++ and tested on a Pentium III HP PC. Table 1 shows the execution time and rate on different test meshes. Since the major part of the proposed algorithm is three passes of breath-first searches over all elements of a mesh, its time complexity is 0{n), where n = ma x(N , Nv) > Ne and Nv are the numbers of elements and vertices in the mesh, respectively. However, element neighbor relationship needs to be set up as a pre-computation, which takes O(nlogn) time. Thus, overall time cost of the proposed approach is O(nlogn)- Figure 1 gives a surface partition of a typical mesh model. Table 1: Execution time and rate on different test mesh models. Vertex Element Time Model name (second) bumper 473 432 0.17 bracket 236 186 0.07 deck lid 8807 8624 3.0 curverl 143 120 0.04 56 block 46 0.01 12640 25328 6.13 base 156 ellipsoid 308 0.06 1521 3042 torus 0.67
Rate (element/sec) 2541 2657 2871 3000 4600 4132 5133 4533
Figure 1. Surface partition of a bracket model by discontinuity of surface normal and curvature. 4
Acknowledgements
This work is supported in part by University of Michigan - Dearborn, Campus Research Grant and University of Michigan OVPR research grant. We thank Frank Massey for his advice in differential geometry. References 1. Williams, R. D., Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations. Concurrency: Practice and Experience, 3 (1991) pp. 457-481. 2. Simon, H. D., Partitioning of Unstructured Problems for Parallel Processing. Computing Systems in Engineering, 2(1991), pp. 135-148. 3. Farhat, C. and Lesoinne, M., Automatic Partitioning of Unstructured Meshes for Parallel Solution of Problems in Computational Mechanics. International Journal for Numerical Methods in Engineering, 36(1993) pp. 745-764. 4. Teng, S. H., "Points, Spheres, and Separators, A Unified Geometric Approach to Graph Partitioning." Ph.D. School of Computer Science, Carnegie Mellon University, 1991. 5. Vaughan, C , Structural Analysis on Massively Parallel Computers. Computing Systems in Engineering, 2 (1991), pp. 261-267. 6. Barnard, S. T. and Simon, H. D., A Fast Multilevel Implementation of Recursive Spectral Bisection for Partitioning Unstructured Problems. Concurrency: Practice and Experience, 1994. 7. Kiwi, M., Spielman, D., and Teng, S. H., Min-Max-Boundary Domain Decomposition. Theoretical Computer Science, 261 (2001), pp. 253-266.
A COMBINED MESHFREE METHOD AND MOLECULAR DYNAMICS IN THE MULTISCALE LENGTH SIMULATION Q. X. WANG, T. Y. NG, K. Y. LAM, HUA LI AND X. J. FAN Institute of High Performance Computing,1 Science Park Road, #01-01, The Capricorn, Singapore Science Park II,Singapore 117528 E-mail: [email protected] Multiscale simulation technique for material modeling has been gaining much attention in many research realms, and it has emerged as a promising approach for addressing the challenges in efficient and accurate simulation method development. Thus, a new methodology, the combined meshfree method and molecular dynamics, is developed to simulate the multiscale length coupling between the continuum and the atomistic region. Numerical examples are presented to verify the developed methodology.
1
Introduction
In recent years, multiscale simulation techniques have received much attention in the science and engineering for the description of a wide range of physical phenomena. Traditional mono-scale approaches are obviously inadequate for the analysis of certain physical problems where the studied characteristics at different orders of length-scales, for example, the turbulence problem [1] and the crack propagation problem [2]. In coupling the continuum and the atomistics, much work has been done using the traditional finite element method (FEM) and molecular dynamics (MD). In this paper, element-free Galerkin (EFG) method [3] replaces FEM for continuum analysis and is combined with molecular dynamics (MD) simulation via the development of an appropriate handshaking region. A source code is developed to simulate example problems. The results present this methodology is efficient, and also possesses the additional advantage of simplicity of implementation.
2
Methods
2.1 Molecular
dynamics
Molecular dynamics (MD) simulation involves the classical trajectories of atomic nuclei by integrating Newton's second law of motion of a system. For the MD region in the present work, the two-dimensional Lennard-Jones (LJ) "12:6" potential [4] is used. The LJ potential model is given mathematically as f
0(rff) = 4e
\
<7 r
ij
r.. < r. v
(!)
where r.. = iy , r(- = r ; — r . , £ is a parameter characterizing the interaction strength, and a defines a molecular length scale. rc is the cutoff distance, namely (j>(rr) ~ 0, if rtj >rc. The force corresponding to the potential 0 ( r r ) is computed as
f = -V0(r,) 663
(2 )
664
and the force that atomy exerts on atom i can be expressed mathematically as
48e
( a V4
r a \8
r < r l]
r
—
(3)
'c
\« J Applying Newton's second law of motion, the equation of motion of the system can be obtained. And then integrating the motion equation by the leapfrog method [5], the velocity and coordinate of the atom i can be obtained. The corresponding internal stress tensor is given by [4] N„-\
~ W„
-
, s
l
Na
I>,v,.v,. - £ £r,7f,y 1
•
(4)
7='+'
where V is the area in two-dimensional systems or volume in three-dimensional systems, for the simulation cells. Na is the total number of all atoms, and m, and v, are the mass and velocity of atom (', respectively. 2.2 Element Free Galerkin (EFG) Method The element-free Galerkin (EFG) method is used here to describe the far-field region of the simulation. It employs the moving least-square (MLS) interpolate uh(\) to construct approximation of function u(x). For a given domain Q, the moving least-squares (MLS) approximation uh(x) of a function «(x) is given as = pT(x)a(x)
u*(x) = ]T pj(x)a,(x) ;=i
(5)
where p(x) is a complete polynomial of order m in the space co-ordinates x T = [x,y ] . The coefficients rt/x) in Eq.(5) are functions of x and a(x) at any point x are obtained by minimizing a weighted discrete least-squares norm / as follows 7 ( a ) = £ w ( x - x , . ) [ p T ( x , ) a ( x ) - u,]2
(6)
where n is the number of points in the neighbourhood of x for which the weight function w(x — X •) ^ 0 , and w, is the nodal value of u at x=x;. The neighbourhood of x is called the influence domain of x. The minimum of the weighted discrete least-squares norm 7(a) in Eq. (6) with respect to a(x) leads to the following relation between a(x) and u a(x) = A _ 1 ( x ) B ( x ) u
(J)
where A(x) and B(x) are the matrices and obtained by A(x) = £ w,(x)p T (x,.)p(x ; ) = X w ( x - x , ) p T ( x , . ) p ( x , . ) i=l
(8)
i=l
B(x) = [w,(x)p(x,), w 2 (x)p(x 2 ),..., wn (x)p(x„)] uT
(9) (10)
=[ul,u2,...,un]
Hence, we have
'(x) = ££/7 7 .(x)((A- 1 (x)A(x)) jl .« I .=£y I -(x)" i i=\ j=\
i=l
(11)
665
where y/;(x) is termed the MLS shape function and defined as ^,(X) = XPJ(X)(A-1(X)B(X));,.
(12)
More details of the moving least-square (MLS) interpolate and EFG can be found in the paper by Belytschko et al. [3]. 3
Implementation of the Multiscale Simulation Technique
As shown in Figure 1, we consider a problem domain consisting of 3 sub-regions, the atomistic region QMD (lattice region), the continuum region £2EFG (far-field region) and the handshaking region QHs (transition region). The EFG method is for QEFG and the MD formulation for QMD- The atomistic region and the continuum regions are joined by the handshaking region QHs- The compatibility conditions in QHs, namely the displacement and stress compatibilities, play a critical role in the performance of the present multiscale simulation technique. The detail description of this compatibility technology can be found inKohlhoffetal. [6]. In the simulation of multiscale problems, the generation of computational data sets is a very important task. A source code is developed here to generate the computational data points for the present multiscale simulation technique. The data points used in MD region is generated automatically. For the continuum EFG region, however, the domain is discretized by scattering irregularly distributed nodes, where the nodal density is adjusted according to the nature of the problem.
Figure 1. Problem Domain — QEFG (continuum), QMD (atomistics) and £2Hs (handshaking).
4
Numerical Results
The presently developed coupled EFG/MD multiscale simulation technique is applied for a kind of face-centered crystal (FCC) silver (Ag) plate, in the (001) plane. A uniaxial tension is applied at the two ends of the plane. Using the above coupled EFG/MD multiscale technique. The deformation of the plate and the stress distribution are computed. Figure 2 shows the distribution of the stress o xx along the plate symmetrical
666
line in x (tension) direction through the entire computational region. This figure shows that the MD result possesses a large stress oscillation at the initial time step. This is probably due to the application of the simple LJ pair potential for the MD simulation. However, once it reaches the equilibrium state, the results agree well with the analytical solutions as well as those of Kohlhoff et al. [6], using coupled FEM/MD. The numerical example demonstrates the viability and efficiency of the presently developed coupled EFG/MD method. The results are very encouraging, showing distinct advantages, such as acceptable accuracy and ease of implementation.
Analytical result
/
EFG result MD result 0.0 -20
-15
-10
-5
0
5
10
15
20
x(k) Figure 2. Stress distribution a „ along the plate symmetrical line in x direction. References 1. Hou T. Y., Wu X. H., Chen S. Y., and Zhou Y., Effect of finite computational domain on turbulence scaling law in both physical and spectral spaces. Physical Review E, 58(5) (1998), pp. 5841-5844. 2. Abraham F. F., Broughton J. Q., Bernstein N., and Kaxiras E., Spanning the continuum to quantum length scales in a dynamic simulation of brittle fracture. Europhysics Letters, 44(6) (1998), pp. 783-787. 3. Belytschko T., Lu Y. Y., and Gu L., Element-free Galerkin methods. International Journal for Numerical Methods in Engineering, 37 (1994), pp. 229-256. 4. Blonski B., Brostow W., and Kubat J., Molecular-dynamics simulations of stress relaxation in metals and polymers. Physical Review B, 49(10) (1994), pp. 6494-6500. 5. Rapaport D. C , The Art of Molecular Dynamics Simulation. Cambridge University Press (1995). 6. Kohlhoff S., Gumbsch P., and Fischmeister H. F., Crack propagation in b.c.c. crystals studied with a combined finite-element and atomistic model. Philosophical Magazine A, 64(4) (1991), pp. 851-878.
SELF-SIMILAR PROBLEMS IN MULTIDIMENSIONAL CONSERVATION LAWS
SUNCICA CANIC Department
of Mathematics,
University E-mail:
Department
of Mathematics,
University E-mail:
Department
of Mathematics,
California
of Houston, Houston, [email protected]
Texas 77204-3008,
USA
Texas 77204-3008,
USA
B A R B A R A LEE K E Y F I T Z of Houston, Houston, blkQmath.uh.edu
EUN HEUI KIM
E-mail:
State University, Long Beach, CA USA [email protected]
90840-1001,
We report on an approach to analysing hyperbolic conservation laws in several space variables by examining two-dimensional Riemann problems. Use of selfsimilar coordinates reduces the problem to a system of conservation laws in two variables; however, the system now changes type, and a complete analysis requires solving unusual boundary-value problems for degenerate elliptic and degenerate hyperbolic equations, as well as free-boundary problems for such equations. Recent work has resolved some of these difficulties. The talk illustrates this by solving some problems related to weak shock reflection in prototype equations.
1
M u l t i d i m e n s i o n a l C o n s e r v a t i o n Laws
Modeling by conservation principles is fundamental to fluid mechanics, and the importance of multidimensional systems is widely acknowledged. However, there are no general existence theorems for weak solutions of systems of conservation laws in more than one space dimension, as the tools which form the basis of a theory for hyperbolic conservation laws in a single space dimension do not extend to higher dimensions. To be specific, the principal method of analysis is through solution of the Riemann problem; this constitutes a nonlinear version of the method of characteristics. The role of characteristics in propagation of solutions of hyperbolic equations is complicated in several space dimensions, even for linear and semilinear problems, and a nonlinear formulation has not yet been found. Recently, we have started to analyse two-dimensional Riemann problems. One goal of the research is to learn what sorts of singularities appear generically — that is, what are the two-dimensional analogues of shock discontinuities. Related to this, we hope to establish a priori bounds on weak solutions. In addition, a number of self-similar problems are of interest in themselves. For example, the so-called "von Neumann paradox" in weak shock reflection focuses on the failure of shock polar analysis to explain the nature of shock reflection when the waves are weak enough that the nonlinear acoustic waves dominate the linear entropy and vorticity waves. This problem can be studied in prototype equations which are simpler than the full equations of gas dynamics. We have examined the unsteady transonic small disturbance (UTSD) equation and the nonlinear wave system (NLWS).
667
668 2
Self-Similar R e d u c t i o n
A working definition of a Riemann problem (not the only definition possible), is one for which the data depend only on x/y and hence self-similar solutions in x/t and y/t are expected. A system of conservation laws in two space dimensions and time, Ut + F{U)X + G(U)y = Ut + A(U)UX + B(U)Uy = 0, where U(x, y, t) e K n and F and G are smooth maps on R n , becomes a system in two variables £ = x/t, r\ = y/t, which can also be written in conservation form: F( + Gv = (F - £U)e + (G-
r,U)v =
-nU.
A typical system of conservation laws, for example the equations of isentropic or polytropic compressible gas dynamics, is hyperbolic in space and time, with a pair of nonlinear acoustic wave speeds and a number of linear, degenerate characteristics corresponding to entropy or vorticity waves. The reduced system is hyperbolic only far from the origin and changes type at the sonic line, corresponding to the acoustic wave cone 1 ; there is a bounded set {{£,,v) £ ^ } m which the system is elliptic (if n = 2) or of mixed type (if n > 2). The reduced system is often called 'quasisteady', and there is a close analogy with the equations of steady transonic flow, which are also much used in applications but for which there is not a complete theory. In the prototype systems we have studied, the UTSD equation and NLWS, the elliptic part can be written as a second-order equation which appears to be tractable. The Euler system is more complicated. In any case, fl is not known a priori, but depends on the solution U; typically the boundary of fl is at most Lipschitz. In the hyperbolic region, solutions of the reduced system may be relatively simple. For example, for Riemann data which is piecewise constant in sectors, the far-field solution can be found by the elementary construction of solving one-dimensional Riemann problems. Interactions in the hyperbolic region of these one-dimensional waves can be analysed for small data (as a consequence of one-dimensional theory), and in some case have simple selfsimilar solutions by elementary constructions 1 . At least two types of behavior at dQ have been identified. If U is continuous at d£l then the elliptic equation is degenerate at dfi. This is the case even for linear equations such as the two-dimensional wave equation, whose fundamental solution has a square-root singularity at the wave cone. When U is also constant at dfi, the nonlinear equation possesses a nonlinear version of the same anisotropic degeneracy, which is of a type first analysed in work of Keldysh 2 ; it is different from the Tricomi singularity, which appears when the steady transonic potential equation is written in hodograph variables. This nonlinear equation had not been previously studied. Canic and Keyfitz 3,4 , and Canic and Kim 5 found solutions in weighted Sobolev spaces and in Holder spaces (see also related work of Zheng 6 ), and found that nonlinear Keldysh equations, as distinct from linear equations, may in addition have solutions which are continuously differentiable up to the degenerate boundary. Both singular and regular behavior occur, often in the same problem, on different parts of the boundary 7 .
669
The segments of d£l at which U is continuous and constant correspond to spacelike surfaces; that is, the problem of posing Dirichlet data on d£l is well-posed. However, there are configurations in which locally well-posed solutions outside of fl are not constant on dfl and do not extend to a solution in all of R 2 . Thus, even when a solution which is continuous across the sonic line is expected (from the absence of compression waves in the data, for example), it is not always possible to predict the location of 0. based on the supersonic solution alone. A second type of behavior occurs when transonic shocks appear in the solution. In this case, the solution is discontinuous across dfi. The equation may be strictly hyperbolic on one side and the elliptic part of the operator strictly elliptic on the other; however, the boundary itself is now unknown a priori. This leads, then, to a free boundary problem in which the position of the shock and the subsonic flow are coupled by means of the Rankine-Hugoniot equations, a system of nonlinear equations relating the shock slope, the (known) state outside the shock and the unknown state inside ft. In simple cases, the equation governing the subsonic flow is strictly elliptic, the shock may change continuously from supersonic to transonic, crossing a degenerate part of <9fi as it does so. Even without this additional complication, the free boundary problem is not of a standard type, as the underlying elliptic equation is quasilinear and the coupling between the shock slope and the states is highly nonlinear. This has turned out to be the principal challenge of the project up to this point.
3
Oblique D e r i v a t i v e Free B o u n d a r y P r o b l e m s
In work with Lieberman 8 which proves a stability result for steady transonic flow, and which we have extended to establish weak 9 and strong 1 0 regular reflection patterns in the UTSD equation, at least in a neighborhood of the interaction point, we have found a method to prove existence of the free boundary and the corresponding subsonic solution. The method is classical, but seems well-adapted to quasilinear equations and nonlinear boundary conditions. It is based on formulating the elliptic equation as a second-order equation Q(u) = 0, whose coefficients do not involve the derivatives of u (here u is one state variable); and on casting the Rankine-Hugoniot as an evolution equation for the shock position and an oblique derivative boundary condition, (3 • Vu = 0, on the free portion of the boundary. Taking an approximate position for the shock in an appropriate Holder space /C of curves, a mapping on K. is defined by solving the quasilinear fixed boundary problem for u and then solving the evolution equation to define a new curve. The key is is a gain of regularity in this mapping, due principally to estimates one can obtain in the oblique derivative problem; we can show that the mapping is compact and has a fixed point. Kim has shown that in some cases the solution is unique 1 1 . The lack of regularity of dil requires the use of weighted Holder norms. The lack of uniform ellipticity in the case of a shock adjacent to a continuous sonic boundary is handled by elliptic regularization. We have solved two prototype problems for the UTSD equation 9 , 1 0 , but we expect the method to work quite generally. Up to this point we have assumed that the oblique derivative boundary condition is uniformly oblique. This is true in cases
670 where the shock itself is oblique and never normal. However, in many interesting problems, such as the formation of a Mach stem, the shock is normal at one point (the foot or symmetry point), and such appears to be, in fact, the generic situation for transonic shocks. For example, a uniform planar shock spanning a subsonic region has this property at its mid-point. Our current work focuses on adapting the compactness estimates to include this degeneracy. Acknowledgments Research of the first author (SC) supported by the National Science Foundation (NSF), grant DMS-9970310 and by the Texas Advanced Research Program (TARP) grant 003652-0112-2001. Research of the second author (BLK) supported by the Department of Energy, grant DE-FG-03-94-ER25222 and TARP grant 003652-00762001. Research of the third author (EHK) supported by NSF grant DMS-0103823. References 1. S. Canic and B. L. Keyfitz. Quasi-one-dimensional Riemann problems and their role in self-similar two-dimensional problems. Archive for Rational Mechanics and Analysis, 144:233-258, 1998. 2. M. V. Keldysh. On some cases of degenerate elliptic equations on the boundary of a domain. Doklady Acad. Nauk USSR, 77:181-183, 1951. 3. S. Canic and B. L. Keyfitz. An elliptic problem arising from the unsteady transonic small disturbance equation. Journal of Differential Equations, 125:548574, 1996. 4. S. Canic and B. L. Keyfitz. A smooth solution for a Keldysh type equation. Communications in Partial Differential Equations, 21:319-340, 1996. 5. S. Canic and Eun Heui Kim. A class of quasilinear degenerate elliptic equations. Journal of Differential Equations, to appear. 6. Yuxi Zheng. Existence of solutions to the transonic pressure-gradient equations of the compressible Euler equations in elliptic regions. Communications in Partial Differential Equations, 22:1849-1868, 1997. 7. S. Canic, B. L. Keyfitz, and E. H. Kim. Mixed hyperbolic-elliptic systems in self-similar flows. Boletim da Sociedade Brasileira de Matemdtica, 32:1-23, 2002. 8. S. Canic, B. L. Keyfitz, and G. M. Lieberman. A proof of existence of perturbed steady transonic shocks via a free boundary problem. Communications on Pure and Applied Mathematics, LIIL1-28, 2000. 9. S. Canic, B. L. Keyfitz, and E. H. Kim. A free boundary problem for a quasilinear degenerate elliptic equation: Regular reflection of weak shocks. Communications on Pure and Applied Mathematics, LVt71-92, 2002. 10. S. Canic, B. L. Keyfitz, and E. H. Kim. Free boundary problems for the unsteady transonic small disturbance equation: Transonic regular reflection. Methods and Applications of Analysis, 7:313-336, 2000. 11. Eun Heui Kim. Boundary behaviors and uniqueness of solutions for a class of quasilinear degenerate elliptic equations. Submitted, 2002.
V A R I A N C E R E D U C T I O N OF M O N T E C A R L O M E T H O D S F O R OPTION P R I C I N G U N D E R STOCHASTIC VOLATILITY MODELS X. Q. LIU, Y. Y. WONG Department of Mathematics, National University of Singapore 2, Science Drive 2, Singapore 117543 E-mail: [email protected], [email protected] The Clark-Funke-Shevlyakov-Haussmann-Davis-Ocone (CFSHDO) formula is used to construct perfect control variates for vanilla and exotic option prices under stochastic volatility (SV) models and Longstaff and Schwartz? least squares Monte Carlo method is employed to estimate the conditional expectations in the CFSHDO formula. The resulting variance reduction effect is very significant and well worth the additional efforts to compute the control variates. Our results shed light on the success of a systematic approach, against various existing ad hoc techniques, to constructing control variates for Monte Carlo valuation of exotic options under complex models.
1
T h e C F S H O formula and its applications
Based on the Clark's [2] representation theorem on functionals of a Brownian motion, the stochastic integral representation of functionals of the solution to a stochastic differential equation (SDE) was derived by Funke and Shevlyakov [4], Haussmann [5], Divis [3], and Ocone [8] by different approaches. This formula results in an explicit representation of optimal portfolios for utility maximization in Ocone and Karatzas [9]. It is also used in [7] for developing variance reduction methods for simulated diffusions. An extension of the formula is found in Aase et al [1], where the extended formula is applied to the explicit calculation of the hedging function for a European call option. This paper aims to adapt the variance reduced Monte Carlo (VRMC) method of Newton to option pricing under stochastic volatility models. To approximate the conditional expectation in the representation formula, we employ the least-squares Monte Carlo estimation proposed in Longstaff and Schwartz [6] for American option pricing. Numerical experiments show that the specification of a linear form for the regressions is always adequate to provide a powerful control variate. 2
Heston's model and the C F S H D O representation of option payoffs
Hestons model is developed to capture the volatility smiles. It allows for a correlation between the asset return and the volatility as follows: dSt =
rStdt+y/VtStdWt1,
dVt = {UJ- evt) dt + £VtdW2, d[W\W2]t
= pdt.
The correlated Brownian motions W£ and W2 can be expressed by two independent Brownian motions B\ and B2, namely, dW} = dB\ and dW2 = pdB\ + 2
VT^dB .
671
672 The VRMC method for vanilla and exotic options under Heston's model requires regular MC simulations of the underlying asset prices St subject to the simulated volatilities Vj and the computation of the control variates. As a matter of fact, the payoff functional is not differentiable when the option happens to be at the money at maturity. However, this does not prevent us from applying the representation theorem to the pricing of the options because such events are of probability zero. The validity can be justified alternatively by approximating the payoff functionals using a sequence of differentiable functionals. 3
T h e V R M C algorithms for options under Heston's SV models
3.1
Algorithm for vanilla options
To reduce computing work on the control variate, the regressions are performed only on a partition with a coarser space relative to the partition for the MC simulations. The VRMC algorithm for vanilla options is detailed as follows: (1) Regular MC simulations of the underlying asset price: "m+i
=
bm + rhbm + yVmom
[Bm_^i — Bmj ,
•Vm+1 = Vm + (co- 9Vm) h + £Vm [ p (Bi+1
+ V l - P 2 {B2m+1 - B2m)] •
- Bl)
(2) A backward recursion on a k—times coarser partition $n+l = I A " )++ ( ^ "n\kh
oil
\o
+
nfc 2 \ / V.nk ^Tik
o
yB(n+i)k ~
iP
B
nkj
(S(2»+D'= - s ' 0 '
+ o i^r^)
/ 1 if ST > K _ , \0iiST
PN-I
(3) Least-squares regressions for the conditional expectations: O-Unk 0>12nk
&nk
0>13nk
0-2lnk 0-22nk
vnk
0-23nk
(4) The integration for the control variate: AT-l
Plain: I =
0-2\nk 0-22nk M-l
Interpolation: J =
\ m=0
ai2n \ 021r 022n
,
Ollnfc C-12nk
^
a
13n
fl23n
Vnk
(0,\3nk
o-nk [Bln+1)k - B\kj ,
V a23nk
r dllm
a\2m
fll3n
0-2\m 0,22m
G23n
= Interpolation ((°Unk
ai2nk
)
\ \ 0.21nk 0,22nk )
°~m {Bm+i , (ai3nk \ d23nk
~ Bm) ,N
673 (5) The computation of the VRMC price:
1=1i
3.2
The algorithm for vanilla options
The algorithm for Asian option varies from that for vanilla options in that, (i) it requires the computation of the average price St of the underlying asset on the coarse partition; (ii) the recursive scheme for sample values of the regression changes due to a different form of the Riesz characterization of the differential of the option payoff functional; (iii) the regressions are tri-variable: (2') A backward recursion: PN = 0, A l n
_ J kh\iSN> " \0
K
HSN
'
A
_ ,
.
"-^m,uj,
Pn = Pn+l3>n+l + A n . (3') Regressions: Snk = — — T (nSn-i
( 4
O-Unk Ul2nk
+ S„k) , ai3nib \
0-2\nk 0-22nk Q23nk
I
l31nk
1
0,32nk d33nk
Numerical implementations and conclusions
The algorithms in the last section are implemented to price vanilla and Asian options under Heston's model.
aw am Numbardpathi
3500
woo
Figure 1: Price vs Number of Simulations
1ffl>
200
300
wo
Nuflbor ol Samples lot Rao;i*M]n
Figure 2: VR Effect vs Size of Regressions
674 Figure 1 shows the different trends in the price of a particular Asian option produced by the MC and VRMC methods respectively as the number of simulations increases. The ratios between the variances of the MC and VRMC methods respectively are plotted against the size of the regression in Figure 2. Substantial numerical results show that the VRMC method always reduces the variance of MC method dramatically with limited increase in computing time. The computational work on the control variate can be diminished since the effect of variance reduction is insensitive to the size of regressions. Refinement of the coarse discretization and interpolation in the numerical integration for the control variate is found to enhance the reduction of variance significantly. References 1. Aase, K.; 0ksendal, B.; Privault, N.; Ub0e, J. White noise generalizations of the Clark-Haussmann-Ocone theorem with application to mathematical finance. Finance Stoch. 4 (2000), no. 4, 465-496. 2. Clark, J. M. C. The representation of functionals of Brownian motion by stochastic integrals. Ann. Math. Statist. 4 1 (1970) 1282-1295. 3. Davis, M. H. A. Functionals of diffusion processes as stochastic integrals. Math. Proc. Cambridge Philos. Soc. 87 (1980), no. 1, 157-166. 4. Funke, R.; Shevljakov, A. Ju. A generalization of Clark's formula. (Russian) Theory of random processes, No. 5 (Russian), pp. 93-96, 114. Izdat. "Naukova Dumka", Kiev, (1977). 5. Haussmann, U. G. On the integral representation of functionals of ltd processes. Stochastics 3 (1979), no. 1, 17-27. 6. Longstaff, F. A.; Schwartz E. S. 2001. Valuing American options by simulation: a simple least-squares approach. Review of Financial Studies 14 (2001), no. 1, 113-147. 7. Newton, N. J. Variance reduction for simulated diffusions. SIAM J. Appl. Math. 54 (1994), no. 6, 1780-1805. 8. Ocone, Daniel Malliavin's calculus and stochastic integral representations of functionals of diffusion processes. Stochastics 12 (1984), no. 3-4, 161-185. 9. Ocone, D. L.; Karatzas, I. A generalized Clark representation formula, with application to optimal portfolios. Stochastics Stochastics Rep. 34 (1991), no. 3-4, 187-220.
A SUPERLINEARLY CONVERGENT ALGORITHM FOR LARGE SCALE MULTI-STAGE STOCHASTIC NONLINEAR PROGRAMMING* FANW E N M E N G , R O G E R TAN A N D G O N G Y U N ZHAO Centre for Industrial Mathematics, National University of Singapore, 2 Science Drive 2, Singapore 117543 E-mail: [email protected] and [email protected] This paper presents an algorithm for solving a class of large scale nonlinear programming which is originally derived from the multi-stage stochastic nonlinear programming. With the Lagrangian dual method and the Moreau-Yosida regularization, the primal problem is transformed into a smooth convex problem. By introducing self-concordant barriers, an approximate Newton method is designed. The algorithm is shown to be of superlinear convergence. At last, preliminary numerical results are provided.
1
Introduction
In this paper, we consider the following large scale nonlinear programming: min{/(a;) | Ax = a, U{x) <0,ie
I = {1, 2, • • -,0},x
G Rn}.
(1)
where f,fi,i&J, are smooth, convex on Rn and A G fj™*n with rank(yl) = m < n. It is known that a lot of practical problems can be formulated into (1). In particular, the multi-stage stochastic convex nonlinear programming (MSSCNP), which has been intensively studied in the past few years ([5,10,11]), can be written in the abbreviated form as (1) (see, [11]). Some linearly convergent algorithms have been developed for solving (MSSCNP). However, at present there are not faster algorithms for (MSSCNP) in the literature. Thus, it is a very interesting and meaningful work to investigate some rapidly convergent algorithm for problem (1). Let T = {x e Rn | fi(x) <0,i e I}. It is known that in (MSSCNP), / and T are separable into scenarios, while the nonanticipativity constraint Ax = a is not separable. Thus at first we seek to relax the constraint Ax = a by using the Lagrangian dual of problem (1) as follows: mm{C(u)\ueRm},
(2)
where C(u) = max{-f{x)
+ uT{Ax
- a) | x e F}.
(3)
A substantial obstacle in solving problem (2) is that £ is nondifferentiable. To overcome this, we convert (2) into another convex problem by using the so-called Moreau-Yosida regularization of £: mm{ri(u)\u€Rm},
(4)
•THIS RESEARCH IS SUPPORTED BY THE GRANT FROM IHPC-CIM RESEARCH PROJECT: R-146-000-036-592.
675
676
where V(u)=
mm{av)+-\\v-u\\2M},
vERTn
(5)
Z
where M is an n x n symmetric positive definite matrix and | | W | | M := wTMw for any w e Rm. In this paper, we take M = jl, A > 0, for simplicity in discussion. Problems (2) and (4) are equivalent in the sense that the solution sets of this two problems coincide with each other. It is well known that r\ is smooth with VTJ(U) := g{u) = (1/A)(u — p(u)), where p(u) is the unique solution of (5). In general, there are a number of iterative methods that can be used as a procedure to obtain the approximate solution of subproblem (5), such as [1,3]. However, these methods tend to spend more inner iterations as the outer iteration proceeds. Hence, how to improve the efficiency of this subproblem is a key question for the whole problem. In this paper, we will investigate a method to solve subproblem (5) effectively. Roughly speaking, we add a self-concordant barrier function b to the objective function of problem (3). Thus, we obtain a new function £(//, u) = max{— f(x) — pb(x) + vT(Ax and its corresponding function r,(p,u)=
— a) | x £ i n t F } ,
p > 0,
(6)
rj(p,u):
mm {C(p,v) + l/(2X)\\v-u\\2},
A > 0.
(7)
Based on some nice properties estalished in this paper, we can use the higher order derivatives to solve the smooth subproblem (7). 2
Self-concordant P r o p e r t y
Since 77 is convex, continuously differentiable with the Lipschitz continuous gradient, hence we should develop some suitable gradient-based algorithm for solving (4). However, as we stated above, 77, V77 are obtained through the optimal solution of problem (5), which is difficult to solve. In order to overcome the nondifferentiablity of C> we add a self-concordant barrier b to this function and obtain problems (6) and (7). The definitions and properties about self-concordant functions are referred to [8]. Throughout, we make the following assumptions: ( A l ) A has full row rank, ( A 2 ) T is a compact convex set and int T ^ 0, ( A 3 ) b : T —> R is a 9—self-concordant barrier, ( A 4 ) / : T —> R is (3—compatible with b. Under the above assumptions, it is evident that problem (7) has a unique solution denoted by x(n,v), y, > 0 and v E Rm. Let q((i,x) := f(x) + fd>(x), e(fj,,v,x) := -q(fj,,x) + vT(Ax - a) and p(n,v,u) := C(M.U) + (V2-MIIW _ u ll 2 Then, we derive the following important result: P r o p o s i t i o n 1. For any fixed u e Rm, p(p,v,u) is a strong self-concordant family with parameters a(p) — p/(l + /3) 2 , u>(p) — (1 + 0)91/2/p, i/(p) —
[(i + 0)e^2 + i/2}/p.
677 It is well known that an important class of functions which can be minimized by path-following algorithms in polynomial time are self-concordant families. Therefore, p(p, v, u) plays a great role as we construct the algorithm in the next section. 3
Algorithm and Convergence Theorems
In Section 2, we have shown that for any p > 0,u G Rm, problems (5), (6) and (7) have unique solutions p(u), x{p,u) and v(p,v), respectively. Then, we have P r o p o s i t i o n 2. For any fixed u G i ? m , v(p,u) converges to the optimal solution of (5) as p —> 0. Let e > 0 , u G Rm, if there exits a vector p£(u) G Rm such that C(Pe(u)) + 2t\\Pe(u) — w|| 2 < v(u) + e> then we call p£(u) an e-approximate solution of rj(u). P r o p o s i t i o n 3 . Let u G Rm, for any e > 0, there exits p > 0 such that for each p G [0,p], v{p,u) is an e-approximate solution of r](u). Now, suppose pe(u) is an e-approximate solution of ri(u), we define the approximations of r)(u) and Vr)(u) by r]e(u) = £(p £ (w)) + l/(2A)||p e (M) — u\\2 and gE(u) = (l/\)(u—pe(u)) respectively. Then, we can compute rjE(u) and gs{u) to be arbitrarily close to r](u) and g(u) respectively as long as the parameter e is chosen small enough. Furthermore, with help of Proposition 3, we only need to compute v(fi, u) which can be chosen as an e-approximate solution of r)(u) for some small positive Jx. Next, we investigate the algorithm for problem (4). The Newton direction used for minimizing p(/j,,., u) is as what follows. Av = -(V2p(fj,,
v, u))-lVp(n,
v, u),
(8)
where V2p(p,v,u) = y 2 C(/i,^) + (1/A)/, Vp(p,v,u) = VC(t*,v) + (1/A)(i> - «)• Denote 5(p,,v, u) := y / 'p,~ l Av T V 2 p(n, v, u)Av. Regarding the outer problem (4), since it is impossible to solve an exact generalized Hessian V G <9B(?(U), we hope to compute it approximately. It is evident that if £ is twice continuously differentiable at p(u), then dsg{u) consists of the single element, namely V2??(w) = (1/A)7 — ( 1 / A ) [ / + AV 2 C(p(w))] _1 - So, we can develop an approximate Newton method for solving problem (4). Now we state the following heuristic algorithm. Algorithm Step 1. Choose e 0 > 0, e 0 := Meo) > °> 7 e (0.1). £o > 0,/x0 > 0,v°,u°,X > 0. Let k = 0. Step 2. Let v = vk, u — uk. Step 2.1. Maximize e(pk,v,.) and obtain x(nk,v). Step 2.2 Construct the Newton direction At; by using (8). Step 2.3 Choose a step size a > 0. Set v+ = v + aAv. Step 2.4 If 5(pk,v+) < efc. Set w(fc+1) = v+ and go to Step 3. Otherwise, set v = v+ and go to Step 2.1. Step 3. If pk £ £fc, set p^ = PQ, go to Step 4. Otherwise, set Pk+i = IPk- Set k = k + 1 and go to Step 2. Step 4. Let p£k(uk) = t/ fc+1 >. Compute geic(uk) = (1/A)(u fc - p £ t ( u f c ) ) , pick a positive definite symmetric matrix 14 (details are given below) and compute the search direction dr = —V^g£k(uk).
678 Step 5. Choose a step size Tfc > 0(0 < Tfc < 1), set uk+1 = uk + Tkdk. Choose a scalar 0 < Ek+i < £k- Let k = k + 1, go to Step 2. There are some ways to choose Vk, such as if V2C(pefc(wfc)) exists, choose Vk = (l/A + 7fc)J- (1/A)[7 + AV 2 C(p £ f c (u f c ))]-\ and Vk = Vk-i otherwise. Here V^ = I and 7fe is a small constant to ensure Vk is positive definite. Then, we get the following covergence theorems of the above algorithm. T h e o r e m 1. Suppose that there exists a constant n > 0 such that {Vkh,h) > rc||/i||2, for all h e Rm and all k. Suppose {Tfc} tends to 1 as k —> oo. Then, any accumulation point of {uk} is an optimal solution of problem (4) as ek —• 0. Corrollary 1. Suppose the assumptions of Theorem 1 are satisfied. Let u* be an optimal solution of (4). Then {vk} converges to u* as uk —* u* and ek —• 0. T h e o r e m 2. Let u* be an optimal solution of problem (1.4). Let X(/J,,V) and v(fx, u) be unique solutions of problems (6) and (7), respectively. Then x(fi, u) converges to an optimal solution x* of problem (1) as u —> u* and /J, —> 0. T h e o r e m 3. Suppose that the conditions of Theorem 2 are satisfied and u* is an optimal solution of (4). Assume that g is BD-regular and semismooth at u*. Suppose that Sk ~ o(\\g(uk~1)\\2); for all large A;, rk = 1; l i m ^ o o dist(Vk,dBg(uk)) = 0. Then {uk} converges to u* at least 2-step superlinearly. At last, we test Manpower planning problem and Production planning problem, which were described in [7]. Numerical results show that the algorithm proposed in this paper, which combines the barrier Lagrangian dual and Moreau-Yosida regularization, can solve problems in reasonable time. References 1. A. Auslender, Numerical methods for nondifferentiable convex optimization, Math. Programming Study, 30(1987), 102-126. 2. F.H. Clarke, Optimization and Nonsmooth Analysis. New York: Wiley, 1983. 3. R. Correa and C. Lemarechal, Convergence of some algorithms for convex minimization, Math. Programming. 62(1993),261-275. 4. M. Fukushima and L. Qi, A Global and Superlinear Convergent Algorithm for Nonsmooth Convex minimization, SIAM J. Optimization, 6(1996),1106-1120. 5. J.L. Higle and S. Suvrajeet, Stochastic decomposition, Kluwer Academic Publishers, 1996. 6. J.B. Hiriart-Urruty and C. Lemarechal, Convex Analysis and Minimization Algorithms. BerlimSpringer Verlag,19. 7. A.J. King, Stochastic programming problems: examples from the literature, Numerical Techniques for Stochastic Optimization, Y. Ermoliev and R. J-B Wets eds. Springer-Verlag, 1998, 543-567. 8. Y. Neesterov and Nemirovskii, Interior-point Polynomial Algorithms in Convex Programming SIAM, Philadephia, P E , 1994. 9. R.T. Rockafellar, Convex Analysis. Princeton, New Jersey, 1970. 10. G. Zhao, Lagrange dual method with self-concordant barriers for multi-stage stochastic nonlinear programming, Report, 1999. 11. G. Zhao, A log-barrier method with Benders decomposition for solving twostage stochastic programs, Mathematical Programming, 90(2001), 507-536.
COMPUTATION OF NETWORK DELAY WITH PRIORITISED TRAFFIC INVOLVING THE MULTI-PRIORITY DUAL QUEUE ANTHONY BEDFORD AND PANLOP ZEEPHONGESKUL Department
of Mathematics
and Statistics, RMIT University, Plenty Road, Bundoora Victoria, 3083, Australia E-mail: anthony.bedford® ems. rmit.edu.au
East,
We continue our work on the unique differentiated service network, involving the multi-priority dual queue (MPDQ), by investigating exclusively the 'high' Quality of Service (QoS) criteria - delay (waiting time). The MPDQ is a new scheduling regime shown to reduce congestion for multi-class traffic over conventional scheduling disciplines. To gain insight into the advantages of a mixed MPDQ / First In First Out network with prioritised traffic, simulations are performed on networks with and without a dual queue. The simulation analysis is discussed, including the enormity of files created for cumulative density construction, and criteria for ceasing the runs. We construct a new statistic, the Adjusted Average Network Delay (AAND), which removes the differences in class service rates to establish a relative measure of class transient times through the network. We also look at high-class traffic under different offered loads, and provide a comparison of the delay characteristics. These findings provide communication service providers valuable information in determining the improvement in QoS to differentiated networks. They also highlight the importance of simulation as a tool of evaluation of networks, and the best quantity of MPDQ's to include in a network scenario.
1
Introduction
Simulation analysis using Arena [1] is used to continue our work on delay in networks with and without a MPDQ. The simulations undertaken contained huge data files totalling 40Mb. The simulation time was set at 50000 units to allow for the loss levels (see [2]) to reach steady state. This was determined through pilot simulations, and can be seen in other work in this area [3]. Here we concentrate on class wise delay using a cumulative density function (CDF) for the three broad networks described in our earlier paper, and also use the networks described in that work [2]. 2
Route Delay
As a measure of network performance, the average waiting time for this type of traffic (differentiated with independent service rates) is not a perfect reflection of source and destination delay. We arbitrarily assigned a longer service rate for the higher class of customer in order to emulate an increased instantaneous network demand from this class. This may also mean the delay time in the network is longer. Therefore Table 1 lists an adjusted average network delay (AAND) from source to destination. This is given by AANDC>Bir = " l (w''r -{Nn
679
680
dual queue is one of the links. This also does not adversely effect the F-F link, with times almost identical to the F-F-F network. Class 3 is disadvantaged by the dual queue, with the largest delay time of all traffic in these 3-node networks. All FIFO networks show no discrimination in their delay times, with the AAND near identical for the classes. Table 1. AAND by Class and Network. ]Route
Networks 3-node: F-F-F, DQ-F-F 4-node: F-F-F-F, DQ-F-F-F 5-node: F-F-F-F-F, DQ-F-F-F-F
Class 1 2 3 1 2 3 1 2 3
F-F
DQ-F
F-F-F
DQ-F-F
54.27, 55.00 54.47, 54.70 56.75, 57.10 49.08, 48.36 49.02,48.14 50.83, 50.72 38.89, 39.42 39.15, 39.97 41.20,41.56
- , 33.50 - , 35.20 -,71.50 - , 30.63 -,31.74 -, 63.41 - , 25.78 - , 26.25 -,51.22
72.68, 63.49 73.58, 64.77 76.37, 81.90 58.76, 59.27 59.20, 59.54 61.89, 63.01
-,54.13 - , 56.23 - , 89.86 -, 44.25 - , 46.47 -,71.64
For the 4-node networks, the dual queue improves service indirectly to the FIFO nodes. The DQ-F-F-F shows slightly improved delay times for the F-F route for all classes and greatly improved delay times for the F-F-F route over the FIFO network. We begin to see a flow-over effect of resorted traffic by the dual queue to the FIFO nodes. Notably, Class 3 is again disadvantaged in the dual queue network. As the networks increase in size, the delay to this traffic is closing in on the FIFO network delay times. In the 5-node networks, the F-F route is now virtually identical for the two variants, with the dual queue network now a little larger in delay than the FIFO network along the F-F route. This is also the case in the F-F-F route. Also the DQ-F and DQ-F-F routes in the dual queue network continue to deliver lesser delay times than the FIFO. The margin is smaller, indicating that the effectiveness of the dual queue in re-sorting inter-nodal arrival traffic is reduced. With more non-sorted departures within the network, the influence of the dual queue, whilst still effective, may not deliver the requirements needed to justify higher priority traffic or the cost of implementing a dual queue. 3
CDF of delay
The behaviour of the traffic along all routes is now modelled for each class. The delays are no longer adjusted for their service times as in the previous section as we wish to investigate what is happening to the traffic in its entirety. We use the CDF of delay time to model each network, and compare them on a route basis for each class. In this way, as before, administrators can decide what prioritised customers should expect in their service requirements. We first look at Figure la, which contains the 3-node network CDF's. The optimal curve is one that reaches a probability of 1 as rapidly as possible. All delay CDF's have the familiar 'S' shape common in delay models. What is noticeable in Figure la is the desirable characteristics in the first three curves. Class 1 and 2 traffic along the DQ-F route achieves excellent delay times. The sorting influence of the dual queue is also evident, with Class 1
681
Figure 1. (a) 3-node network (b) 4-node network (c) 5-node network (d) Overall network delay
traffic along the F-F route in the dual queue network also achieving excellent delay. This influence is a way of sorting traffic, as it exits the dual queue in class order, and arrives at the next node in class order. The intermixing of this traffic with traffic from other nodes will be in a semi-ordered form. This result is excellent for this class. Class 1 traffic can be guaranteed low delay times irrespective of the route in the 3-node dual queue network. Furthermore, Class 2 and 3 traffic in the same network receives virtually identical delay functions as the FIFO counterparts. The only poor result in the dual queue network is for Class 3 traffic. The FIFO network routes all receive near identical delay functions. We next consider the 4-node networks. From Figure 1(b), we again see the Class 1 and 2 traffic in the dual queue network receiving excellent service. Class 2 and 3 customers along the F-F route in the dual queue network also do well, with them having near identical delay functions to that of the FIFO network. Unlike in Figure 1(a), Class 1 customers are now delayed substantially more in the dual queue network along the F-F route. The influence of the dual queue may have waned for Class 1 customers, as the dual queue is not necessarily adjacent to both of the F-F routes. In the 3 node routes in the 4-node network, as seen in Figure 1(b), the delay functions are closer. Notably, the magnitude of the x-axis has increased as we are now analysing traffic through 3 nodes. The results are excellent for the dual queue network, with Class 1 and 2 traffic in this network on both types of routes experiencing the shortest delay times. Class 3 again has the longest delay in the dual queue network along both routes. In the FIFO network, Class 2 and 3 are again superior to Class 1 traffic. This is due to the longer service time of Class 1 traffic. Finally we look at the 5-node networks in Figure 1(c). We again see the Class 1 and 2 traffic in the dual queue network receiving the best service. The results are virtually identical to those found in the 4-node network. The increase in capacity and nodes has little effect on the 2-node routes. However there is a distinct advantage in the dual queue network. All six class and route combinations in the dual queue are superior in delay times
682
to the three FIFO combinations. Class 1 and 2 dual queue traffic experience the best delay times, whilst the Class 1 F-F-F traffic is delayed the longest. Through the mid-section of the CDF, the gap is increasing between the priority and non-priority networks. Overall we can see the value to Class 1 and 2 customers in the dual queue networks. For service providers, the choice to allocate a priority network is dependent upon their willingness to sacrifice service to lower class traffic in order to provide better QoS to higher-class traffic. We conclude our delay analysis by looking at the broad delay CDF's for the networks. This comprises all classes of traffic from all routes in each network and gives a broad overview to the network behaviour. Figure 1(d) shows the overall delay. All functions have the same shape, with the exception of the F-F-F network. It guarantees the worst probability for short delays but the best in long delays. There is such a slight difference between the DQ-F-F-F and DQ-F-F networks that in overall terms the set up costs may not be worth the marginal network improvement. 3.1
Delay for Class 1 traffic in the 5-node network
In our final analysis of the networks, we analysed the behaviour of first class traffic in the 5-node network for various load levels. We varied the load by adjusting the interarrival rate for Class 1 customers, which in turn changed the network load. This was undertaken for the DQ-F-F-F-F network. Our finding was that, as the load decreases in the network, the delay time decreases for Class 1 traffic. Furthermore, as the arrival rate increases (and as the load decreases), the gap between CDF's reduces. 4
Conclusion
The implications of a MPDQ in various network scenarios have been explored and delay functions given, providing a framework for service providers interested in setting boundaries and a starting point for further mixed network analysis. Put simply, with FIFO networks, the more nodes, the lower the delay time. The results according to class show that differentiated services benefit in a network with the MPDQ present, with the 4-node network performing the best with the MPDQ. References 1. Kelton W. D., Sadowski R. P., Sadowski D. A., Simulation with Arena, 2nd ed., (McGraw-Hill, 2002) 2. Bedford A. and Zeephongesekul P., Simulation solutions of networks with prioritised traffic involving the multi-priority dual queue, Proc. IC-SEC (2002). 3. Bedford A. and Zeephongsekul P., Analysis of the Multi-Priority Dual Queue (MPDQ) witii Preemptive and Non-Preemptive Scheduling : A Simulation Analysis, Submitted for publication.
SIMULATION SOLUTIONS OF NETWORKS WITH PRIORITISED TRAFFIC INVOLVING THE MULTI-PRIORITY DUAL QUEUE ANTHONY BEDFORD AND PANLOP ZEEPHONGESKUL Department of Mathematics
and Statistics, RMIT University, Plenty Road, Bundoora Victoria, 3083, Australia E-mail: anthony.bedford® ems. rmit.edu.au
East,
In prior work we have shown that the multi-priority dual queue (MPDQ) outperforms conventional scheduling disciplines such as First In First Out in isolation. This work takes the MPDQ into a network situation, and compares the network loss with and without its presence. As the MPDQ improves traffic congestion, most notably in communication networks, it is not necessary to include it at every node, or service centre, within a network. Our aim here is to provide a framework for future researchers, network designers, and service providers to implement this analysis, which involves the investigation of simple three, four and five node multi-class networks. This is aimed as a guide for extension to larger networks. Queueing networks containing differentiated traffic, also known as multi-class networks, are complicated to solve analytically using existing queueing theory techniques. We discuss the complications of exact probability network solutions and describe how we used involved simulation models to obtain performance statistics only possible through high performance computing. The simulation models here are presented with a description of the workings of the MPDQ. To investigate if there is any improvement in loss levels within the network we compare, for each network, the inclusion of one or more MPDQ's. Each node contains a service centre, following specific service times for each traffic type. Traffic arrives and follows a shortest path algorithm to its predetermined destination.
1
Introduction
Priority schemes applied to the multi-priority dual queue (MPDQ) with non-preemptive scheduling provide superior performance over a single queue for a variety of scheduling disciplines such as First In First Out (FIFO) and Last In First Out (LIFO) [1,2]. The simple non-prioritised dual queue (DQ) was shown to provide superior delay and loss to customers over FIFO, Round Robin (in Wireless Local Area Networks (WLAN)) [4] and Deficit Round Robin schemes [3]. Some applications of the MPDQ include IP networks, LAN (Local Area Networks), WLAN and mobile communications [5]. With the DQ and MPDQ's versatility for application in communications established, our aim here is to analyse loss for differentiated classes of traffic in networks with and without a MPDQ via simulation. We investigate the proportion of traffic lost in three types of networks for the classes by route. This is undertaken firstly by defining some basic network structures of nodes (servers) with finite buffer space, in which traffic may arrive externally at any node and then venture through the network to a predetermined departure node. For each network simulation we fixed inter-arrival rates and service times so that comparisons across networks could be made. Traffic could take any feasible route through a network. In these trials, we used three classes of traffic. No discrimination is made between the three classes of traffic in terms of routing within the network. In our previous work [1,2] and in [5], this was seen as a practical number of classes in a differentiated network. All inter-arrival rates (time between arrivals) and service rates follow an exponential distribution. These rates are identical irrespective of the node type. In our performance analysis, we use a network load of 0.525.
683
684
2
The Multi-Priority Dual Queue
As described in [2], the steady-state solution to the MPDQ remains difficult to evaluate due to the complex nature of the solution process. For a dual queueing system with two classes of traffic and waiting space c\ for the primary queue and c2 for the secondary queue, the dimensions of the irreducible generator matrix A of the system are given by
Ac
c q ,c2
fi(c,+l)(c|+3c 2 + C l +2)Vi( e i +l)(c 2 2 +3c 2 + C l +2) 2
e 9^ 2
^
This matrix forms part of the linear system generating all transitional states of the queueing model that is given by nTA = 0 where n is the vector of the steady-state distribution of the continuous-time Markov chain containing the unique normalised nonnegative solution once solved. The dual queue requires exhaustive demands on computational resources due to the rapidly expanding size of A as c\ and c2 increase. It soon becomes apparent that it is impractical to solve systems with a total queueing capacity beyond five. For the dual queueing model with c\ = 4 and c2- 6, the size of A5 is 150 x 150. Adding the MPDQ to a network complicates this and precise analysis is beyond steady state computation, hence simulations are used. The MPDQ and FIFO are illustrated in Figure 1. Nq is the number in queue and c, is the capacity of queue i. If the arriving traffic meets a full primary queue, then it waits in the secondary queue. If the secondary queue is also full then the arriving traffic is lost. If a space becomes vacant in the primary queue, traffic at the head of the secondary queue moves to the tail of the primary. We employ a Highest-Class First (HCF) regime within the DQ models. This means that Class 1 (high-class) traffic jumps to the head of the line over any lower class traffic within the same queue. Service is nonpreemptive hence there is no interruption to traffic being processed.
if Nq
Queue
Figure 1, Multi Priority Dual Queue (MPDQ) and Single FIFO Queue (F).
Networks There are three network structures shown here, each used with and without the inclusion of a MPDQ. Each one, as seen in Figure 2, can be used as a blueprint for constructing larger networks, or treated as a sub-network/LAN. Using the results for these three basic structures, the analysis provides a clue as to the ideal quantity of dual queues to place in a network based on the service manager's QoS objectives. All networks where the MPDQ is included have the dual queue located at node 1. Each node in a network
685
contains only two adjacent nodes. Furthermore, traffic may depart the network or arrive at any node (with rate y), or be moved on to its destination node. The traffic waits at each node based on the queue type. As can be seen in Figure 2, some paths are identical, and we define the transit times between nodes to be the same. After preliminary analysis, we could simplify our results into sets of routes rather than individual routes. This is because our analysis showed that even though external arrivals could occur at any node, routes with the same distance in the same network could be considered equivalent for the same class of traffic. For example, a Class 1 customer travelling from node 1 to 3 was found statistically equivalent to a Class 1 customer travelling from node 3 to 1. This is also known as Burke's Law. Exhaustive analysis confirmed this trend to hold for all like route combinations with the same service queueing regime. So we end up with, at most, four types of paths in a network. For the three networks analysed, each one is evaluated with and without the MPDQ at node 1.
Figure 2. Three, four and five node (sub) networks analysed
Throughout this paper, we use DQ for the multi-priority dual queue and F for a FIFO node. The six networks we will analyse are as follows: F-F-F, F-F-F-F and F-F-F-F-F are networks with three, four and five nodes respectively that all follow a FIFO regime. The other three networks are DQ-F-F, DQ-F-F-F and DQ-F-F-F that have the dual queue at node 1 and all other nodes are FIFO. In the networks analysed all feasible routes are combined into the following sets DQ-F, F-F, F-F-F and DQ-F-F-F. We have employed a shortest path algorithm, so traffic cannot take a longer route even in circumstances of congestion. We define the primary and secondary queue in the dual queue to be of length 5 each. All other FIFO nodes have single queue length of 10. For Class 1, Class 2 and Class 3 traffic, the mean inter-arrival time, (y c ,,, where c = class and i = entry node), and service rate (jic) is 10 seconds and 4 seconds; 5 seconds and 2 seconds, and 2.5 seconds and 1 second respectively. 4
Preliminary Performance Evaluation
All simulations were undertaken using the Arena Simulator. Each simulation was run for 50000 units (seconds). From Table 1, the 3-node FIFO network (F-F-F) shows little difference in loss levels for each node, with the exception of Class 1 for node 1. The introduction of the MPDQ (DQ-F-F) shows that loss levels at the dual queue node are only slightly lower for both Class 1 and 2 traffic. Overall there is a marginal improvement on a class-wise basis for traffic in the DQ-F-F over the F-F-F. The 4-node networks are quite similar to the 3-node. Overall there is a substantial reduction in the loss of approximately 10% from the 3-node networks. The 4-node network has an increased capacity of 25% for the same amount of traffic hence we expect the loss levels to drop. The DQ-F-F-F is again marginally better than its FIFO counterpart for Class 1 traffic. For Class 2 and 3 the F-F-F-F network has lower levels of loss than the dual queue network. There is an overall slight decrease in loss from the 4 to 5-node networks. The increase in
capacity is now 20%. Class 1 again receives the lowest loss statistics in the dual queue network. Table 1. % Loss at node by Class and Network.
Networks F-F-F, DQ-F-F F-F-F-F, DQ-F-F-F F-F-F-F-F, DQ-F-F-F-F 5
Class 1 2 3 1 2 3 1 2 3
1 13.3,12.9 14.9,13.7 14.4,14.6 4.7,4.8 4.7,5.4 5.1,5.3 4.2,3.7 3.9,4.3 4.0,3.7
2 14.6,14.5 14.1,15.0 14.1,14.1 4.5,4.2 4.1,4.6 4.2,5.1 4.2,3.8 4.3,4.4 4.5,4.4
Node 3 14.1,14.2 14.8,14.7 14.4,14.6 5.2,4.8 5.1,5.2 4.9,5.1 4.2,3.5 4.2,4.3 4.3,4.0
4
5
5.1,4.8 5.7,5.1 5.3,5.1 4.0,3.5 4.2,4.0 4.1,4.2
3.9,4.2 4.1,4.4 4.0,4.5
Conclusion
In this paper we have displayed the networks used to evaluate class loss by network. Preliminary analysis suggests Class 1 traffic under a MPDQ suffers lower loss that the other classes. As the MPDQ is situated at Node 1, it can be seen that loss is increased for traffic 2 nodes from it. Further analysis is undertaken in our next paper in these proceedings. References 1. Bedford A. and Zeephongesekul A, Simulation studies of waiting time approximation for the multi priority dual queue (MPDQ) with finite waiting room and nonpreemptive scheduling. In Topics in Applied and Theoretical Mathematics and Computer Science, ed. by V. V. Kleuv and N. E. Mastorakis. (WSEAS Press, Greece, 2001) pp. 220-225. 2. Bedford A. and Zeephongesekul A, Simulation studies on the performance characteristics of multi priority dual queue (MPDQ) with finite waiting room and non-preemptive scheduling. In Topics in Applied and Theoretical Mathematics and Computer Science, ed. by V. V. Kleuv and N. E. Mastorakis. (WSEAS Press, Greece, 2001) pp. 226-231. 3. Hayes D., Rumsewicz, M. and Andrew L., Quality of service driven packet scheduling disciplines for real-time applications: looking beyond fairness. Proc. IEEE Infocom'99 1 (1999) pp. 405-412. 4. Ranasinghe, R., Andrew L., Hayes D., and Everitt, D., Scheduling disciplines for multimedia WLANs: embedded round robin and wireless dual queue, Proc. IEEE Int. Conf. Commun. (2001) pp. 1243-1248. 5. Ogawa M., Sueoka T. and Hattori T., Priority Based Wireless Packet Communication with Admission and Throughput Control, Proc. of the 51st IEEE Conf. Vehicular Technology, (2000) pp. 370-374.
I N V E R S E OF A CERTAIN B A N D TOEPLITZ M A T R I X LIM K A H JIN Department
of Mathematics and Science, Singapore Polytechnic, 500 Dover Road, Singapore 139651, e-mail:[email protected], Tel:68790377
In his paper 'Inversion of certain symmetric band matrices', Lars Rehnqvist 1 gives an algorithm for the inverse of a band Toeplitz matrix A of order n x n arising from certain statistical problems. The elements of A are atj = k — \i — j \ , if \i — j \ < k and aij = 0, if \i — j \ > k for integer k < n. In this paper a key idea of Rehnqvist is exploited to find the exact inverse of a generalization of A.
{1
k-\i-j\-\
if \i- j\
where the non-zero elements in A are the modified Chebyshev polynomials defined recursively as So(x) = 1; S\(x) = x; Sj(x) = x • Sj_i(x) — Si-2(x), i > 2. The result confirmed Rehnqvist report that the inverse matrix is dependant on the value of k and n. The determinant of the matrix is also found and thereby proving a conjecture by E.L. Allgower2. A range of values for x that guarantees the non-singularity of the matrix is also determined.
1
Introduction
Ci{x) is the modified Chebyshev polynomials denned recursively as Co (a;) = 2; Ci(x) = x; Ci{x) = x • C;_i(a;) — C;_2(a;), I > 2. In the elements 5s Z of A~l, a means a _ 1 a n d i,j are its row and column positions and sz refers to the size of A~x. Also s in Us indicates its dimension. For brevity, Si(x) and Ci(x) are written as Si and C; respectively. Also defined are 5_i = 0, and 5_2 = —5oDefine the band symmetric tridiagonal matrix T whose first row is [x, - 1 , 0 , . . . , 0] and x is real. LetT • A = M. Then m ^ = 0 for 1 < i, j < n except for m*1'1) = mSn'n^ = Sk, fh^1^ = rh^n'n+l-^ = Sk-j-i for 2 < j < k + 1, m ( ^ ) = Sk - Sk_2 = Ck for 2 < i < n - 1 and rh{-i'i+k^ = — So for 1 < i < n — fc. For k = 4 and n = 2
I M,(2fc+l) -
\ M,(nfc+l+a)
for
(1.1)
1
E
m.nfc+l+a
U,(n-k+a)
<
•y.
So 0 -So 0 0 0 0\ Si 0 0 0 -So 0 0 0 0 c 40 d 0 0 0 -So 0 0 0 0 0 0 C4 0 0 0 -So 0 0 0 0 C4 0 0 0 -So -So 0 -So 0 0 0 C4 0 0 0 0 -So 0 0 0 C4 0 0 0 0 0 0 -So 0 0 0 C4 0 0 0 0 0 -S0 0 So Si S 4 / Si
<
k is partitioned such that 1^ ( n f c + l + a )
If M( n .fc +1+Q )
M
and
U(n.k+a)
are non-singular
then
( n .fc+i+a) c a n b e f o u n d f r o m U(n1k+a) b v t h e bordering method. U{^k+a) is expected to be sparse since it is reduced invariant under a large number of sub-l spaces. Unfc + l is first determined, then next [/,nfc+l+a is obtained from U.nfc+l for
1 < a < fc-1.
687
2
T h e inverse o f U(nk+i+a)
Unk+i c a n be found if its elements in the first k rows and columns are known. The elements w„fc+i is determined from w^n-Dfc+i recursively for n = 2, 3 , . . . and for 1 < hj
U3
'
2.1
Elements
~\F
V-CZ-VP"1
Uy)
U-1 +Uy1FP-1EU~1
J
in the first k rows and columns of U~k+l
For n > 1, u„fc+i = 0 for 1 < i, j < k except for C Snk/S(n+i)k " ^
i=j = 1
= < 5„fc-i/S(n+1)fc_1 t -Sj-3Sk~l/{S(n+l)k~lS(n+l)k)
The proof is by induction,
4
t = j = 2,3,...fe i = l , j = 2,3, ...fc
is zero, for 1 < i, j < k except for uk'
(2)
=
k j]
Sk-i/S2k-u 1 < i < k-1, uk ' = -(S j _ 2 S' f c _i)/(SfcS 2 f c _i) ) 1 < j < k - 1 and uCc'fc) = 1/5/.. Partition t/fc+i as in eqn(l) with /? = k + 1 and 7 = fc. Simplify eqn(l) a n d the base case of n = 1 is established: 4 + 1 = Sfc/S^fc, 2 < i < k, 4 V ? = S}-3Sk-i/(S2k-iS2k), 2 <j
=
S
(n+\)k/S(n+2)k,
^{n+l)k+l
=
5
( n + l ) f c - l / 5 ' ( n + 2 ) f c - l . 2 < l < k,
5
"(n+i)fc+i = - 'j-35'fc-i/(5' ( n + 2)A : -i5' ( n + 2)fc), 2 < j < k and elsewhere U(n+i)fc+i = 0, for 1 < i, j < k and eqn(2) is established.
2.2
Elements in the first k rows and columns of Unkl+1+a for 1 < a < k — 1
U~k+1+a is partitioned as in eqn(l) with /3 = nk + l+a ,7 = nk+1 and u„k+i+a = 0 for 1 < i,j < k except for o
i = 7= 1,2,... , a
S ( n + l ) f c S ' ( n + 2)fc-l 7 ('.i) x
i = a + 1, j = a + 1
S ( n + l)fc
nfc+l+a
—
Sj-a-sSk-1
Snk-1 i
•S(„
+ 1
z = a + l , j = 1,2, . . . , a
)t_
i = a + 1, j = a + 2, a + 3 , . . . , k i = j = a + 2, a + 3 , . . . , k
1
(3)
689 2.3
All elements of U~£+1+a, 0 < a < k - 1
Multiply each of the first k rows of U~£+1+a with Unk+\+a to obtain k difference equations. Solving these equations gives the elements in the first k rows of Unk+\+aSince Unk+i+aU~k+1+a = I, the remainding non-zero elements of U~£+1+a are obtained and 3
Kk+i+a
>ior0
—5fc+j-a_3S(T.+
l
1)fc-l-5(c+1)fc_1
Sfc-lS(„+i)fcS(n+2)/e-l ^(n-c)fS(r-(-l)|t-l
l,r
j = a + 1, r > c
^(»-r)l'S(t+l)4-l
a+ 2 < j < k, c ^ n
—5J-a-3-S'(c+l)fc-l"'5(r+l)fc-l
(4)
_(rfc+i,cfc+i) nk+l+a 0 < a < fc- 1, 0 < r , c < n l
r
'S'(r+l)-fc-l'>5(n + l - c ) f c - l Sk-l-S(n 2)k-l +
r
^(r+l)-fc-l1'S'(Ti-c)fc-l Sk-i-S(n+f)k-i
,
r = c 5 ( r + lS) f- cf c- -l l' S' ^( („* i2 +) f cl -- rl ) f c - l r — c • 5 ( rS+flc)_- flc- -Sl(' '„S 'i()r fi -cr_) f e - l +
1
S(c+l)k-l-S(n-r)k-l Sk-l-S(n +
1)k-l
+
r >c 3
r >c
•S/t_l'S(„ + 2 ) t _ l
Elements of M nfc x +1+a for 1 < a < A:
Partition Mnfc1+1+Q as in eqn(l) with (3 = nk + 1 + a and 7 = nfc + a then 4fcC+i+Ja for 0 < c < n j = l,c = 0 2<j < a a/ 1
j = a + l,c^0 a 5^ fc
a + 2<j
S(„+i)k-iS(„+i)t Sk-lSnk
+l +
aS(n+2)k-l~a
Sk+j-a~3S(c+1')k-lSk~a-2S(n Sk-lS„k + l + aS(n+2)k-l-c,S(„
Sfc-lS„fc
+
i+
i))b_1S(„+i)t
c«S(n+2)fc-l-c.
•Sj_3-a5fc_Q_2 5(c+1)fc_1 Sk-lSnk
+ l + aS(n+2)k-l~a
- (rfc+i,l) nk+\+a
0
+
Sfc-lS„lfc+l+aS(„+2)fc-l-aS(7i + 2)k-l •5(n-e)fcSfc_Q_2S(„ + 1 ) i . _ 1
m
i = a+1
l)k-l 2)k-l
S'fc_j-lS(„+1_e))c_iS(„
c =£ n i = l,l
+ +
S(„ + Sk-lSnk
+l+
l)kS(n-r)k-l aS(n+2)k-l~a
— '5fe-a-2'5(r+l)fc-l Sfc_iS'„J i + i + ( , S ( „ + 2 ) t - l _ c <
S ( r l _ c ) f c _ 1 Sfc- j _ 1 S ( n + l)fc &k- 1 Snk + l + a S(n
+
2)k - 1 - a
(5)
690 ,(i-l,j-l)
and m^k+i+a = unk+'a
^ o r *>•? —
2 anc
+ j
^\f
a
* *—
a
— ^
— S(n-r+l)k-lSk
a = k, r < c 5=1
S( n +l-c)fcS r fe_i
<5(„ + i_ r )fc5( Tl + i_c)fc
Sfc_iS( n + 1 ) f c
Sfc_i5(n+1)fc5(n+i)fc+i
Sfc_iS( n + i) f c
5fc_i5( Tl+1 j fe 5( Tl -(-i)fc+i
a = fc, r > c i = i 2 < Q < fc-1 2<j < a
— a-2'5( T l „ c ) f c
1+0,5(Tl+2)fe-l-a
'5(n,+l-T-)fc-l'S'fc-a-2'S'fc+i-a-3S'(c4-l)fc-l •5fc-l5nfc + i
^(n + l - r J f c - l ^ + lJfcSfc-j-lSfn+i-cJfc- 1
+ a5(„+2)fc-l-Q'S'(n.+2)fc-l
Sj-aSrk-iS(c+i)k-i
Sfc_i5 n fc
+
i+a5(rl+2)fc-l-aS(;n+2)fc-l
•S'j-3 5( n +i_ r )fcS( c +i)fc_i
5 , fc-iS'( n+1 ) fc 5( Tl+ 2)fc~i
a = fc 2
P t f ° r t n e following
f°rl
1
-5fc_l5'„fe +
exce
5fc_i5(n+i)fc5(n+1)fc+15(n+2)fc-l
5fc_j_iS( r i + i_ r )fc5( n .4_i_ c )fc_i Sfc_iS(7l+1)fc+1<S'(ri-t-2)fc-i
1 < a < fc-1 a +2<j
S'(T1_r+l)fc^l5fc_a-25j-3-a5(c+i)fc_i • ? f c - l 5 n f c + i + a 5 ( n + 2 ) f e - l - a ' S ' ( n + i)fc_i
S ( T l _ r + 1 j f c _ 1 5 ( r l + 1)fc5A:_J_15(n_c)fc_1 S'fc_i5'T1fc + i + c [ 5 ( T 1 , + 2 ) f c - i _ Q i S ' ( n + i)fc_i
S( Tt ,__ r+ i) fc _ 1 5fc_ a _2 5 j _ 3 - a S ( c + i ) f c _ i
5( T l „ T . + i) f c _ 1 5( T l + 1 )fe5fc_j_i5( n _ c )fc_i
5fc-l'5nfc+l+a'5( T l + 2)fc_-l_a'S'(Ti+l)fc-l
S k - l S T i f e + i + a S ( n + 2 ) f c _ l-a-S'(Ti + l ) f c - l
•S'fc-l'^fn+i)/,;-! <5(n-7-+l)fc- l'S'fc-a~2'S , j - 3 - a < 5 ( c + i ) f t _ i
-S(n-r+l)k-l'S'(n+l)fc'Sfc-j-l'S'(Ti-c)fc-l
•S'fc-l'5nfc+l+a'5'(n+2)fe-l~a'S'(n+l)fc-l
Sfc_i 1S„.fc-|_i + a S ( T l + 2 ) f c _ i _ a 5 ( n - ( - i ) f c _ i
1 < a < fc-1 j =fc+ l, r > c if c = n, j ^ k + 1
Scfc-lSfn-^fc^i 5fc_i5( n + i)fc_!
™;&£S ,C * +i forO
2 < a
Sk+j-a-3S(r+l)*-l5'(c+l)fc-l •S(n + l ) f c S ( „ + 2 ) ) ! - l S ' f c - l
Sk-a-2S(r+l)k-lSk-a-2Sk+j-a-3S(c+i)k-l S(n+1)fcSfc_l5„it
5fe_ct_2S(r+l)ib-lS'fc-j-lS(„ Sfc-lS„fc
+
+
+
i+e,S(n+2))!-l-a'S'(7i+2)fc-l
i_c)fc_1
i+aS(n+2)fc-l-cf5(n+2)li:-l
1 < a < fc j = 1 + a, r < c
S(„-c)*:S(r+1)fc_1 • S f c - l S ^ + jjj.
•5(T-+l)fc-l^fc-Q-2S(n-c)fc ' •Sfc_lS(„+1)t5„fc + i-|-c<S(„+2)fc_i_0
1 < a
S(n-r)kS(c+l)k-l
,
1 < a < fc-1 a+2<j
•5fc-l-5(71+1)*!
S'(r+l)fc-l'S) t _ e ,_2'S(„_ c )fc Sfc_iS(rl +
— Sj-a-3S(c+i-)k-lS(r+l)k-l Sfc_iS( n + 1 )fc_iS(„ + i) f c 1
1)fcS„fc+i+aS(n+2)it-l-£>
Sj-3-a
S ( c + 1 ) f c _ i S t _ a _ 2 S(r+l)lfc-l
S( T l + 1 ))t_ 1 S( n + 1 ) f c S)b_iS„fc + i + a S'(„ + 2)fc-i-£,
•Sfc-j-iS'(„„ c )i._ 1 5fe_ C I _2S( T . + i) f e _ 1 S , (n+l)fc-lSt_ 1 S„Ji-|-l-|- a S(„ + 2)t-l-o
(6)
691
Having Mnk1+1+a, post-multiplying by T gives A 1. T is tridiagonal, each row of A is linear combinations of 3 adjacents rows of M~k+1+a. 4
D e t e r m i n a n t o f Ank+i+a
for 1 < a < k
Proposition 1 A^t
A
\ -
J S(n+l)k~lS(n+2)k-lS(n+2)k-a-lSlzt
Since
Proof.Let 5 ( p ) = n L i ^ + i + a _(1,1)
_
°nfc+l+a -
-(1,1)
:E m
' nfc+l+Q
_m
_ (1,2)
_
c
a
ifc+l+/3
p=l
a=l
p=l
and „
1 / r c
P(n+l)fc-l*(n+2)k-aJ/P(„+2)k-lO(n+2)fc
n 4 M ) =n^n*(*>)!!« n —1
l<<X
d e t ( , 4 n f c + 1 + a ) = \l\[^+aa^'l)
f o
nfc + l+a ~
if
0=1
—
^
Qfc-1 a
—a r*a —1 Q (n+l)fc-l°(n+2)fc-l°(n+2)fc-a-l
Thus part 1 of the proposition and by similar arguments part 2 is obtained. When x = 2, Sn = n + 1 and Allgower's conjecture 2 is included below det(A„ f c + 1 + Q ) - | 5
( n+ 1)fc-ifc
Non-singular values o f
if Q = 0
(8)
Ank+i+a
T h e o r e m 1 Let N e M „ ( C ) and AT = I
j so i/iaf J4 is r x r . / / rank(N)
= rank(D)=s and det(£>) ^ 0 then N is singular iff A = B • D - 1 • C. Proof: If A = BD~XC then det(iV) = det(D) det(A - BD^C) = 0. If det(N) = 0, det(D) y^ 0 then for 1 < i < r, Ni = auNr+i + ct2iNr+2 + • • • + asiNr+s and (Nx N2 ••• Nr) = ( Nr+1 Nr+2 • • • Nr+S) • K, where k ^ = ai:j for 1 < i < s and 1 < j < r. Thus ( ^ J = ( ^ J AT and A =
BD~XC.
In deriving M"1, U must b e non-singular. The proof that U is non-singular for x — 2 is by induction. Us is non-singular for 1 < s < A;.Partition Uk+i as in theorem(l) with D = Uk, then aW - BD~lC = 2 - l/(fc + 1) ^ 0 and by theorem(l) f/fc+i is non-singular. Similary Uk+i+a is non-singular for 1 < a < k. Assume Unk+i+a is non-singular for 1 < a < fc. f/(„+i)fc+i+Q is non-singular for 1 < a < k as a*1'1) - B • D'1 • C = 2 - u £ + 1 ) j t + a / 0. Similarly T is non-singular for x = 2 and t/„fc+i+ a and T are non-singular for x > 2. References 1. Rehnqvist,L.:Inversion of certain symmetric band matrices BIT 1 2 , 90-98 (1972). 2. Allogower, E.L.-.Exact Inverses of Certain Band Matrices, Numerische Mathematik 2 1 , 279-284 (1973)
TIME-SPLITTING SINE-SPECTRAL A P P R O X I M A T I O N FOR T H E N O N L I N E A R S C H R O D I N G E R EQUATIONS WEIZHU BAO Department of Computational Science National University of Singapore, Singapore 117543 E-mail: [email protected] In this note we review the time-splitting sine-spectral (TSSP) method, recently studied by the author, for nonlinear Schrodinger equations (NLS) in the semiclassical regimes, where the Planck constant e is small. The time-splitting spectral method under study is unconditionally stable, time reversible and time transverse invariant. Moreover, it conserves the position density and performs spectral accuracy for spatial derivatives and fourth-order accuracy for time derivative. Numerical tests are presented for linear, for weak/strong focusing/defocusing nonlinearities and for the Gross-Pitaevskii equation. The tests are geared towards understanding admissible meshing strategies for obtaining 'correct' physical observables in the semi-classical regimes. Furthermore, applications to Id, 2d and 3d Gross-Pitaevskii equation for Bose-Einstein condensation are presented.
1
Introduction
Many problems in quantum or solid state physics require the solution of the nonlinear Schrodinger equation with a scaled Planck constant e (0 < e < 1):
^ = -y w + vwr + f{\r\2)r, ^ ( x , * = 0)=^g(x),
t>o, *eiid, (i)
xeRf
(2)
In this equation, V = V(x) is a given real-valued electrostatic potential, / a realvalued smooth function, and ij}e = ipE(x., t) the wave function. The wave function is an auxiliary quantity used to compute the primary physical quantities (or observables) such as the position density rf and the current density Js n e (x,t) = | ^ ( M ) | 2 ,
Je(x,i)=eIm(^(x7i)VV£(x,t)).'
(3)
The general form of (1) covers many nonlinear Schrodinger equations (NLS) arising in various different applications. For example, when / = 0, (1) reduces to the linear Schrodinger equation; when V = 0, f(p) = f)e p, it is the cubic nonlinear Schrodinger equation (called the focusing NLS if /3e < 0 and defocusing NLS if f}s > 0); when V(x) = f |x| 2 with u > 0 a constant, f(p) = 5p with 6 a constant, it is called the Gross-Pitaevskii equation (GPE) 14 which is used to describe BoseEinstein condensation (BEC) i' 10 - 8 ' 5 or nonlinear optics 15 . It is well known that the equation (1) propagates oscillations of wave length e, in space and time, when e is small. The oscillatory nature of solutions of the nonlinear Schrodinger equation with small e provides severe numerical burdens. Even for stable discretization schemes (or under mesh size restrictions which guarantee stability) the oscillations may very well pollute the solution in such a way that the quadratic macroscopic quantities and other physical observables come out completely wrong unless the
692
693
spatial-temporal oscillations are fully resolved numerically, i.e., using many grid points per wave length of 0(e). In 12 , Markowich et. al. study the finite difference approximation to the Schrodinger equation with small e. Their results show that, for the best combination of the time and space discretizations, one needs the following meshing strategy constraint in order to guarantee good approximations to all (smooth) observables for e small 12 : mesh size h = o(e) and time step k = o(e). Failure to satisfy these conditions leads to wrong numerical observables. Much more restrictive conditions are needed to obtain an accurate L 2 -approximation of the wave-function itself. In 6 ' 7 , Bao et. al. study time-splitting spectral approximations to the Schrodinger equation with small e. Extensive numerical experiments suggest the following meshing strategies for obtaining the correct observables: h = 0(e) and k- independent of e for linear Schrodinger equation; h = 0(e) and k = 0(e) for defocusing nonlinearities and weak 0(e) focusing nonlinearities 2 ' 3 ' 6 ' 7 . One can find more numerical approaches for the Schrodinger equation in 9>16'13 and references therein. The note is organized as follows. In section 2 we review the fourth-order timesplitting sine-spectral method. In section 3 we report numerical results for NLS. 2
Fourth-order time-splitting sine-spectral method
In this section we review the fourth-order time-splitting sine-spectral (TSSP) method 3 for the problem (1), (2) with homogeneous periodic boundary conditions. For the simplicity of notation we shall introduce the method for the case of one space dimension (d = 1). Generalizations to d > 1 are straightforward for tensor product grids and the results remain valid without modifications. For d = 1, the problem becomes iei>l = -£-rxx s
i) (x,t
+ V(xW
= 0)=%(x),
+ f(\r\2)^,
a<x
a<x
E
ip (a,t)=ip (b,t)
t>0, = 0,
(4) t > 0.
(5)
Clearly, the Schrodinger equation is time-reversible, so we could pose equations (4), (5) for t e 11. We choose the spatial mesh size h = Ax > 0 with h = (b — a)/M for M an even positive integer, the time step k = At > 0 and let the grid points and the time step be Xj-.= a + jh,
tn := n k,
j = 0,1, • • •, M,
Let ipEj'n be the approximation of ips(xj,tn) t = tn = nk with components ipfn-
n = 0,1,2,---.
and ips,n be the solution vector at time
From time t = tn to time t = t„+i, the Schrodinger equation (4) is solved in two steps. One solves ie^t
= -JTPXX,
(6)
694
for one time step, followed by solving ial4(x,t)
+ f(\il>s(x,t)\2)ilf{x,t),
= V(x)P(x,t)
(7)
again for one time step. Equation (6) will be discretized in space by the sinespectral method and integrated in time exactly. For t £ [£n>^n+i]> the ODE (7) leaves |V>| invariant in t 7 and therefore becomes iet£i(x,t)
+ f(\r(x,tn)\2)
= V(xW(x,t)
r(x,t)
(8)
and thus can be integrated exactly. From time t = tn to t = tn+i, we combine the splitting steps via the fourth-order split-step method and obtain a fourth-order time-splitting sine-spectral method (SP4) for the Schrodinger equation (4) . The detailed method is given by ^(D
= e-t2ii;i*(V(x,)+/(ltfJ"'|
2
))/e ^=,nj
M-l
$1}
^? = E e-^^'
smQuixj-a)),
M-l
^
e-™*krf
= E
$ 3 ) s i n f a f o - a)),
j = 1,2, • • •, M - 1,
(=i ^,(5)
4) 2
=
e-i2u,3k(V(Xj)+ft,\^
\ ))/e
^(4)^
M-l
^6) = ^ ^e,n+l
=
e-fe»»*M?
$») s i n ^ ^ - a ) ) ,
e-i2«,1fc(V(xj)+/(|^
6) 2
" ( | 6 )>) /| £2 -^ S( 6 ) .
^
where ti>i = 0.33780 17979 89914 40851, w2 = 0.67560 35959 79828 81702, w3 = -0.08780 17979 89914 40851, wA = -0.85120 71979 59657 63405 and Uh the sinetransform coefficients of a complex vector U = (UQ, UI, • • •, UM) with [70 = UM = 0. Notice that the only time discretization error of SP4 is the splitting error, which is now fourth order in fc for any fixed e > 0. For the stability of the time-splitting spectral approximations SP4, we have the following lemma, which shows that the total charge is conserved. L e m m a 2.1 The time-splitting spectral scheme (SP4) (9) is unconditionally stable. In fact, for every mesh size h > 0 and time step k > 0, W'n\\p 3
= W'°\\p
= \M\\P,
n=l,2,-~.
(10)
Numerical examples
Here we consider an example of Id Gross-Pitaevskii equation, i.e. in (1) we choose d = 1, s — 1 and V{x) = x2/2, f(p) = Sp. The initial condition is taken as 1
A0{x) = -Y^e-X
2/
I2,
S0(x) = 0 ,
x
ell.
695 Table 1. The error ||^ e (t) - i>*'h'k(t)\\p at t = 2.0 with h time step
fc-
-i-
-
20
K
fc- J-
fc-
-i-
— 40
~
80
K
K
K
— 160
fc- -*K
— 320
±. fcK
-i-
— 640
5 = 10.0
1.261E-4
8.834E-6
5.712E-7
3.602E-8
2.254E-9
1.422E-10
6 = 20.0
6.039E-4
4.293E-5
2.800E-6
1.771E-7
1.110E-8
6.929E-10
6 = 40.0
3.755E-3
2.250E-4
1.482E-5
9.424E-7
5.915E-8
3.696E-9
We solve on the interval [—16,16], i.e. a = —16 and b = 16 with the homogeneous periodic boundary condition (5). Table 1 shows the errors HV^W — V,e'ft'fc(')lli2 at t = 2.0 with a very fine mesh of mesh size ft = ^ for different 6 and k. For more numerical experiments on various Schrodinger equation see 2>3<7'5'4. References 1. M.H. Anderson, J.R. Ensher, M.R. Matthews, C.E. Wieman, and E.A. Cornell, Science 269, 198 (1995). 2. W. Bao, Fourth-order TSSP method for the nonlinear Schrodinger equation and application to Bose-Einstein condensation, preprint. 3. W. Bao, Time-splitting Chebyshev-spectral approximations for (non)linear Schrodinger equation under (non)zero far-field conditions, preprint. 4. W. Bao, D. Jaksch, Numerical methods for solving damped nonlinear Schrodinger equations with a focusing nonlinearity, preprint. 5. W. Bao, D. Jaksch and P.A. Markowich, Numerical solution of the GrossPitaevskii equation for Bose-Einstein condensation, preprint. 6. W. Bao, Shi Jin and P.A. Markowich, J. Comput. Phys. 175, 487 (2002). 7. W. Bao, S. Jin and P.A. Markowich, Numerical study of time-splitting spectral discretizations of nonlinear Schrodinger equations in the semi-clasical regimes, SIAM J. Sci. Comput., submitted. 8. W. Bao and W. Tang, Ground state solution of trapped interacting BoseEinstein condensate by minimizing a functional, preprint. 9. Q. Chang, E. Jia and W. Sun, J. Comput. Phys. 148, 397 (1999). 10. M. Edwards and K. Burnett, Phys. Rev. A 51, 1382 (1995). 11. P.A. Markowich, N.J. Mauser and F. Poupaud, A Wigner J. Math. Phys. 35, 1066 (1994). 12. P.A. Markowich, P. Pietra and C. Pohl, Numer. Math. 81, 595 (1999). 13. D. Pathria and J.L. Morris, J. Comput. Phys. 87, 108 (1990). 14. L.P. Pitaevskii, Zh. Eksp. Teor. Fiz. 40, 646, 1961. (Sov. Phys. J E T P 13, 451, 1961). 15. C. Sulem and P.L. Sulem, Springer, New York, 1999. 16. T.R. Taha and M.J. Ablowitz, J. Comput. Phys. 55, 203 (1984).
Calculating Global Minimizers of a Nonconvex Energy Potential D a v i d G a o 1 &: P i n g l i n 2
Abstract The Ginzburg-Landau Equation is central to material science, which has been subjected to a substantial study during the last twenty years. Since the total potential energy associated with this equation is a nonconvex (double-well) functional, traditional direct analysis and related numerical methods for solving this nonconvex variational problem are difficult. Based on the canonical dual transformation method proposed recently in [1], an algorithm is presented for solving the nonconvex variational problem. This method provides a parameter (one component of the dual vector) which can serve as an indicator for the global minimization.
1
Primal Problem and Canonical Forms
Let t h e region of space O C I 2 occupied by t h e material be a smooth, bounded simplyconnected domain with boundary dQ. T h e configuration u : fi —> R is a real-valued function (i.e. the so-called order-parameter, which is used to denote a field whose values describe the phase of the system under consideration, see [3]). Consider a nonconvex potential energy
^W = /nYlv«|2dn + ^ ^ ( A - ^ 2 ) 2 d n - ^ U f f d n ,
(i)
in which, the double-well function
is the "coarse-grain" free energy, whose wells define the phases. fco,A > 0 and /i are material constants, g(x) is a given internal source field. Ginzburg-Landau equation in superconductivity and Oseen-Prank liquid crystal model are examples of this kind of energies. Let Uk = {u£
£ 4 ( 0 ) | V u 6 £ 2 ( n , K 2 ) , u(x) = wo Vz e dQ}
be t h e admissible space. Physically it is interested in finding solution w of t h e following primal minimization problem OP) :
P{u) = inf P ( « )
Vw 6 Uk.
'Department of Mathematics, Virginia Tech, Blacksburg, VA, 24061, USA. E-mail: [email protected] Department of mathematics, NUS, Singapore 117543. E-mail: [email protected]
2
696
(2)
697 Due to the nonconvexity of P, the traditional analytic methods and associated algorithms are very difficult. The numerical results of all direct approaches depend on the initial iteration point choosed. In order to solve this nonconvex variational problem and to clarify the phase states (i.e. the minimizers of P(u)), we need to study the canonical dual variational problem. By the canonical dual transformation method introduced in e.g. [2], the generalized finite deformation strain vector £ can be defined by £ = A(u) = (grad u , \u2 - \f
= (e, £)T.
(3)
Thus, in terms of £ = (e, £) T , the stored-energy density is the quadratic function
where
The dual variable <; = £* of £ can be defined by « = (
•
Since A is a quadratic operator, £ can be considered as a Green-type strain vector in finite deformation theory, and its dual variable c is a Kirchhoff-type stress. Since the canonical stored energy V is a quadratic function of the canonical strain £, the canonical constitutive relation <; = C£ is reversible, i.e. £ = C~l<;. Thus, the complementary stored energy Vc can be simply obtained by the traditional Legendre transformation
K°(«) = {(?) • « - v«(«)) = \>?
=^
• * + |A2-
(4)
u = u0},
(5)
By introducing an admissible configuration space Ua defined by Ua=Ua = {u€C2(U,M)\
\/ue£.2{Q,^),
the so-called extended Lagrangian L : Ua x S —> R associated with the canonical primal problem is (see [1])
L(u,S) = =
f[A(u)-
(6)
It is easy to see that for each given u £ Ua, L : S —> R is a strictly concave. However, the convexity of L : Ua —> R depends on the sign of <j. <; > 0 indicates the convexity of L in terms of u.
698
2
Primal-Dual Algorithm and Numerical Examples
Motivated from triality theory given in [1, 2] we can derive an algorithm to find a global minimizer of the nonconvex energy functional. The algorithm is involved with both primal and dual variables. In order to make problem easier, we use <x = koVu to eliminate the stress field, the modified Lagrangian can be defined by La{u,s) = J[\k0\Vu\2
+ {\u2-\)<;-±n2
dQ.
(7)
Both primal and dual variables are involved in the Lagrangian. Algorithm 1 Let a computational domain Q, a distributed external source field g(x) and a boundary source UQ be given. (1) For a given stress field <;k(x) G <S„ such that sk > 0, find the configuration u (x) such that La(uk,<;k)= inf £,(«,?*). (8) ueua (2) Let }k = 9 + koA.uk(x). Solving the algebraic equation 2 , 2 ( A + A) = /fc2
(9)
for sk{x), i = 1,2,3. Choosing the positive root
u | a n = «o-
The elliptic probem can be solved by the standard finite element method with piecewise linear basis functions. Next we apply this algorithm to a example. Example 1 Consider g(x) = 0 and UQ = 2+(x—y) 2 . Take fi = 1, the initial guess ?° = 10 and the tolerance u> = 10 - 4 . The algorithm converges pretty fast. Figures 1-2 depict the triangular meshes we use for both circular and rectangular domains and the corresponding solutions of u(x,y) and s(x,y). Since s(x,y) > 0 we expect the solution u(x,y) is a global minimizer.
699
V
k
/
/
Figure 1: Triangular mesh, solution u(x,y) and s(x, y) on a circular domain
illlllgi
iv
~*y
V
y
Figure 2: Triangular mesh, solution u(x, y) and <;(x, y) on a rectangular domain Above example shows that the algorithm works pretty well in certain situation. However, we are still lack of theoretical results on what conditions we should propose to ensure the convergence of the algorithm. Our computational experience shows that the algorithm often diverges if fi is small. Sometimes we don't get positive ? in the whole domain. Nevertheless, the algorithm provides a parameter indicator which can justify that the solution we obtain is a global minimizer by the triality theory. This outstands the algorithm from other direct PDE solvers.
References [1] Gao, D.Y., Analytic solutions and triality theory for nonconvex and nonsmooth variational problems with applications, Nonlinear Analysis 42 (2000), 1161-1193. [2] Gao, D.Y., Duality Principles in Nonconvex Systems, Kluwer Academic Publishers, Netherlands 2000. [3] Gurtin, M.E., Thermomechanics of Evolving Phase Boundaries in the Plane, Oxford University Press, New York 1993.
A QR-type Method for Computing the SVD of a General Matrix Product/Quotient Delin Chu Department of Mathematics National University of Singapore 2 Science Drive 2, Singapore 117543. Email: [email protected] October 2 1 , 2002
Abstract In this paper, a QR-type reduction technique is developed for the computation of the SVD of a general matrix product/quotient A — A*1 A% • • • A"^ with A{ e R n x n and Sj = 1 or s, = —1. First the matrix A is reduced by at most m QR-factorizations to the form Qjj'(Qjj') - 1 , where Qn , Q21' e R.n*™ and (Q(111))TQii) + (Q&'rQw = 7 - T h e n t h e S V D o f A i s obtained by computing the CSD (CosineSine Decomposition) of Q\i and Q21 using the Matlab command gsvd. The performance of the proposed method is verified by some numerical examples.
1 Introduction This paper deals with a new method for the computation of the Singular Value Decomposition (SVD) of a sequence of matrices in product/quotient form. The simplest forms of these Generalized SVD's (GSVD), for two matrices, are the well-known Quotient SVD (QSVD) and Product SVD (PSVD). One of the three possible forms involving three matrices, is the so-called Restricted SVD (RSVD). The GSVD is one of the essential numerical linear algebraic tools in signal processing and identification. Possible applications include source separation, stochastic realization, generalized Gauss-Markov estimation problems, generalized total linear least squares, open and closed loop balancing, etc. Like the QSVD, PSVD and RSVD, the SVD of a general matrix product/quotient has many applications. For example, it is important for the estimation of Lyapunov exponents for dynamic systems. Consider finite difference equations &k+1 = * i » e t l 9o = / , * * € R n x n , sk = 1 or sk = - 1 .
700
(1)
701 The ith Lyapunov exponent is then defined by Xi =
\imk^too\og(ai(Qk))/k,
where cr,(6jt) is the ith biggest singular value of 0^. Discretizations of ordinary differential equations may also lead to sequences of matrix products/quotients. In this paper, we propose a QR-type reduction technique for computing the SVD of a general matrix product/quotient /i — yi1 / i 2
sim
with Ai £ R nx ™, Sj = 1 or Sj = —1. We will show that, if not all Sj are the same, the matrix A can be reduced b y m - 1 QR-factorizations to the form Q^iQ^)'1 with Q^, Q$> e R " x " , (Q$)TQ$ + T = a s are (Q21) *?2l I'i ^ " i equal, then we need TO QR-factorizations. The main advantage of this QR-type reduction is the way in which quotients are dealt with. Finally the SVD of A can be obtained by resorting, e.g., to Van Loan's CSD method .
2
A QR-Type M e t h o d
Consider a matrix A of the following form: • ASI A32 • m > 2, • -^i ^ 2 . Ai is nonsingular if st = — 1.
A 6 R " x n , Si = 1 or s{
(2)
Assume for simplicity that the matrices Ai in (2) are square. The method we develop in this paper is as follows: Algorithm 1 Input: Matrix A of the form (2). Q${Q21l)-1. Output: Matrices Q$,Q$ € R n x n such that (Q$)TQ$ + {QZIVQTL ••I andA = Init: If all Si = 1, set Am+i := I, sm+i := —1;TO:= m + 1, If all Si = - 1 , set Am+i := Am,Am := I,sm = l,sm+i := - l ; m := m + 1, If —«! = . . . = —Sj = Sj+j = ... = sm = 1, apply procedure to AT. Determine maximal j such that SJ V21
—
1 and
-1.
Sj+i
Set sm+i
:= - 1 ,
A
3+l-
Loop: for i = j , j - 1 , . . . , 1, do: A
• Case Si = 1 and s;_i = 1. Compute the QR factorization of
iQll V21
AiQn r>('+1) W21
Q2i
Q22
0
, *w,Qi?,Q$,Q£,< >£
Case st = 1 and Sj_i = —1. Compute the QR factorization of
^Q^1} V21
Q2\+1)
Qn
Q12
Q21
^2
0
-Sj+2,
Q
O+i) = J,
702
Case Si = - 1 and s;_i = 1. Compute the QR factorization
^ Q <6 '1 +1 1 )
of
n («+l)
V21 r
.T_I«-I-I1
n
^ST'
r
„r«l
V21
„ M
oS
1
r
„
0
1
«,(«) & «r>W & «nW §',«&
^
6R"X".
i!22
- 1 . Compute the QR factorization
Case s, = — 1 and s,.
of
n (i+l)
V21
AJQ^
3i?
V21
Q21
Bnd loop. 5etQW+i):=Q(i))Q««)
= < $ .Loop: /or
Qlr ' ^r*
Q21
TJ"
i = j + 2, j + 3 , . . . , m do:
Case Sj = 1 and s;+i = 1. Compute the QR factorization
1
r>W r>M «(') n(') c
R(>)
0
Q22
°1
of
AJQV]
0
Q22
Case Sj = 1 and Sj+i = — 1. Compute the QR factorization
of
sir1'
AJQ^]
AM-V
Q«
0
QS
(i) Q£ o
, it
, Q j j , Q12, Q211Q22 ^ " •
V22
Oase Sj
-1 and Sj+i = 1. Compute the QR factorization 0
V12 AiQ 21 Case ;
-1 and Sj + i :
of
, .R w , Q n , Q 1 2 , Q211022
e
R-'
Q22 J 1. Compute the QR factorization
of j4j
AiQ 21
QS «$
R® Q22
J
0
,
Rlii,Q$,Q$,Q$,Q$eRn*».
End loop. In this algorithm, we first determine a value j such that Sj-i = 1 and Sj = - 1 . Prom there, we work further to the left and subsequently to the right, as explained above (note that in our implementation so allows to take into account the type of operation required for i = j + 2). If Sj_i = 1 = — Sj does not apply, but instead we have Sj = 1 and Sj-\ = —1, then we can work with AT instead of A. In these cases, we need only m — 1 QR-factorizations. Only if si = S2 = • • • = sm, we have to plug in an artificial I and the method requires m QR-factorizations. In Algorithm 1, the explicit computation of AJ1 and explicit solution of the corresponding triangular linear system are avoided if Sj = — 1.
703 After reducing A to Q n ( Q 2 i ) 1 by Algorithm 1, we can compute the SVD of A by computing the CSD of Q!Q and Q 2 i by the Matlab command gsvd. In the following we explain that the computations involved in Algorithm 1 can be posed as left and right orthogonal transformations of a large matrix whose sub-blocks are the Aj or their transposes, several unit matrices, and the rest being zero matrices. For simplicity, we assume without loss of generality that j = m - 1 in Algorithm 1, i.e., sm = —1 and s m - i = 1. Define Am—1 -Am
Mm-l For i = 1,
» = i, Q21
, m — 1.
Q22 J
,m-2,
AT
M{:=
0 In
0
0 In
Ai
M{:=
, if Si — 1 and Sj_i = 1, or at — 1 and Sj_i = —1,
0
if S{ = —1 and Sj_i = 1, or Si = —1 and s,_i = —1.
Set Qm-l
0
0 0
V:
•• 0
e
0
771—1
if m is odd,
0 QT2
In 0
0 Qm-2
0 0
0 0
0 0
0 0
'•• 0
0 Qi
M =
if m is odd,
F :=
0
AC-3
0 ^m-4
0 0
0 0
0
T
Af
U :=
M m_x 0
Mm-2 Xm_3
0
0
0
0
0 A4 m _4
if m is even,
0
'••
0
0
0
Q[J
In 0
0 Qm-2
0 0
0 0
0 0
0 0
'• 0
0 e2
if m is even,
if m is odd, Ml
Mi
0 0
0 0 if m is even.
Ml 0
Ml A4?" J
Then U and V are orthogonal matrices, and UMV = R,
(3)
where
fl =
•Rm-l 0 o 0
Km-2 1lm-3 o 0
0 Km-4
0 0
0 0
'•• 0
'•• %
o K,
if m is odd,
704
ftm_! 0
ftm_2 TCm_3
0 7J m _ 4
0 0
0 0
R
if m is even, 0
0 R(m-1)
"R-m-l =
0
%! 0 fl(m-l)
or Tlm-i =
0
%i 6 R n x " , i = 1, • • •, m — 2, are of one of the following forms ' flW * " 0 *
' B«
'
*
0'
*
*
* '
0
flW
1
' * •
0 flW
flW 6 R7**™ (j = 1, • • • , m — 1) are nonsingular. Let X denote the estimate of X computed with finite precision arithmetic, as opposed to exact arithmetic, and let e denote the machine precision. From (3), we have [?] \\UTU - hn\\ ~ £, \\VTV-hn\\~t,
\\UMV-R\\&e\\M\\
(4)
Hence, algorithm 1 is backward stable in the sense that (4) holds.
3
Conclusions
In this paper, we have studied the computation of the SVD of a general matrix product/quotient sequence. First we reduced the sequence by at most m QR-factorizations to the form Qn {Q\i) -l , with Q ^ . Q ^ e T (i) ii + T i.V2i) R « x " a n d ( Q ii W )f QvW ( Q ^ ) >^2i Q ^— - I- Then we obtain the SVD of A by computing the CSD of Q\{ and Q21' -.a) using the Matlab command gsvd. An advantage of our QR-type reduction is its flexibility for adding one more matrix from left or right to the matrix A of a matrix product/quotient, this feature is very useful for the applications like the estimation of Lyapunov exponents of dynamic systems. Some numerical examples were given to show the performance of the presented method.
NEWTON'S METHOD FOR NON-DIFFERENTIABLE EQUATIONS: CONVERGENCE A N D APPLICATIONS
D E F E N G SUN Department
of Mathematics and Center for Industrial Mathematics, National of Singapore, Singapore 117543, Republic of Singapore. Email:matsundf@nus. edu.sg
University
Newton's method has been proved to be the most effective approach for solving nonlinear systems of equations. While the convergence properties of Newton's method for differentiable equations have been well understood for a long time, its behavior for solving non-differentiable equations has only been discovered successfully quite recently. In this talk, we first present the recent advances in convergence analysis of Newton's method for solving non-differentiable equations and then briefly introduce its applications in fields of optimization, variational inequalities, best interpolation, inverse eigenvalue, optimal control and computational mechanics.
1
Introduction
Suppose that F : $tn —> 3ftn is a locally Lipschitz continuous functions, i.e., \\F(y)-F(x)\\
= 0.
(1)
When F is continuously differentiable (smooth), the most effective approach for solving (1) is probably Newton's method. For example, in 1987, S. Smale 4 wrote "If any algorithm has proved itself for the problem of nonlinear systems, it is Newton's method and its many modifications. ... Thus a relation between the simplex method of linear programming and Newton's method, is no surprise. ... " The most attractive feature of Newton's method for solving smooth systems is its quadratic convergence when the initial point is sufficiently close to the solution. However, in applications in fields of optimization, best interpolation, computational mechanics, and many other fields, it is often found that F is not smooth everywhere. Hence, Newton's method is no longer valid for solving (1). There are counter examples in the literature proving the non-convergence of Newton's method when F is not smooth. In this paper, we will introduce a smoothing Newton method for solving non-differentiable equations and analyze its rate of convergence. 2
S e m i s m o o t h Functions a n d S m o o t h i n g Functions
For a locally Lipschitz continuous function F, by Rademacher's Theorem, we know that F is differentiable almost everywhere. So, Clarke's generalized Jacobian is well defined 1 : dF(x) = conv{lim F'(y),y
705
—» x,y €
Dp}.
706 Here Dp denotes the set where F is differentiable and conv^l denotes the convex hull of a set A. For example, for F(x) = max{0, x}, x £ 5R, we have 0F(O) = [O,1]. A locally Lipschitz continuously function F : 5ftn —»5ftn is semisinootn 3 at x if lim
Vh'
(2)
Vg8F(i+th')
U0 n
exists for every nonzero h € R . F is semismooth at x implies that F is directionally differentiable at x. Another equivalent definition is that F is said to be semismooth at a; if F is directionally differentiable at x and for any h —> 0 and V € dF(x + h), F{x + h)-F{x)-Vh F is said to be strongly semismooth
= o(\\h\\).
(3)
at x if F is semismooth at x and = 0(\\h\\2).
F(x + h)-F(x)-Vh
(4)
One may use the definition to check that the following two functions are strongly semismooth: F(a,b) = Va2 + b2 , (a,b) 6 K2 and F ( e , a, b) = Ve2 + a2 + b2 , (e, a, 6) € 5R3 . A function G : K x 5R" —> K n is called a smoothing function of a nonsmooth function F : 5ft™ —> JJ™ if G is continuously differentiable on [ft x 3i n except 0 x Sftn and for any x € Jft™, lim
G(e,y) = F ( i ) .
(5)
elO.y—>x
In general, the existence of smoothing function G is proved in Sun and Qi 5 by using Steklov's averaged function. In practice, easily computed smoothing functions can be constructed. For example, let F(t) = max{0,t},t € 5ft. Then the defined function
G(e,t) = ±(t+Vt2 + e2)
(6)
is a smoothing function of F . 3
A Smoothing Newton Method
Suppose that G is a smoothing function of F . Let E : 5ft x 5ftn —> 5ft x 5ft™ be defined by E(e,x)
:=
£
G(s,x)
707 Then, F(x) = 0<—>E(e,x) = 0, which implies that solving a nonsmooth system of equations is equivalent to solving a smoothing (nonsmooth) system of equations. Before we introduce the smoothing Newton method, we need the following assumptions. A s s u m p t i o n 1: (i) G is a smoothing function of F. (ii) For any e > 0 and x £ 5ftn, G'x(e,x) is nonsingular. A s s u m p t i o n 2: G is semismooth at (0, x*), where x* is a solution. A s s u m p t i o n 3: G is strongly semismooth at (0,x*). Choose s e 5ft++ and 7 G (0,1) such that -ye < 1. Let z := (e,0) € 5ft x 5Rn. Define the merit function -0 : 5Rn+1 —» 5R+ by
V(*) := ||2?(*) II2 and define f3 : 5ftn+1 -> 5K+ by /?(z) : = 7 m i n { l , V ( z ) } . Let n •= {z : = {£,x) e 3* x K n | e > /3(z)e}. Then, because for any 2 £ 5Rn+1, /3(z) < 7 < 1, it follows that for any x 6 5Rn,
(e, x) e n. A S m o o t h i n g N e w t o n M e t h o d : [Qi, Sun and Zhou 2 ] S t e p 0. Choose constants 5 £ (0,1) and a 6 (0,1/2). Let e° := e, x° € » n be an arbitrary point and k := 0. S t e p 1. If E{zk)
= 0 then stop. Otherwise, let pk := /3(zk).
S t e p 2. Compute Azk := ( A e \ Ax fc ) e » x 5ftn by E(z fc ) + £;'(z fc )Az fc =
ftf.
(7)
S t e p 3. Let Zfc be the smallest nonnegative integer I satisfying iP{zk + SlAzk) Define z
fc+1
fc
< [1 - 2
(8)
fc
:= z + #*Az .
S t e p 4. Replace k by fc + 1 and go to Step 1. T h e o r e m 1. Suppose that Assumption 1 is satisfied. Then an infinite sequence {zk} is generated by the above algorithm with lim 4){zk) = 0 k—*oo
708 and each accumulation point z of {zk} is a solution of E{z) = 0. Moreover, suppose that Assumption 2 is satisfied and that z* := (0,x*) is an accumulation point of the infinite sequence {zk} generated. If dE(z') are nonsingular, then the whole sequence {zk} converges to z*, \\zk+l-z*\\
o(\\zk-z*\\)
=
and ek+1 =
o(sk).
Furthermore, if Assumption 3 is satisfied, then \\zk+l 4
- z* || = 0{\\zk
- z' ||2)
and
ek+1 =
0(ek)2.
Conclusion
In this paper we introduce a smoothing Newton method of quadratic convergence for solving the nonsmooth system of equations. Its applications in fields of optimization, variational inequalities and inverse eigenvalue problems have been well studied (http://www.math.nus.edu.sg/~matsundf). The smoothing newton method for solving nonsmooth equations arising from best interpolation, optimal control, computational mechanics and other fields are being investigated. New discoveries will be posted in the above webpage. References 1. F. H. Clarke, Optimization and Nonsmooth Analysis (Wiley, New York, 1983). 2. L. Qi, D. Sun, and G. Zhou, A new look at smoothing Newton methods for nonlinear complementarity problems and box constrained variational inequality problems, Math. Program., 87 (2000), 1-35. 3. L. Qi and J. Sun, A nonsmooth version of Newton's method, Math. Program., 58 (1993), 353-367. 4. S. Smale, Algorithms for solving equations, Proceeding of International Congress of Mathematicians, Edited by Gleason, A. M., American Mathematics Society, Providence, Rhode Island, 1987, pp.172-195. 5. D. Sun and L. Qi, Solving variational inequality problems via smoothingnonsmooth reformulations, J. Comput. Appl. Math., 129 (2001), 37-62.
Numerical Solution of Blow-Up Problems Using Mesh-Dependent Variable Temporal Steps K.W. LIANG, P. LIN and R.C.E. TAN Department of mathematics, National University of Singapore
1
Introduction
The mesh adaptive methods have played important roles in solving the parabolic PDEs, whose solutions may develop singularities in a finite time X. All adaptive meshes, however, are sets of some regular discretization nodes including the property that spatial nodes have the same temporal increments among time-levels [2, 3, 4, 5, 6]. Due to the blow-up of the solution, all the adaptive methods have to be stopped as soon as t approaches T. The reason is that the numerical solution is unstable after T, which is caused by shock, since the solutions (or the derivatives of the solutions) become very sensitive and blow-up with respect to time. In this circumstance, we present a new method to automatically generate an adaptive irregular mesh, which can overcome the above limitation. Our new method will focus on the classical quenching type partial differential equation, ut = uyy +
9,
0 < y < a, 0 < t < T,
u(y,0) = u o ,w(0,t) = u(a,t) = 0,
ye (0,a),i 6 (0,T)
where 9 > 0 and 0 < uo < 1- As figure 1.1 showed, our adaptive irregular mesh is achieved through letting the spatial nodes at same time-level have different temporal increments based on ut. If the variation of Ut at node A is smaller than it at node B, then the temporal increment T& is bigger than TB- As a result, the line will be replaced by a curve at a time-level. Repeatedly, the next temporal increment TA* and TB< are determined by the variations of the function ut similarly. In practice, since the interpolating process is very complex in the constructure of the time curve, we will replace the curve by multi line. The computation of the solution of (1) is important for several reasons. First, although very simple, the problem (1) provides a fundamental combustion model to many quenching processes. And the quenching behavior in this problem is typical singularities (blow-up) of a wide class PDEs modeling many important physical phenomena. Second, quenching problems have been widely analyzed and it is well known about the property of solution when t approaches X, thus it is a excellent problem for testing the performance and verifying the efficiency of our new method. Third, the middle location arrives quenching firstly, which is the only proved character about the quenching problem [1], while the post-quenching behaviors of solution are unknown till now, and then using our new method can lend insight into the post-quenching characters.
2
Discretizations and The Difference Scheme
Note that for y/a = x, problem (1) can be conveniently reformulated into the following form ut = ^uxx + ^, az (1 - u)a u(x,0) = uQ,u(0,t) =u(l,t)
0<x
x € (0,1),t € (0,X).
It assumed that 0 = xo < • • • < XN = 1 are the spatial nodes on [0,1] and hi = Xi+i — x*. And we denote tj^ as the j-th discrete time step at node xt and TJJ as the j-th temporal increment at node Xj, where tj+i,t = tj,i + Tji. With our new method of mesh generation, the temporal coordinates and temporal increments are determined by ut as mentioned in section
709
710 1. And we adopt the arc-length monitor function on ut to determined r,,; with To, T\ and T2 given [5], T^ = Tj_xti
+ {(ut)j-2,i
~ ((«t)j-l,i - («t)j-2,t)2 ,
~ {Ut)j-3,if
(3)
where indices i, 0
= h—
k—+2
+
«0 Qx2
(4)
Together with the three neighboring points of (ZOJ*O) as figure 2.1 showed, we immediately have the following equation
-h 0 h
h ^ 1 r 02 l2 2 J
u(xi,ti)
our, L
=
dx2 -J
-u0
u(x2,t2) -u0 u(x3,t3) -u0 _
(5)
(X2,t2) (Xl,ti)
Fig. 1.1 Since the 3 x 3 matrix in the equation (5) is non-singular, then we get the implicit difference scheme
fc+MaW
«£i+(i+
2r,-,i
„J'+i
:i£ + •
6j.f i,ja 2 /i 2
°j+l , i ( l " ^ )
where «:? = «(£;,!,'), 6j+i,j = 1 + ——-— 2
3+ 2
'l+
~ tj+l,i\'j+l,i+l
=
tj+l,i+l
—
(6)
and / j + i ] i _ 1 ( ^ + l i , + i ) is the tempo-
ral spacing between mesh points (x;_i,£j+i) and (xi,tj+\)((xi+i,tj+i) •j + l . i - l = tj + l,i-l
S
and (XJ,£,- + I)), i.e.,
ij' + l.i)-
Remark 3.1 For square regular mesh, the irregular difference scheme (6) reduce to the usual implicit finite difference formulae with 6 = 1 .
3
Convergence and Stability Analysis
We assume that -^4% and ^jy are continuous in [0,1] x [0,X] and the following conditions are satisfied I = uiax\ljti\ < Ch and 6j-+i,i > 1 (7) Considering u is the exact solution of (2) and using Taylor's series expansion, we have the local truncation error of (6)
\EJ+1\
(8)
711 where C, K\ and K2 are positive constants and At = max|T,;|. Let Uf is the numerical solution of the implicit difference scheme (6) and e{ = u\ - Jj{. If we let e>+l = max* \e{+11, then we obtain e^ < (1 + K3AtYe0 of ,
H
^ E, where K3 is the maximum magnitude K, .(f_ u)9+ i • Since e° = 0 and (8), we have e?' —> 0 as h and At —> 0.
Theorem 3.1 The implicit difference scheme (6) is stable if &,-,< > 1 for all i and j . Remark 3.2 The following restrictions are necessary to guarantee the condition bjti > 1 in (7). If lj,i-i * lj,i+i < 0 then lj,i-\ + lj,i+i must be nonnegative. In practical computation, we require a extra procedure to obtain the function values of ut in (3) for improving the accuracy of solution. Taking the time derivative of equation (2), we have d 1 d2 6 with the initial-boundary conditions ut(x,0)
= l,ut(0,t)
= ut(l,t)
= 0 , i 6 (0,1), te (0,T).
As for (6), (9) can similarly be approximated by the implicit difference scheme 2r,,i
Tj,<
b]+h,aW «Cl+ U + bj+hia2h2
j+i
J+1 Oj + l ,iO?h?
^ „ '
= ,r? •
1
°3+l
, i ( i - «3r+)\'9
«f (10)
where vj = ut(xi,tj). Associated with (6), we can solve a system of equations on u and u ( without extra computational cost, since (6) and (10) have the same triangular coefficient matrix.
4
Numerical Experiments
We apply the new method in sections 2 to solve the problem (2) with the only case of 0 = 1 since other cases involving 9 > 0 are similar. Without the loss of any generality, the initial value uo is set to be zero. The spatial mesh step size h varies from 0.1 to 0.01, while the initial temporal step size is chosen to be 0.01 — 0.001. We observe that, in figure 4.3, the function ut(xi, t) at Xi = 0.4(0.6) grows rapidly and the peak of it exceed 15, the same as it is at Xi = 0.5. It implies that the solution u at X{ = 0.4(0.6) quenches following the node x; = 0.5. Although the values of the solution u at other spatial nodes are below 1, or far away from 1 at some nodes, the derivative functions ut at related spatial nodes have been increasing in figure 4.2. Especially, the phenomenon of the rapid increase of ii ( is obviously at x = 0.3 and the value of Ut has reached 3.2594. So we can conclude that the solution of problem (2) quenches finally for the whole spatial domain except two boundary nodes. The contour maps in figure 4.5 are also indicated the conclusion, since the contours change to flatness while t increases. Table 4.1 Quenching time T(a) and maximal temporal coordinates maxtiJ i i,j
ft = 0.1 max tj:i Ta a- 2 a= n a = 10 a = 25
2.5826 1.5449 0.9162 0.8503
1.8029 0.7893 0.5275 0.5024
'
h = 0.0.5 max i, i Ta
h = 0.02 maxi, i Ta
h = 0.01 maxi,' i T
1.4757 0.9274 0.8208 0.6677
0.9815 0.7669 0.6027 0.6064
0.8638 0.6049 0.5599 0.5531
1.1010 0.6005 0.5112 0.5014
±
i,3
0.8469 0.5713 0.5010 0.5010
a
0.7890 0.5400 0.5008 0.5004
Figure 4.4 displays the curve of spatial nodes at one time-level immediately before the quenching. It is observed the nodes near the boundaries are further from the mid location with respect to t. In fact, the nodes but the middle point can be extended to a further temporal
712
domains for enlarging the maximal temporal spacing. So that we can well study the postquenching behavior of solution. In table 4.1, we list newly computed quenching time Ta and the maximal temporal coordinates max £,; for various given values of a and h. We also note that enlarging the temporal spacing leads to the delay of the quenching time, i.e., it reduces the accuracy of the quenching time. The reason is that the error of approximation of finite difference scheme depends not only on the derivatives of solution and the sizes of spatial and temporal steps, but also on the spacing and the shape of mesh's cells. The shape of cells change to narrow and the angles between mesh lines become small if the temporal spacing is enlarged. On other hand, we can improve the accuracy of the numerical solution through decreasing the spatial step size h and the initial temporal step size To- Furthermore, the delay of quenching time do not affect the conclusion of quenching characters of the solution.
References [1] H. Kawarada, On solutions of initial-boundary problem for ut — uxx + 1/(1 — u), Pul. Res. Inst. Math. Sci. 10(1975), 729-736. [2] Q. Sheng and A. Q. M. Khaliq, A compound adaptive approach to degenerate nonlinear quenching problems, Numer. Meth. for PDEs, 15(1999) 29-47. [3] Q. Sheng and H. Cheng, A moving mesh approach to the numerical solution of nonlinear degenerate quenching problems, Dynamic Sys. Appl., (to appear). [4] Q. Sheng and H. Cheng, An adaptive grid method for degenerate semilinear quenching problems, Computers and Mathematics with Applications, 39(2000), 57-71. [5] H. Cheng, P. Lin, Q. Sheng and R. C. E. Tan, Solving degenerate reaction-diffusion equations via adaptive Peaceman-Rachford splitting, submitted. [6] Q. Sheng, A monotonicaJJy convergent adaptive method for nonlinear combustion problems, Integral Methods in Science & Engineering (Research Notes in Math., 418), Chapman & Hall/CRC, London and New York, (2000), 310-315.
N O N L I N E A R B O U N D A R Y L A Y E R S OF T H E B O L T Z M A N N E Q U A T I O N SEIJI UKAI, TONG YANG, AND SHIH-HSIEN YU ABSTRACT. We will summarize our recent study on the existence theory on half-space boundary value problem of the nonlinear Boltzmann equation of a hard sphere gas, assigning a Dirichlet data for incoming particles at the boundary and a Maxwellian as the far field, [15]. It shows that the solvability of the problem changes with the Mach number Jit00 associated to the far Maxwellian: If ^ ° ° < —1, there exists a unique smooth solution connecting the Dirichlet data and the far Maxwellian for any Dirichlet data sufficiently close to the far Maxwellian, while, otherwise, such solutions exist only for Dirichlet data satisfying certain admissible conditions and the set of admissible Dirichlet data forms a smooth manifold of codimension 1 for the case — 1 < JC^ < 0, 4 for 0 < M00 < 1 and 5 for JC" > 1, respectively. Then we will discuss the stability of boundary layer solutions for the case when ^ ° ° < — 1.
1. I N T R O D U C T I O N AND M A I N R E S U L T
The Dirichlet problem of the nonlinear Boltzmann equation in the half-space arises in the analysis of the kinetic boundary layer, the condensation-evaporation problem and other problems related to the kinetic behavior of the gas near the wall, [5], [12]. The main concern is to find a solution which tends to an assigned Maxwellian at infinity. An interesting feature of this problem is that not all Dirichlet data are admissible and the number of admissible conditions changes with the far Maxwellian. This has been shown for the linear case by many authors [3],[6],[7],[9], mainly in the context of the classical Milne and Kramers problems. Recently, a nonlinear admissible condition was derived for the discrete velocity model in [14] and the stability of steady solutions was proven in [11]. The full nonlinear problem was solved on the existence of solutions in [8] for the case of the specular reflection boundary condition, whose proof, however, does not work for the Dirichlet boundary condition, whereas in [2], the Dirichlet case has been solved but with the ambiguity that the far Maxwellian cannot be fixed a priori, in addition to some non-physical truncation assumption. We will establish the admissible conditions for each far Maxwellian. Our proof provides also a new aspect of the linear problem. It should be mentioned that K. Aoki, Y. Sone and their group, ([1], [12], [13] and references therein), made an extensive numerical computation on the same nonlinear problem. Our result gives a partial explanation of their numerical results. In [15], we study the existence theory stationary solutions in a half-space x > 0 in which the spatial dependence of the mass density F of gas particles is assumed constant on each plane parallel to the boundary x = 0 but the velocity dependence is fully 3-dimensional, that is, F is assumed to be a function of position x and particle velocity £ = (£1,62, £3) £ R3- Let fi stand for the velocity component along the rr-axis. Then, our problem is,
r fi^x = (1-1)
< F\x=a [ F
oo.eei 3 ,
Q(F,F),
= F0(O, ->• Moc(0
(x->°c),
€1 > 0 , ( 6 , 6 ) e K 2 , (el3.
Here, Q, the collision operator, is a bilinear integral operator (1.2)
Q(F, G)=
[
(F(?)G(£)
- F(OG(t,))
713
q(( - £„ u) d£.dw,
714 with
(i-3)
r = f-[tt-e.)-w]w.
e = f. + [(f-eo-w]w,
where "•" is the inner product of R 3 . In this paper, we restrict ourselves to the hard sphere gas for which the collision kernel q is given by ?(C,w) =
a0\(-u\,
where <7o is the surface area of the hard sphere. Here we shall recall two classical properties of Q which will be needed later. See [4], [5] for details. ( Q l ) Q(F) = 0 if and only if
d-4)
F
=
P
M
„
(
If-"12*
^T^^j^wM-^r-) (27rT) / 3 2
for any constants p , T > 0 and u = (u^, 112,11$) € R 3 . This is a Maxwellian and is the distribution function of a gas in the equilibrium state with the mass density p, flow velocity u and temperature T. (Q2) A function <j){£) is called a collision invariant of Q if <0,<2(F)) = O
for all F,
{,) being the inner product of L 2 (R?). Q has five collision invariants (1.5)
0o = l ,
& = &(t = l,2,3),
4>4 = |f|2,
which indicate the conservations of mass, momentum and energy in the course of the binary collision of particles. The second equation in (1.1) is the Dirichlet boundary condition. The Dirichlet data Fo(£) can be assigned only for incoming particles (£1 > 0), because assigning the outgoing ones (£ : < 0) makes the problem ill-posed as is seen from the a priori estimate given in the next section. This corresponds to the physical situation that only the incoming distribution can be controlled on the wall. The distribution M^ in the third equation of (1.1) is the boundary data at x = 00. It follows from the property (Ql) that (1.1) does never have a solution unless M^ is a Maxwellian. Thus, (1-6)
Moo(0
=
M[poo,uoo,Too}(0,
where the constants px > OjUoo = (2*00,1, "00,2, "00,3) 6 R 3 , and Tx > 0 are the only quantities that we can control. By a shift of the variable £2, ?3, we can assume without loss of generality that "00,2 = ^00,3 = 0, and then, the sound speed and Mach number in the far field are given by
(1-7)
coo = J%, V o
^
= ^Si, Coo
respectively, see [5]. Note that the flow at infinity is incoming (resp. outgoing) if ^#°° < 0 (resp. > 0) and supersonic (resp. subsonic) if \*df°°\ > 1 (resp. < 1). The Mach number ^ # ° c provides significant changes on the solvability of (1.1). Indeed, since the "Dirichlet data" M oc (^) is imposed both for incoming and outgoing particles, it is over-determined and hence (1.1) is not necessarily solvable uncoditionally. Actually, we will show that the number n+ of solvability conditions changes with „#°° as
(1.8)
0, 1, 4, 5,
^ ° ° e (-oo,-l), ^r°°e(-i,o), ^°°G(0,1), ^°°e(l,oo).
715 To be more precise, introduce the weight function (1-9)
WP{0
= (1 + I f D - ^ M l l . i i c c . T J t t ) ) 1 7 2 ,
with /3 e R. The main result obtained in [15] on existence is: T h e o r e m 1.1. Let M^ be the Maxwellian (1.6) with JC^ ^ 0, ± 1 and let /3 > 5/2. Then, there exist positive numbers e^,t\,a, and a Cl map (1.10)
#:L2(R^,£idO—*Kn+,
*(0) = 0,
such that the following holds. (i) Suppose that the boundary data F0 satisfy (1-11)
^(O-AMfll^eoWjCO,
eeR»..
TTiera, i/ie problem (1.1) admits a unique solution F in the class (1.12) \F(x,0 - M ^ O I + 16(1 + K i r 1 F x ( z , f l | < e 1 e - « W / J ( £ ) , j / and on/j/ ?/ Fo satisfies (1.13)
* > °>
?eR3>
* ( F 0 - Moo) = 0.
(ii) The set of F0 satisfying (1.11) and (1.13) forms a (local) C\ manifold of codimension n+. Based on this existence theorem, we will then discuss the stability of the boundary layer. As for the existence theory, we can see that the case when j^°° < — 1 would be simpler than other cases as all the information from the far field goes to the boundary and the solution to the linearized equation is exponential decay. Hence, we will discuss briefly the proof of the following stability theorem. T h e o r e m 1.2. The boundary layer solution obtained in Theorem 1.1 when ^#°° < —1 is nonlinear stable under small perturbation. A c k n o w l e d g m e n t : The research of the first author was supported by Grant-in Aid for Scientific Research (C) 136470207, Japan Society for the Promotion of Science (JSPS). The research of the second author was supported by the Competitive Earmarked Research Grant of Hong Kong # 9040648. The research of the third author was supported by the Competitive Earmarked Research Grant of Hong Kong # 9040645. REFERENCES [1] Aoki K., Nishino, K., Sone, Y., Sugimoto, H.(1991): Numerical analysis of steady flows of a gas condensing on or evaporating from its plane condensed phase on the basis of kinetic theory: Effect of gas motion along the condensed phase, Phys. Fluids A, 3, 2260-2275 [2] Arkeryd, L., Nouri, A. (2000): On the Milne problem and the hydrodynamic limit for a steady Boltzmann quation model, J. Stat. Phys., 99, 993-1019 [3] Bardos, C , Caflish, R. E., Nicolaenko, B. (1986): The Milne and Kramers problems for the Boltzmann equation of a hard sphere gas, Comm. Pure Appl. Math. 49, 323-352 [4] Carleman, T., (1932): Sur La Theorie de l'Equation Integrodiffercntielle de Boltzmann, Acta Mathematica, 60, 91-142 [5] Cercignani, C , Illner, R., Purvelenti, M. (1994): The Mathematical Theory of Dilute Gases, Springer-Verlag, Berline, [6] Cercignani, C. (1986): Half-space problem in the kinetic theory of gases, in: Kroner, E., Kirchgassner, K. (eds.) Trends in Applications of Pure Mathematics to Mechanics, Springer-Verlag, Berlin, 35-50 [7] Coron, F., Golse, F., Sulem, C. (1988): A classification of well-posed kinetic layer problems, Commun. Pure Appl. Math., 4 1 , 409-435. [8] Golse, F., Perthame, B., Sulem, C. (1988): On a boundary layer problem for the nonlinear Boltzmann equation, Arch. Rational Mech. Anal.. 103 , 81-96
716 [9J Golse, F., Poupaud, F.(1989): Stationary solutions of the linearized Boltzmann equation in a half-space, Math. Methods Appl. Sci., 11, 483-502 [10] Liu, T.-P., Yu, S.-H. (2002): Boltzmann Equation: Micro-Macro Decompositions and Positivity of Shock Profiles, to appear [11] Nikkuni, S., Kawashima, S. (2000): Stability of stationary solutions to the half-space problem for the discrete Boltzmann equation with multiple collisions, Kyushu J. Math., 54, 233-255 [12] Sone, Y. (2002): Kinetic Theory and Fluid Dynamics, Birkhauser, Basel [13] Sone, Y., Aoki, K., Yamashita, 1.(1986): A study of unsteady strong condensation on a plane condensed phase with special interest in formation of steady profile, in: Bom, V., and Cercignani, C. (eds), Rarefied Gas Dynamics, Teubner, Stuttgart, II, 323-333. [14] Ukai, S. (1998): On the half-space problem for the discrete velocity model of the Boltzmann equation, in Kawashima, S., Yangisawa, T. (eds), Advances in Nonlinear Partial Differential Equations and Stochastic Series on Advances in Mathematics for Applied Sciences-Vol. 48, World Scientific, Singapore-New York, 160174. [15] Seiji Ukai, Tong Yang and Shih-Hsien Yu, Nonlinear boundary layers of the Boltzmann equation: I, Existence. (To appear in Communications in Mathematical Physics) DEPARTMENT OF A P P L I E D MATHEMATICS, YOKOHAMA NATIONAL UNIVERSITY, YOKOHAMA, JAPAN
E-mail address: u k a i Q m a t h l a b . s c i . y n c . a c . j p D E P A R T M E N T O F MATHEMATICS, C I T Y UNIVERSITY OF H O N G K O N G , K O W L O O N , H O N G K O N G
E-mail address: matyangQcityu.edu.hk DEPARTMENT OF MATHEMATICS, C I T Y UNIVERSITY OF H O N G K O N G , K O W L O O N , H O N G K O N G
E-mail address: mashyuQcityu.edu.hk
A NEW ALGORITHM FOR DIVISION OF POLYNOMIALS LIANGHUO FAN Nanyang Technological
University, I Nanyang Walk, Singapore 637616, E-mail: [email protected]
Singapore
Division of polynomials has fundamental importance in algorithmic algebra, and is commonly encountered in many areas of mathematics as well as in scientific and engineering applications. The existing classical algorithm for polynomial division fails to provide an explicit way of determining the coefficients of the quotient and the remainder. In this paper, I present a new general theorem about division of polynomials, which provides a new and explicit algorithm for division of any two polynomials. A method of expressing a polynomial in polynomials of lower degrees is also obtained, as a corollary of the algorithm.
1 Introduction n
m
Given two polynomials f(x) = ^jajx'
and g(x) = ^bjXJ , where a,
(z = 0,1, 2,...,n) and b} (j = 0,\,2,...,m) are complex numbers and both an and fcmare nonzero, and for convenience we assume n>m, we can easily add, subtract, and multiply the polynomials, namely, n
f(x)±g(x)
= ^ ( a t ±bt)x',
where b} = 0 when m + l<j
and
1=0
n+m
f(x)g(x) = ^ ( y^aibj)xk
. As we know, faster algorithms also exist for
the multiplication of polynomials [1]. However, the division of polynomials, which has fundamental importance in computational algebra and is frequently encountered in many areas of mathematics as well as in scientific and engineering applications, is much more complicated. A conventional algorithm for division of polynomials can be seen in the typically-used proof for the so-called "Division Theorem", which says for any two polynomials f(x) and g(x), as shown earlier, there exist unique polynomials q(x) and r(x) so that f(x) = q(x)g(x) + r(x) and deg r(x) < deg q(x) The algorithm goes as follows: when m>n, clearly q{x)= 0, g(x) = r(x); when m < n, to obtain g(x) and r(x), one can establish a sequence of polynomials f^x), f2(x), ... using the following method:
717
Let/,(x) = f(x)-g(x)*c0xn
m
, where Co is the ratio of the leading
coefficient of/(x) to that of g(x), namely —-. We have deg/(x)>
K deg/;(x). If deg/|(x) =rt,<m, then q(x) = c0x"'m and r(x) = fi(x). Otherwise, we continue to let f2(x) = f](x)-g(x)*c]x"''m, where c\ is the ratio of the leading coefficient of fx(x) to that of g(x), and again we have deg/(x)>deg/ 2 (x). If deg/ 2 (x) = n2<m, then q(x) = c0x"~m + c,x"'"m and r(x) = f2(x). Otherwise, follow the above process to get f3(x), f4(x), ... until fk(x) when degfk(x)<m. Then, q(x) = c0x"~m +cxx"'~m + ... + ck_lx"t^m and r(x)=fk(x). This classical algorithm, found in many relevant texts in mathematical form [2,3] or in pseudo-code form [4], provides a method for computing the quotient and remainder of polynomial division. However, it does not give explicit algebraic expressions for determining the coefficients of the quotient and the remainder to be produced. In 1990, Godbole presented another algorithm for polynomial division by solving a system of algebraic equations involving the coefficients of the dividend, divisor, and quotient, but it can only be used to find the quotient when the remainder is zero, inapplicable when it is nonzero [5]. Below in this paper I present a new and explicit algorithm for computing the quotient and remainder of the division of two polynomials, which is based on a new general theorem about polynomial division. 2 A New Algorithm In polynomial division, if the divisor has a higher degree than the dividend, then obviously the quotient is zero and the remainder is the dividend itself, therefore here I only consider the situation when the degree of the divisor is equal to or lower than that of the dividend. n
Theorem: For any two polynomials/^) = ^ajx'
m
and g(x) = ^bjXj
where at (i = 0,1, 2,...,ri) and bj (j = 0,\,2,...,m) are complex numbers,
,
719
m
Hn-m-i TO
y_i
m
m
J—1
m
\,2,---,m,
-, i = m + l,m +
2,---,n-m,
and r
m-k = am-k ~ Z
(b) When m> — , 2 a„
i = 0,
bm ' Hn-m-i
i n—i tn
\
v1, < 7 „ --m
m-j
-i+j
K
,
i=
l,2,---,n-m,
tn
and a
m -*~Z^A-*-/> i=0 m-k
k=
a
m-*~Z^-*-"
k = 2m-n + l,2m-n + 2,-,m.
l,2,-~,2m-n,
To prove the above theorem, one can use a generalized synthetic division established in Fan [6], which is essentially an easier way of doing long division, but not limited to the situation where the divisor is of the form x-c. The following example explains how the generalized method can be implemented using the so-called "the synthetic array of numbers" when /(JC) = 2x5 - 5x4 + x2 + 2x - 7 and g(x) = x3 - 3x2 + 2x - 3. Notice
720
how the coefficients of fix) and g(x) are used in the synthetic array and how the arrows are placed to indicate the connection of the numbers. The dividend ,2x5 -5x 4 +,0x3 +x 2 + 2 x - 7 v n °
V
2
r -3 X
.£ + -ON
H
/ 5 /0 1\
I
-2 3 Tjy7'
v / T / 1 -1 \ \ / 2 x22 + x - l (The quotient)
t 2
7
-10
\
, * ' 2x 2 +7x-10 (The remainder)
A more detailed explanation about the generalization of synthetic division and the mathematical deduction of the theorem can be found in Fan [6]. To verify the validity of this method provided in the theorem, one can use concrete examples, for instance, let /(x) = 4 x 4 + x 2 + 3 x - 5 , g(x) = 2x2 - x + 3, by applying the method we get q(x) = 2x2 +x-2, and r(x) = -2x +1. Obviously, f{x) = q(x)g(x) + r(x). It is also easy to see when g(x) = x-c, the algorithm turns to be the classical synthetic division. From the algorithm, a method of expressing a polynomial in polynomials of lower degrees, as shown in the following corollary, can be obtained. The proof is immediate. Corollary: For two polynomials f(x) of degree n and g(x) of degree m, and m
n m
, anddeg qxix)<m,X -\,2,..k
.
721
Notice when g(x) = x, the first expression is Horner's rule for evaluating a polynomial. 3
Discussion
The theorem presented in this paper provides a new and explicit way of determining the coefficients of the quotient and remainder of the division of any two general polynomials. Clearly, this algorithm can be used in practical computing as well as in software design. Moreover, because the algorithm reveals direct and explicit algebraic relations between the coefficients of the dividend and the divisor and those of the quotient and remainder, it can help us to more clearly analyze the properties of the quotient and the remainder in terms of the properties of coefficients of the dividend and the divisor. For example, it is obvious from the algorithm that if the dividend and the divisor are over the ring of complex numbers C, then the quotient and the remainder are also over C, but the same kind of property does not hold for division of polynomials over the ring of integers Z. Further application of the algorithm in theoretical analysis in relevant areas of mathematics remains to see. Nevertheless, it should be pointed out that, as one can see without much difficulty, the new algorithm and the existing classical algorithm for polynomial division have essentially the same efficiency in computing in terms of the times of mathematical operations they need to execute to arrive at the final results. References 1. Zippel, R., Effective polynomial computation (Kluwer Academic Publishers, Boston, 1993) pp. 113-120. 2. Merris, R., Introduction to computer mathematics (Computer Science Press, Rockville, Maryland, 1985) pp. 230-234. 3. Akritas, A. G., Elements of computer algebra with applications (John Wiley & Sons, New York, 1989) pp. 102-105. 4. Chapra, S. C. and Canale, R., P., Numerical methods for engineers: with software and programming (McGraw-Hill, Boston, 2002) pp. 163166. 5. Godbole, P. B., Algorithms for multiplication and division of two polynomials, Adv. Eng. Software 12 (1990) pp.133-138. 6. Fan, L., A generalization of synthetic division and a general theorem of division of polynomials. Mathematical Medley 29 (2003), to appear.
GINZBURG-LANDAU SYSTEM A N D SUPERCONDUCTIVITY N E A R CRITICAL TEMPERATURE
XING-BIN PAN Department of Mathematics, National University of Singapore, Singapore 119260. E-mail: [email protected] We investigate superconductivity of a sample subjected to an applied magnetic field and slightly below the critical temperature Tc, and introduce recent results on the estimate of the critical field He •
In Ginzburg-Landau theory, superconductivity is described by a complex-valued function ip (order parameter) and a real-valued vector field A (magnetic potential), 1 and (ip, A) is a minimizer of the Ginzburg-Landau energy functional. Under a proper scale, t h e energy functional can be written as / {|VV> - iAij)\2 + ^ ( l - \i>\2?}dx Jn *
+ — f |curl A - Happi\2dx, M Jn3
where H a p P i is t h e applied field; K is the Ginzburg-Landau parameter here A is t h e penetration depth and £ is the coherence length; 1
(1)
: K = A/£,
4ma2l2(Tc - T) h2Tc c
» - ? -
here T is t h e t e m p e r a t u r e , Tc is t h e critical t e m p e r a t u r e in zero field, % is t h e Planck's constant, I is a typical scale for t h e sample, m is the electron mass, and a is a material constant which is independent of t e m p e r a t u r e . Note t h a t n = \^/]i. In this paper, il is a bounded, smooth and simply-connected domain in H3. Our interest is the superconductivity under applied magnetic fields, with t e m p e r a t u r e T slightly below t h e critical t e m p e r a t u r e Tc (hence fj, is small). Let us consider an applied field of the form H a ppi = crh, where h is a unit vector, a n d a > 0 is a parameter. Letting A = aA, the associated energy can be written as g\ip,A\=
[ {\Vij;-iaAij\2
+ ^(l-\TP\2)2}dx+'^-
[
|curl A - h | 2 d x . (2)
Let F h be a smooth vector field such t h a t curlFh = h ,
div F h = 0
in II3.
We may choose F h such t h a t / n F^dx = 0. Let W 1 , 2 ( f i , C ) be t h e Sobolev space of all complex-valued functions defined on fl, and let D l l 2 ( 7 i 3 , d i v ) = { A : | A | € Lj, c (W 3 ), | V A | € L2(K3),
div A = 0 in
W(fl) = {ty, A ) : V G W 1 , 2 ( f i , C ) , A - F h e D ^ ^ d i v ) } .
722
II3},
723
It is easy to show that the (global) minimizers of the functional Q on W(O) exist, and they are weak solutions of the following Ginzburg-Landau system • V
^
= M(1-M2)V'
in",
2
in
curl A = ^3{ViV C T AV>}xn (V*AII>)-V
= 0
on SO,
^3-
(3)
A-FhGD^^.div),
where xn is the characteristic function of O, namely, xn = 1 on 0 and = 0 in R \ 0 ; v is the unit outer normal vector of 9 0 . It is well-known that, when the applied field is strong, (0, Fh) is the only minimizes namely, the sample is in the normal state. Since we are interested in the existence of non-trivial minimizers, we define a critical field by Hc(h, /j,, K) = inf{er > 0 : (0, Fh) is a global minimizer}. The estimate of the value of Hc(h,p,,n) for a superconductor with small \i was given recently 3 . It involves two numbers: w(h) = / j V w h - F h | 2 cte,
A(h) = A / ^ P llcurl U h | |
L W
,
(4)
where -f„(fidx = mi Jn
—— = Fh • v Qv
curl 2 U h = (Vw h - F h ) x n
in TZ3,
on 5 0 ,
/ Wh dx = 0, Jn
U h 6 D 1 , 2 (ft 3 , div),
f Vhdx = 0. Jn
T h e o r e m 1. For any unit vector h and n > X(h)y/Ji we have, for small /i, Hc(h,K,n)
= JJ^+o{y/ji).
(5)
The asymptotic behavior of the minimizers for small /i depends on the scale of K. 3 To describe the result, we need some notations. Given a unit vector h, and positive constants A and p, we consider the equations Aw" = 0 in O, A 2 curl 2 A'' = p{VwP - AP)xn in II3, 2£-=A<>-v on dQ, AP - F h € D 1 - 2 ( ^ 3 , div).
(6)
There exists exactly one solution (wp, Ap) of this equation in the set y = {(w,A)
: weW1'2^),
A-FheD1'2(7e3,div),
/ wdx = 0, f Adx Jn Jn
0}.
724
vp = \imt-+o{wp+t - wP)/t and B? = \imt-,o{AP+t Avp = 0 \2cm\2BP ^-=BP-v
= {{VwP-AP) on Oil,
- A")/t
exist, and satisfy
+ p{VvP-BP)}Xn BP € D 1 ' 2 ^ 3 , div).
in O, in
ft3,
Note that, when p = 0, {w°,A°) = (ro h ,F h ). If A > A(h), for 0 < a < there exists a unique positive number p — p(a) such that
a2 I \Vwp
Ap\2dx +
(7)
l/y/w(h),
p=l.
J a
Write A a = A'<°>, B a = B", c a = y/pjaj, define ua to be the unique solution of
va = v', wa = w*W. Then we
Aua = \Aa\2 - 2A a • Vwa - 4,(1 - |c a | 2 ) in (I, dUg -waAa • v on 9 0 , / n uadx = 0,
(8)
and set ba =
+ waAa • (Vva - Ba)dx 2a2\ca\2-J* 1 - 2a 2 / (Vw„ - A a ) • Badx
Let -0h be the unique solution of the following equation : AV'h = 2 F h • Vw h - | F h | 2 + w(h) in ofi, ^ + whFh • v = 0 on dil, JQ iphdx = 0.
(9)
T h e o r e m 2. Consider the applied field H a p p i = a^/ph, where h is a unit vector, and a is a fixed number, 0 < a < l/y/u(h). Let (i/v> AM) be the minimizer of the functional Q given in (2). (i) If K = Xy/p with A > A(h) being fixed, then we have, as p —> 0, Vv = CM t 1 + ia\fpwa + a2p(ua + ibava) + o(p)], AM = A Q + a&aBaA/7I + o(y/p), |c M | 2 = |c a | 2 + abap + o(p). (ii) If K > 0 is fixed, then we have, as p —• 0,
h
0(p3/2)],
+ 0( M 3 / 2 )
|CM| = v ^ M h ) + 0 ( v ^ ) Conclusion (ii) describes the behavior of a sample of size much smaller than the penetration depth, and subjected to the applied field below Hc(h, p, K). When
725
the temperature increases to Tc, the applied field penetrates the sample almost completely, however, superconductivity may persist. Conclusion (ii) also implies that, near the critical temperature Tc, type I behavior may be observed in a type II superconductor at certain scale. The minimizers of the Ginzburg-Landau functional exhibit various phenomena for parameters of different scales. If we choose the penetration length A as the length unit, we may take A = 1 and \i = K2. In a rescaled domain (also denoted by Q) we can rewrite the functional (1) in the following form : /{|V^-i^|
2
+ ^(l-|V>|2)2}d*+ /
|curM-74p P i| 2 cto.
(10)
In recent years many authors have used the functional (10) to study the behavior of superconductors of large value of K when the applied fields are close to the upper critical field Hc3 • For a superconducting cylinder with infinite height and constant cross section f2o, for large K we have Hca(it) = •£- + - ^ K m a x +
A)
0(K-1/3),
2
pi'
where /?o is the lowest eigenvalue of the Schrodinger operator with a unit magnetic field on the half plane and 0.5 < /?o < 0.76, Kmax is the maximum value of the curvature of the boundary of £IQ, and C\ > 0 is a universal constant; as the applied field decreases from Hc3, superconductivity nucleates at the maximum points of the curvature (see Lu-Pan 4 and Helffer-Pan 5 ). As the applied field further decreases but is still above Hc2, a thin superconducting sheath forms on the entire boundary, and gradually develops a surface superconducting state 6 . Comparing these results with Theorems 1 and 2 we see that, the behavior of the minimizers for small /x are quite different to those with large K. It is interesting to explore a liquid crystal phase which is an analogy of the surface superconducting state. Some investigations have been carried out 7 . Acknowledgments This work was partially supported by the National University of Singapore Academic Research Grant No. R-146-000-033-112. References 1. 2. 3. 4. 5.
V. Ginzburg and L. Landau, Zh. Eksper. Teoret. Fiz. 20, 1064 (1950). D. Saint-James and P.G. De Gennes, Physics Letters 6, 306 (1963). X. B. Pan, Superconductivity near critical temperature, submitted. K. Lu and X. B. Pan, Physica D 127, 73 (1999). B. Helffer and X. B. Pan, Upper critical field and location of surface nucleation of superconductivity, Ann. Inst. H. Poincare Analyse Non Lineaire, to appear. 6. X. B. Pan, Comm. Math. Phys. 228, 327 (2002). 7. X. B. Pan, Landau-de Gennes model of liquid crystals and critical wave number, submitted.
G E O D E S I C A P P R O X I M A T I O N S OF 2 D H Y D R O D Y N A M I C S WAYNE LAWTON Centre for Industrial Mathematics, Department of Faculty of Science, 2 Science Drive 2, Singapore E-mail: matwmWnus.edu.sg
Mathematics 117543
Euler (1765) derived equations that describe the inertial motions of rigid bodies and inviscid incompressible flows and these were later shown to be described by geodesic flows with respect to Riemannian metrics induced by inertial operators on Lie groups that parameterize the physical configurations. Arnold generalized Euler's equations to general Lie groups and Fairlie, Zachos and Zeitlin developed metrics on SU(N) with N odd whose geodesic flows approximate 2D periodic inviscid incompressible flows and preserve analogues of powers of vorticity. We have developed an accuracte symplectic integrator and used it to study scaling behaivor and turbulence. This talk describes our computational results.
1
Classical and Q u a n t u m Descriptions
Classically, incompressible fluid flow in a domain D is described by a trajectory g : R -¥ SDiff(D) in the infinite dimensional Lie group of volume preserving diffeomorphisms of D. The velocity in space u : D x R -> R 3 defined by u := jft o g-1 clearly satisfies div u = 0 and u • n = 0 where n is normal to dD. For inertial flow of an inviscid fluid, Euler 6 showed that dt u + V u u = —grad p and Moreau 16 showed that g is a geodesic with respect to the right-invariant Riemannian metric defined by (u,u) = JDu-u. Arnold * used this geometric description to relate flow sensitivity to negative curvature and Ebin 5 , Marsden 15 , and Shkoller 17 used it to derive existence, uniqueness, and regularity results for Euler's and Navier-Stoke's equations. For 2D inertial flow u — JV«/>, where J is rotation by ^ and if> is the stream function. The vorticity U E V X U determines if> — —A _1 w and satisfies the equation % = u • Vu — dXlipdX2u — dX2ipdXlbj = [tf>,u] (Poisson bracket) hence the flow preserves the infinite number of Casimirs Ip = JD UJP, p > 1. Quantum formalism provides a powerful, new description of 2D hydrodynamics that is reflected in the expanding literature 3,4,7,8,10,11,19,20 L e t D _ tf/2irZ2 denote the two-dimensional torus. The Lie algebra of SDiff(D) consists of divergence-free vector fields on D with the commutator product or, equivalently, of C°° real-valued stream functions on D with the Poission bracket product. Its complexification, described with the exponential basis Lm{x) = exp{rn-x), m € Z2, m ^ (0,0), 1 6 D , by [Lm, L n ] = (mxn) Lm+n and the Laplacian A Lm = m • m on D admit a continuous family of deformations [Lm,Ln] — K_1 sin(/cm x n), AKLm = K~ 2 sin2(iim) where re > 0 and they can be recovered in the limit re -» 0. Furthermore, the deformed algebra is the algebra of derivations of the C* algebra (noncommutative torus) generated by operators A and B that satisfy AB — eBA where e = etK and AK is the Laplace operator on this noncommutative torus! If re = ^ where N is an odd integer then e is an N-th root of unity, the operators A and B can be realized
726
727 /O 1 0 ...^ by the Weyl matrices
18
(I
0 ••• 0 e
A =
0
0
\
•••
B = 0
0 - 1
\ i o ... o /
\0-.
0 e"-1/
and the mapping Lm -> emim^/'2AmiBm:2 provides a finite dimensional approximation by sl(N,C) which can be identified with complex-valued functions on the discrete torus Z'^ that sum to zero. Since su(N) is the real form of sl(N,C) and AK approximates A, this yields approximations to 2D incompressible inviscid 2-K—periodic flow by geodesic trajectories g in SU(N) with respect to the rightinvariant Riemannian metric defined by (ip, ip) = fD —ipAip. Since u satisfies u(t)=gu(0)g-\
^g-1
= -AKu,
(1)
the N — 1 Casimirs trace up, p = 2 , . . . ,N are preserved. A simple computation shows that in the limit K -> 0, these converge to the continuous Casimirs Ip. 2
Theoretical and Computational R e s u l t s
Hydrodynamic turbulence is related to the breaking of symmetry 9 , a property that is described in the general context of geodesic flows by the following result: T h e o r e m 2.1 Let G be a Lie group with algebra a self-adjoint, positive definite inertial operator, mannian metric from (u,v) = (Au)(v), and define automorphisms of Q that are (-,) — isometries. If Q\ = {x G Q | s(x) = x, s 6 Si} then
G, linear dual Q*, A : Q —» Q* give G the right-invariant Riethe symmetry group S(G,A) of Si is a subgroup of S(G,A) and
1. Q\ is a Lie subalgebra of Q, 2. if g is a geodesic in G and the velocity in space u = ^f <7_1 satisfies w(0) 6 Q\, then for all t, u(t) 6 Q\ and g(t) G Gig(0) where G\ is the Lie subgroup of G associated to Q\. P r o o f The first assertion follows since if s £ S and x,y G Gi then s([z,2/]) = [s(x),s(y)] = [x,y]. The trajectory g is a geodesic if and only if u satisfies Arnold's 2 generalized Euler equation A^ = Ad*u(Au) or, equivalently, if for all v e G, -^(u(t),v) = (u(t),[u(t),v]). If s G S then (s(u),s(v)) = (u,v) and (s(u),[s(u),s(v)]) = (s(u),s([u,v])) = (u, [u,v]) and since s(G) = G, s(u) also solves the generalized Euler equation. If u(0) G G\ then s(u(0)) = u(0) and the first part of the second assertion follows from the uniqueness property of the solutions of initial value problems while the third part follows from the first. We developed a two-step implicit symplectic integrator, based on Eqs. 1 and implemented in MATLAB, and used it to numerically compute vorticity trajectories associated to geodesies in SU(N). One multipole trajectory yielded the following
728 vorticities, represented in R(Z%) by 5 x 5 matrices, at times t = 0,15, 21 / .5 - . 5 - . 5 .5 0 0 0 0 0 \ 0 0
0 0 0\ 00 0 00 0 , 0 0 00 0 /
/ .4913 -.4913 .0118 0 \-.0118
- . 4 9 1 3 .0354 0 - . 0 3 5 4 \ / .5 - . 5 0 0 0 \ .4913 -.0354 0 .0354 - . 5 .5 0 0 0 - . 0 1 1 8 - . 0 8 5 0 0 .0850 ,« 0 0 000 0 0 0 0 0 0 0 0 0 .0118 .0850 0 - . 0 8 5 0 / \ 0 0 00 0 /
The symmetry group S(SU(5), A2») is generated by the translations, rotation by | , and reflection in lines composed with multiplication by —1. The initial vorticity is invariant under two reflection symmetries and the invariant Lie subalgebra has dimension 4. These matrices together with Fig. 1, that illustrates the distance from the initial vorticity and the distance from the subalgebra, show that the symmetry is approximately preserved for some time and, furthermore, suggest that the flow is integrable. The exponential divergence of the distance from the subalgebra is a consequence of numerical roundoff error combined with negative curvature as predicted by Jacobi's equation. 3
Future Studies
Chaos and turbulence are universal phenomena that, unfortunately, characterize social conditions, e.g. epidemics and terrorism, as much as they characterize hydrodynamics, and it is urgent to understand and control them. We intend to study the role of symmetry and curvature on integrability, chaos, turbulence and scaling properties. We have derived results 1 2 , 1 3 that suggest that a single geodesic on SU(N) generically determines the inertia operator up to a scalar multiple and we believe that this property will play a key role. We have also derived preliminary results that suggest wavelet bases may be useful 14 and intend to develop bases related to noncommutative geometry. Iyer and Rajeev n have used the SU(N) approximation to derive a new scaling model for 2D turbulence and plan to use noncommutative geometry to attack problems in 3D hydrodynamics. We remark that the existence/uniqueness problem for the 3D Navier-Stoke's equations carry a US1 million Clay Prize Award as does the problem of resolving the Riemann Conjecture to which Alain Connes has outlined an approach based on eigenvalues of the Laplace operator on noncommutative geometries! Acknowledgments THIS RESEARCH IS SUPPORTED BY THE GRANT FROM IHPC-CIM RESEARCH PROJECT: R-146-000-036-592. References 1. V. Arnold, Ann. Inst. Fourier Grenoble, 16, 319 (1966). 2. V. Arnold, Mathematical Methods of Classical Mechanics,(Springer,NY,1978). 3. V. Arnold and B. Khesin, Topological Methods in Hydrodynamics, (Springer, NY, 1998).
729
dist. from start
50
log dist from subalgebra
100
-70
0
50
100
Figure 1. Evolution of Multipole Vorticity
4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20.
J. Dowker and A. Wolski, Phys. Rev. A, 46, 6417 (1992). D. Ebin and J. Marsden, Annals of Mathematics, 92, 102 (1970). L. Euler, Memoirs de I'Academie des Sciences Berlin, 1765. B. Fairlie and C. Zachos, Phys. Lett. B, 218, 203 (1989). B. Fairlie, P. Fletcher and C. Zachos, J. Math. Phys., 3 1 , 1088 (1990). U. Frisch, Turbulence: the legacy of A. N. Kolmogorov, (CUP,NY,1995). J. Hoppe, Int. J. Mod. Phys. A, 4, 5235 (1989). S. Iyer and S. Rajeev, Modern Physics Letters A, 2, 1 (2002). W. Lawton and L. Noakes, J. Math. Physics, 42(4), 1 (2001). W. Lawton, ScienceAsia, 28, 61 (2002). W. Lawton, Proc. Int. Conf. Optimization of finite-element approx., wavelets, and splines, Saint-Petersburg, Russia, June 2001. J. Marsden, T. Ratiu and S. ShkoUer, Geometry and Functional Analysis, 10, 582 (2000). J. Moreau, Acad. Sci. Paris, 249, 2156 (1959). S. ShkoUer, Applied Mathematics Letters, 14, 539 (2001). H. Weyl, Gruppentheorie und Quantenmechanik, Zurich, 1928. V. Zeitlin, Physica D, 49, 353 (1991). V. Zeitlin, J. Phys. A, 25, L171 (1992).
MULTI-PHASE FLOW MODELS A N D METHODS FOR LAVA L A M P S A N D LIFE S C I E N C E S JIA SHUO Centre for Industrial Mathematics, Department of Mathematics Faculty of Science, 2 Science Drive 2, Singapore 117543 E-mail: [email protected] Multi-phaseflows,that involve immisciblefluids,arise in many natural and industrial processes and their accurate and efficient simulation will play an increasingly vital role in the life sciences. Surface tension, characteristic of these flows, creates a discontinuous pressure drop across each surface separating two liquid phases that is proportional to the surface's mean curvature. The primary computational challenge is to solve for the pressure whose laplacian has a singular term consisting of a distribution supported on the separating surfaces. In this paper we discuss models and methods for multiphaseflowsthat include simplifying assumptions, boundary integral representations, and wavelet-based discretization and multilevel preconditioning. We also discuss the validation of these models and methods through the simulation of lava lamps, whose simple two-phaseflowsand convenient availability provide an ideal laboratory testbed. 1
Introduction
Multi-phase flows are not only part of our natural environment like volcanic activities and air and water pollution, but also are working processes for a variety of industrial branches like conventional and nuclear power plants, combustion engines, propulsion systems, flows inside the human body, oil and gas production and transport, chemical industry, biological industry, process technology in metallurgical industry or in food production etc. The list is by far not exhaustive. For instance everything to do with phase changes is associated with multi-phase flows. The industrial use of multi-phase systems requires methods for predicting their behavior. Zemansky l defines a phase as a system or a portion of the system composed of any number of chemical constituents satisfying the requirements (a) that it is homogenous and (b) that it has a definite boundary. The phases may of course be solid, liquid, or gas. In the lava lamp model, the liquid-liquid flow (i.e. flow of two immiscible liquids) is considered. Actually, some of the principles and methods for liquid-liquid flow can be applied to other types of two-phase flow.
2
Exact M o d e l
In general the density p, coefficient of dynamic viscosity p,, coefficient of bulk (or second) viscosity A, thermal conductivity /t, specific heat at constant volume cv, specific heat c, and internal energy per unit mass E are functions of the thermodynamic state variables pressure p and temperature T. For a calorically perfect gas E = cvT and for a liquid E ~ cT. The acceleration of gravity g will be assumed to be constant. The equations governing the evolution of p, v , T within each fluid in a multiphase system are derived from conservation laws 2 , 6 , 3 . Conservation of
730
731 mass yields the equation of continuity
where - ^ = J ^ + v - V . Conservation of momentum yields the Navier-Stokes p - ^ = - V p + / i A v + ( A + ^ ) V V - v + pg. Conservation of energy yields the heat conduction
equation (2)
equation
DE u •> 2 p — = V - « V T - p V - v + ^ T r a c e ( ( V v ) + (Vv) T ) + (A - - ^ ( d i v v ) 2 . (3) The pressure drop across the surfaces that separate different phases is described by Laplace's formula 3 ) Pi-P2
=a ( ^
+ ~J+nT(a[~a'2)n,
(4)
where n is the outward normal vector to S, a is the surface-tension coefficient, Rx and Ri are the principle radii of curvature and reckoned as positive if they point in the direction of f^, and a'-, j = 1,2 are the viscous stress tensors denned by (dvi
3
dvk
2
dvt\
dve
Computational M e t h o d s
Under the assumption that the temperature variation is sufficiently slow to make a minor contribution to the inertial force, we may take the divergence of the velocity equal to zero within small time intervals. The resulting Boussinesque Approximation 6 , 4 can be regarded as an operator splitting method that vastly simplifies the computations. At each time step we simply solve the incompressible Navier-Stokes equations p—— - - V p + ^ A v + pg, V v = 0 then compute the new temperature and density. We implement the computations using the velocity-pressure formulation. To solve for the pressure we observe that the Laplacian of the pressure A p is the sum of tho components, a continuous component withing the interior of each fluid phase that is a function of the velocity, and a singular component consisting of a double layer distribution that is supported on the surfaces that separate the phases and that can be computed from Laplace's equation. Furthermore, the normal component of the pressure is computable from the Navier-Stokes equation. These facts provide the pressure as the solution of the Poisson equation with Neumann boundary conditions. For infinite domains we impose asymptotic boundary conditions that are equivalent to the gauge conditions discussed in 7 that play a critical role in the navigation of microorganisms. We note that boundary element methods also provide a means to compute the solution of these equations and the gauge conditions provide a means to regularize the resulting equations for unbounded domains 5 .
732 4
Future Research: Lava Lamps and Life Sciences
Lava Lamps illustrate the simplest example of two phase flow. They consist of an inverted lighbulb beneath a glass container filled with water and wax. As the wax, initially settled at the botton of the container, is heated by the lighbulb it expands and exudes rising columns that rise, cool, fall and exhibit captivating motions. Their simulation has elicited a growing enthusiasm and involves the use of sophisticated graphical and visualization techniques. However, the underlying models tend to be animation based and while they offer speed they do not simulate realistic physics that could be extended to analyze more complex two phase flows such as occur during oil recovery, pharmaceutical production, and biomedical processes. Our future research will experiment with various methods to simulate two phase flows and related situations such as the navigation of microorganisms using holonomy. Acknowledgments THIS RESEARCH IS SUPPORTED BY THE GRANT FROM IHPC-CIM RESEARCH PROJECT: R-146-000-036-592. References 1. M. W. Zemansky, Heat and thermodynamics, (5th ed, McGraw-Hill, New York, 1968). 2. G. K. Batchelor, An Introduction to Fluid Mechanics, (Cambridge University Press, Cambridge, 1967). 3. L. D. Landau and E. M. Lifshitz Fluid Mechanics, (Pergamon Press, New York, 1987). 4. C. Pozrikidas, Introduction to Theoretical and Computational Fluid Dynamics, (Oxford University Press, New York, 1997). 5. H. Power and L. C. Wrobel, Boundary Integral Methods in Fluid Dynamics, (Computational Mechanics Publications, Boston, 1995). 6. P. G. Drazin and W. H. Reid, Hydrodynamic Stability, (Cambridge University Press, Cambridge, 1981). 7. A. Shapere and F. Wilczek, Self propulsion at low Reynolds number, Physical Review Letters, (20), 59, 2051, 1987.
A Reynolds—uniform numerical m e t h o d for P r a n d t l ' s b o u n d a r y layer p r o b l e m for flow past a plate w i t h m a s s transfer J.S. Butler Department of Mathematics, Trinity College, Dublin, Ireland J.J.H Miller Department of Computational Science, National University of Singapore, Singapore & Department of Mathematics, Trinity College, Dublin, Ireland G.I. Shishkin Institute for Mathematics and Mechanics, Russian Academy of Sciences, Ekaterinburg, Russia In this paper we consider Prandtl's boundary layer problem for incompressible laminar flow past a plate with transfer of fluid through the surface of the plate. When the Reynolds number is large the solution of this problem has a parabolic boundary layer. We construct a direct numerical method for computing approximations to the solution of this problem using a piecewise uniform mesh appropriately fitted to the parabolic boundary layer. Using this numerical method we approximate the self-similar solution of Prandtl's problem in a finite rectangle excluding the leading edge of the plate, which is the source of an additional singularity caused by incompatibility of the problem data, for various rates of mass transfer. By means of extensive numerical experiments we verify that the constructed numerical method is Reynolds — uniform in the sense that the computed errors for the velocity components and their derivatives in the discrete maximum norm are Reynolds uniform. We use a special numerical method related to the Blasius technique to compute a reference solution for use in the error analysis.
1
Introduction
Incompressible laminar flow past a semi-infinite plate P with mass transfer in the domain D = R² is governed by the Navier-Stokes equations. Using Prandtl's approach the vertical momentum equation is omitted and the horizontal momentum equation is simplified, see [2] and [3]. For large Reynolds numbers the new momentum equation is parabolic and singularly perturbed, because the highest order derivative is multiplied by the small singular perturbation parameter ε = 1/Re. It is well known that for flow problems with large Reynolds number a boundary layer arises on the surface of the plate. Also, when classical numerical methods are applied to these problems large errors occur, especially in approximations of the derivatives, which grow unboundedly as the Reynolds number increases. For this reason Reynolds-uniform numerical methods, in which the error is independent of the singular perturbation parameter, are required. Here we solve the Prandtl problem in a region including the parabolic boundary layer. Since the solution of the problem has another singularity at the leading edge of the plate, we take as the computational domain the finite rectangle Ω = (0.1, 1.1) × (0, 1) on the upper side of the plate, sufficiently far from the leading edge (see Fig. 1) that the leading edge singularity does not cause excessive difficulties for the numerical method. We denote the boundary of Ω by Γ = Γ_L ∪ Γ_T ∪ Γ_B ∪ Γ_R, where Γ_L, Γ_T, Γ_B and Γ_R are, respectively, the left-hand, top, bottom and right-hand edges of Ω.
The Prandtl boundary layer problem in D is: find u_ε = (u_ε, v_ε) such that for all (x, y) ∈ D, u_ε satisfies the differential equations

$$(P_\varepsilon)\qquad -\varepsilon\,\frac{\partial^2 u_\varepsilon}{\partial y^2} + \mathbf{u}_\varepsilon \cdot \nabla u_\varepsilon = 0, \qquad \nabla \cdot \mathbf{u}_\varepsilon = 0,$$

with boundary conditions u_ε = 0 and v_ε = v₀(x) on Γ_B, and u_ε = u_P on Γ_L ∪ Γ_T,
where v₀(x) is the velocity normal to the plate at which mass is transferred through its surface. Negative values of v₀ correspond to injection, positive values to suction. We construct a numerical method for which there are error bounds for the solution components and their derivatives, such that the error constants do not depend on the value of Re or v₀. That is, the method is (Re, v₀)-uniform.
2
Blasius solution
Using the transformation described in [4], (P_ε) can be simplified to the well-known Blasius problem, involving a non-linear ordinary differential equation, which is described in [5]. From [4], v₀(x) is determined by a parameter f₀. In principle f₀ can have values in (−∞, ∞), but in practice −0.87 < f₀ < 7.07. We solve the Blasius problem numerically and then reverse the above transformation to obtain the Blasius solution of the original Prandtl problem. Here the Blasius solution will be denoted by U_B^8192, since we solve the Blasius problem on a mesh with N = 8192. The purpose of finding this independent, semi-analytic solution of Prandtl's problem is to use it as a reference solution for the unknown exact solution in the error analysis of the direct method. Since the Blasius solution is known to converge Reynolds-uniformly to the solution of Prandtl's problem, we can use it to estimate guaranteed error bounds for the approximations generated by the direct method [5].
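For orientation, the Blasius problem with mass transfer takes the following familiar form in one common normalization; this rendering is ours, not quoted from [4] or [5]:

$$f''' + \tfrac{1}{2}\,f\,f'' = 0,\qquad f(0) = f_0,\quad f'(0) = 0,\quad f'(\eta)\to 1\ \text{as}\ \eta\to\infty,$$

where η is the similarity variable and the parameter f₀ encodes the rate of suction or blowing through the plate.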
3
Direct numerical solution
In this section we construct a robust numerical method to solve the Prandtl problem (P_ε) for all admissible values of the Reynolds number Re and f₀ ∈ [−0.3, 0.3]. We define the rectangular mesh on the rectangle Ω to be the tensor product of two one-dimensional meshes, Ω_ε^N = Ω^{N_x} × Ω_ε^{N_y}, where N = (N_x, N_y). The mesh in the x-direction is the uniform mesh

Ω^{N_x} = { x_i : x_i = 0.1 + i N_x^{-1}, 0 ≤ i ≤ N_x }.
The mesh in the y-direction is the piecewise-uniform fitted mesh

Ω_ε^{N_y} = { y_j : y_j = 2σ j / N_y for 0 ≤ j ≤ N_y/2; y_j = σ + (j − N_y/2) · 2(1 − σ)/N_y for N_y/2 < j ≤ N_y }.
It is important to note the position of the boundary layer in order to define an appropriate transition point σ from the coarse to the fine mesh, so that there is a fine mesh in the boundary layer. The appropriate choice in this case is σ = min{1/2, √ε ln N_y}. The factor √ε may be motivated from a priori estimates of the derivatives of the solution u_ε or from asymptotic analysis. For simplicity we take N_x = N_y = N. Since the problem (P_ε^N) is a nonlinear system, an iterative method is required for its solution. This is obtained by replacing the system of nonlinear equations with a sequence of systems of linear equations. The systems of linearized equations are as follows. With the boundary condition U_ε^m = U_B^8192 on Γ_L, for each i, 1 ≤ i ≤ N, use the initial guess U_ε^0|_{x_i} = U_ε^{M_{i−1}}|_{x_{i−1}} and for m = 1, ..., M_i solve the following two-point boundary value problem for U_ε^m(x_i, y_j):

(−ε δ_y² + U_ε^{m−1} · D⁻) U_ε^m (x_i, y_j) = 0,  1 ≤ j < N,
(D_x⁻ U_ε^m)(x_i, y_j) + (D_y⁻ V_ε^m)(x_i, y_j) = 0,

with initial condition V_ε^m = v₀(x_i) on Γ_B. Continue to iterate between the equations for U_ε^m and V_ε^m until m = M_i, where M_i is such that

max( |U_ε^{M_i} − U_ε^{M_i−1}|_{x_i}, |V_ε^{M_i} − V_ε^{M_i−1}|_{x_i} ) ≤ tol.
For notational simplicity, we suppress explicit mention of the iteration superscript M_i henceforth, and we write simply U_ε for the solution generated by (A_ε^N). We take tol = 10⁻⁶ in the computations. We note that there are no known theoretical results concerning the convergence of the solutions U_ε of (A_ε^N) to the solution u_ε of (P_ε), and no theoretical estimate for the pointwise error (U_ε − u_ε)(x_i, y_j). It is for this reason that we are forced to apply controllable experimental error analysis techniques, which are adapted to the problem under consideration and are of crucial value to our understanding of the computational problems. In what follows V* is defined to be V* = max_{Ω^N} V_B.
4
Error Analysis
In this section we estimate the Reynolds-uniform maximum pointwise errors in the approximations generated by the direct numerical method described in the previous
section. For brevity, we show the errors for only one typical value of the mass transfer, f₀ = 0.3. We compare the parameter-uniform maximum pointwise errors in the approximations generated by the direct numerical method of the previous section with the corresponding values of U_B^8192. Table 1 indicates clearly that the method is Reynolds-uniform for √ε D_y⁻ U_ε, since the error stabilises as the Reynolds number increases. We define p_{ε,comp}^N by

$$p_{\varepsilon,comp}^N = \log_2 \frac{\|\sqrt{\varepsilon}\,(D_y^- U_\varepsilon^N - D_y^- U_B^{8192})\|_{\Omega^N}}{\|\sqrt{\varepsilon}\,(D_y^- U_\varepsilon^{2N} - D_y^- U_B^{8192})\|_{\Omega^{2N}}}$$

and

$$p_{comp}^N = \log_2 \frac{\max_\varepsilon \|\sqrt{\varepsilon}\,(D_y^- U_\varepsilon^N - D_y^- U_B^{8192})\|_{\Omega^N}}{\max_\varepsilon \|\sqrt{\varepsilon}\,(D_y^- U_\varepsilon^{2N} - D_y^- U_B^{8192})\|_{\Omega^{2N}}}.$$
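These computed orders are simple to evaluate: each is the base-2 logarithm of the ratio of successive errors. As a check, the following sketch (ours, not part of the paper) reproduces the ε = 2⁰, N = 8 entry of Table 2 from the first two errors in that row of Table 1:

```java
// Sketch: computed order of convergence from the errors on meshes N and 2N,
// p^N = log2( e(N) / e(2N) ).
public final class ComputedOrder {

    static double order(double errorN, double error2N) {
        return Math.log(errorN / error2N) / Math.log(2.0);
    }

    public static void main(String[] args) {
        // Errors for eps = 2^0 at N = 8 and N = 16, taken from Table 1.
        System.out.printf("p = %.2f%n", order(9.50e-2, 4.78e-2)); // prints p = 0.99
    }
}
```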
From Table 2 we see that the order of convergence is at least 0.66 for N > 16. Fig. 2 shows that the largest error in √ε D_y⁻ U_ε occurs, as expected, within the boundary layer region. Similar results also hold for U_ε, V_ε and D_y⁻ V_ε. The leading edge singularity becomes a problem for D_y⁻ V_ε unless we consider only the subdomain Ω^N ∩ [0.2, 1.1] × [0, 1]. The above results show experimentally that the order of convergence is no less than 0.66 for N > 16 and 0.8 for N > 128.
5
Conclusion
We considered Prandtl's boundary layer equations for incompressible laminar flow past a plate with suction/blowing v₀. When the Reynolds number is large the solution of this problem has a parabolic boundary layer at the surface of the plate. We constructed a direct numerical method for computing approximations to the solution of this problem using a piecewise uniform fitted mesh technique appropriate to the parabolic boundary layer. We used the method to approximate the self-similar solution of Prandtl's problem in a finite rectangle excluding the leading edge of the plate for various values of Re and v₀. To analyse the efficiency of the method we constructed and applied a special numerical method related to the Blasius technique to compute reference solutions for the error analysis of the velocity components and their derivatives. By means of extensive numerical experiments we showed that the constructed direct numerical method is (Re, v₀)-uniform.

References
1. P. Farrell, A. Hegarty, J.J.H. Miller, E. O'Riordan, G.I. Shishkin, Robust Computational Techniques for Boundary Layers, CRC Press, (2000).
2. H. Schlichting, Boundary Layer Theory, 7th edition, McGraw Hill, (1951).
3. D.J. Acheson, Elementary Fluid Dynamics, Oxford: Clarendon, (1990).
4. D.F. Rogers, Laminar Flow Analysis, Cambridge University Press, (1992).
5. B. Gahan, J.J.H. Miller, E. O'Riordan, G.I. Shishkin, Reynolds-uniform method for Prandtl's problem with suction-blowing based on Blasius' approach, Numerical Analysis and Its Applications: NAA 2000, Rousse, Bulgaria, June 2000 (L.G. Vulkov, J. Wasniewski and P. Yalamov, eds.), Lecture Notes in Computer Science, Vol. 1988, Springer, (2001).
Figure 1: Flow past a plate with suction/blowing
Table 1: Computed maximum pointwise scaled error √ε ‖D_y⁻ U_ε − D_y⁻ U_B^8192‖ over Ω^N \ Γ_L, where U_ε is generated by (A_ε^N), for various values of ε and N, with f₀ = 0.3.
ε\N     8          16         32         64         128        256        512
2^0     9.50e-02   4.78e-02   2.41e-02   1.23e-02   6.32e-03   3.35e-03   1.86e-03
2^-2    1.87e-01   9.50e-02   4.78e-02   2.41e-02   1.23e-02   6.32e-03   3.35e-03
2^-4    3.21e-01   1.87e-01   9.50e-02   4.78e-02   2.41e-02   1.23e-02   6.32e-03
2^-6    3.37e-01   2.59e-01   1.63e-01   9.50e-02   4.78e-02   2.41e-02   1.23e-02
2^-8    3.37e-01   2.59e-01   1.63e-01   9.89e-02   5.78e-02   3.33e-02   1.89e-02
2^-10   3.37e-01   2.59e-01   1.63e-01   9.89e-02   5.78e-02   3.33e-02   1.89e-02
...     ...        ...        ...        ...        ...        ...        ...
2^-20   3.37e-01   2.59e-01   1.63e-01   9.89e-02   5.78e-02   3.33e-02   1.89e-02
E^N     3.37e-01   2.59e-01   1.63e-01   9.89e-02   5.78e-02   3.33e-02   1.89e-02
Figure 2: Graph of √ε(D_y⁻ U_ε − D_y⁻ U_B^8192) for ε = 2⁻¹², N = 32 and f₀ = 0.3
Table 2: Computed orders of convergence p_{ε,comp}^N and p_comp^N for √ε(D_y⁻ U_ε − D_y⁻ U_B^8192), where U_ε is generated by (A_ε^N), for various values of ε and N, with f₀ = 0.3.
ε\N     8      16     32     64     128    256
2^0     0.99   0.99   0.98   0.96   0.92   0.84
2^-2    0.98   0.99   0.99   0.98   0.96   0.92
2^-4    0.78   0.98   0.99   0.99   0.98   0.96
2^-6    0.38   0.66   0.78   0.99   0.99   0.98
2^-8    0.38   0.66   0.72   0.77   0.80   0.82
...     ...    ...    ...    ...    ...    ...
2^-20   0.38   0.66   0.72   0.77   0.80   0.82
p^N     0.38   0.66   0.72   0.77   0.80   0.82
CONSTRUCTING AN OGSA-BASED GRID COMPUTING PLATFORM
WEI JIE, TIANYI ZANG AND ZHOU LEI
Institute of High Performance Computing, 1 Science Park Road, Singapore 117528
E-mail: [email protected]
WENTONG CAI, STEPHEN J. TURNER AND LIZHE WANG
Nanyang Technological University, Nanyang Avenue, Singapore 639798
E-mail: [email protected]
The Grid is a promising computing platform that integrates resources from different organizations in a shared, coordinated and collaborative manner to solve large-scale science and engineering problems. Recently, Grid technologies have been evolving towards an Open Grid Services Architecture (OGSA). The OGSA provides a uniform service-oriented architecture and integrates Grid technologies with emerging Web services standards. We are constructing a Grid computing platform based on the OGSA. This platform provides Grid services such as directory services, scheduling services and execution management services to support the execution of various Grid applications. Each Grid service in this platform is viewed as a Web service, and all these services are seamlessly integrated together to form a Grid computing environment. This paper describes the basic requirements and initial design of the architecture of this Grid computing platform, including the essential components of the platform, the key Grid services provided, as well as how these components and services are integrated. The implementation issues of this OGSA-based Grid computing platform are also discussed.
1
Introduction
Grids [1,2] are positioned as systems that scale up to Internet-size environments with resources distributed across multiple organizations and administrative domains. In a Grid environment, resources belonging to different organizations are integrated and work in a shared, coordinated and collaborative manner to solve large-scale science and engineering problems. This integration can be technically challenging because of the need to achieve various qualities of service when running on top of different native platforms. The Open Grid Services Architecture (OGSA) [3, 4] has recently been presented to address these challenges. The OGSA is a formulation of Grid services as a specialized subset of Web services using standard Web service definitions and protocols [6]. It draws from the Globus Toolkit [5], a community-based open source collection of services and software libraries that support Grids, and from Web services standards including SOAP, WSDL and UDDI [8]. The marriage of Globus and Web services meets the demands of an increasingly complex and distributed computing infrastructure: by providing a set of interfaces from which all Grid services are implemented, the OGSA allows for consistent resource access across multiple heterogeneous platforms with local or remote location transparency; it also allows the composition of services to form more sophisticated services without regard to how the services are implemented, and supports integration with various underlying native computing platform facilities. The researchers behind OGSA have discussed high-level guiding concepts as well as the lower-level interfaces at its foundation. Although OGSA presents a promising framework for Grid computing, no actual OGSA-based Grid computing platform has yet been designed and implemented. In this paper, we present an OGSA-based Grid computing platform. Section 2 describes the basic requirements and initial design of the architecture of this Grid computing platform, including the essential components of the platform, the
key Grid services provided, as well as how these components and services are integrated. Section 3 discusses the implementation issues. Finally, Section 4 concludes the paper with our future research directions.
2
Architecture
We present the following scenario to motivate the design of our OGSA-based Grid computing platform. Suppose resources in a Virtual Organization are connected through a Private Virtual Network (PVN). Our Grid computing platform intends to support the execution of parallel programs on a pool of parallel machines managed by different HPC centers. A user submits an application through the Grid Portal. After the Hosting Environment receives the request, the Factory creates instances of the proper Grid services (subject to the mutual authentication of the user and the factory) in the Grid Service Pool, and these service instances work collaboratively to execute the application. The Registry and Mapper in the hosting environment enable users to locate appropriate Grid services. Multiple users may connect to and submit applications through the same Grid Portal (see Figure 1).
Figure 1. The OGSA-based Grid Computing Infrastructure
The key components of this OGSA-based Grid computing platform are a set of Grid services in the form of Grid service interfaces. These services include Directory, Scheduling, Security, QoS, Data Management and Execution Management. Instances of Security, QoS, Data Management, and Execution Management services are created for each parallel machine, while instances of Scheduling and Directory services will be
created for each Grid Portal. The Scheduling service and the local Load Management Systems will cooperate to schedule the application. The Globus Toolkit serves as the underlying Grid middleware. We now describe in turn the key services provided in this Grid computing environment.
• Directory service: the Directory service is responsible for providing both static and dynamic information about the resources in a Grid environment. Although Globus provides some core information services, it is not sufficient as a practical directory information provider. Our OGSA-based Directory service handles information representation and organization; information storage, update and access; and how it interacts and integrates with other Grid services.
• Scheduling service: the main role of the Scheduling service is to allocate resources to applications so as to achieve high throughput of the resources and the best QoS for the applications. There are two layers in our job scheduler: a super job scheduler dispatches jobs at the Grid level to proper resources, while the local job scheduler is in charge of jobs running on particular resources within an organization. The Scheduling service needs the information from the Directory service.
• Execution Management service: the Execution Management service aims to monitor and control the execution of an application after it is submitted to a Grid environment. This service manages the execution information of an application running in a Grid environment (e.g., information construction, execution information tracing and updating). Execution control mechanisms and strategies, for example load balancing in a Grid environment, are provided in this service as well.
The OGSA only specifies uniform Grid service interfaces and the corresponding semantics for general Grid services such as Factory, Registry and Notification [4]. We will provide particular service interfaces for each Grid service of this OGSA-based Grid computing platform.
3
Implementation Issues
The implementation of our OGSA-based Grid computing platform is based on open standard Web services [8]. Grid service description is defined in WSDL, communication between Grid services is based on the SOAP protocol, and information exchange follows the UDDI protocols (see Figure 2). All of these mechanisms are XML-based, which guarantees that Grid services are vendor-independent and interoperable. As a Web service, a Grid service in our platform follows the service-oriented architecture model. A Grid service first publishes its interfaces to a registry. The client then looks up, or discovers, the Grid service from the registry. Last, the client binds to the Grid service to use its services. The hosting environment plays an important role in the OGSA. It not only defines the implementation programming model and language, development tools and debugging tools, but also how an implementation of a Grid service meets its obligations with respect to Grid service semantics. Sun J2EE [7] is adopted as the hosting environment in our OGSA-based computing platform. In order to communicate with Globus, the Grid services call the Java CoG [9] API for Globus functionality. The Java CoG Kit defines and implements a set of general components that map Globus functionality into the Java framework.
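The publish/lookup/bind cycle described above can be pictured in a few lines of Java. Everything in this sketch, including the Registry and GridService interfaces and their methods, is hypothetical scaffolding of ours meant only to show the flow; it is not the Globus, OGSA, or J2EE API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the service-oriented publish / lookup / bind cycle.
// All interfaces here are illustrative placeholders, not a real Grid API.
interface GridService {
    String invoke(String request);
}

final class Registry {
    private final Map<String, GridService> services = new HashMap<>();

    void publish(String name, GridService service) { services.put(name, service); }

    GridService lookup(String name) { return services.get(name); }
}

public final class ServiceCycleDemo {
    public static void main(String[] args) {
        Registry registry = new Registry();

        // 1. The service publishes its interface to a registry.
        registry.publish("DirectoryService", request -> "resources matching " + request);

        // 2. The client looks up (discovers) the service...
        GridService directory = registry.lookup("DirectoryService");

        // 3. ...and binds to it to use its operations.
        System.out.println(directory.invoke("cpuCount>16"));
    }
}
```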
4
Conclusion and Future Work
We present a Grid computing platform based on the OGSA. Each component of this platform is viewed as a Web service, and these services may be integrated together to provide the computing framework for applications. Our future work will mainly focus on the design and implementation of the Grid services, in particular the Directory service, Scheduling service and Execution Management service. The functionality of these services will be developed, and these services will be integrated to construct a practical Grid computing platform.
Figure 2. Grid Platform Implementation Based on Web Services
References
1. Foster I. and Kesselman C., The Grid: Blueprint for a New Computing Infrastructure (1999), Morgan Kaufmann.
2. Foster I., Kesselman C. and Tuecke S., The anatomy of the Grid: enabling scalable virtual organizations, International Journal of High Performance Computing Applications 15 (2001) pp. 200-222.
3. Foster I., Kesselman C., Nick J. and Tuecke S., The physiology of the Grid: an Open Grid Services Architecture for distributed systems integration (2002), http://www.globus.org/research/papers/ogsa.pdf.
4. Tuecke S., Czajkowski K., Foster I., Frey J. et al., Grid service specification (2002), http://www.globus.org/ogsa.
5. Foster I. and Kesselman C., Globus: a metacomputing infrastructure toolkit, International Journal of Supercomputer Applications 11 (1997) pp. 115-128.
6. Tony Hey, Unlocking the power of the Grid, IEE Review (2002) pp. 9-12.
7. Information about J2EE is available at http://java.sun.com/j2ee.
8. Curbera F., Duftler M. and Khalaf R., Unraveling the Web services Web - an introduction to SOAP, WSDL and UDDI, IEEE Internet Computing (2002) pp. 86-93.
9. Von Laszewski G., Foster I., Gawor J., Smith W. and Tuecke S., CoG Kits: a bridge between commodity distributed computing and high-performance Grids (2000), in the Proceedings of the ACM 2000 Java Grande Conference.
AN OGSA-BASED DIRECTORY SERVICE
ZHOU LEI, TIANYI ZANG AND WEI JIE
Institute of High Performance Computing, 1 Science Park Road, Singapore 117528
E-mail: [email protected]
Built on the concepts and technologies of the Grid and Web services communities, the Open Grid Services Architecture (OGSA) is viewed as the most promising direction in Grid computing. Our work focuses on OGSA-based directory service research, which is the groundwork of Grid computing. This paper describes the basic requirements and functionality of a Grid directory service. The architecture of our OGSA-based directory service is discussed, and the implementation issues of this directory service are also addressed.
1
Introduction
Grid computing is becoming the mainstream in high performance computing research; it leverages wide-spread sharing and coordinated use of computing resources. The Open Grid Services Architecture [4] was presented to address the challenges of dynamic, heterogeneous, and geographically distributed Grid environments. It is a formulation of Grid services as a specialized subset of Web services using standard Web service definitions and protocols [1]. By providing a set of operations from which all Grid services are implemented, OGSA allows consistent resource access across multiple heterogeneous platforms with local or remote location transparency. It also allows the composition of services to form more sophisticated services regardless of how the services are implemented, and supports integration with various underlying native computing platform facilities. The directory service plays an important role in computational Grid middleware, providing the fundamental mechanism for discovery and monitoring, and thus for planning and adapting behavior [3]. Core Grid services, such as the super-scheduler, execution service, and performance diagnosis, depend heavily on directory information sources. In this paper, we present our contribution on Grid directory services based on OGSA and Globus, the de facto Grid computing standard. The requirements and functionality of a Grid directory service are described briefly in Section 2. The architecture of our OGSA-based Grid directory service is described in Section 3. In Section 4, the implementation issues are addressed. Finally, Section 5 concludes the paper and further work is discussed.
2
Requirements and Functionality
As Grid information is dynamic, heterogeneous and geographically distributed, a Grid directory service should have the following basic properties [6]:
• access to static and dynamic information,
• a generic description/discovery/monitoring mechanism for heterogeneous system resources and services,
• integration of wide-area information,
• support for the other core Grid services, such as the resource allocation service, execution service, and super-scheduler.
The basic functionality provided by Grid directory services includes:
• Information Monitoring: monitor static and dynamic information in a Grid platform, such as the real-time status of dynamic resources.
• Information Discovery: discover system resources and all kinds of Grid services.
• Service API: provide a service API for high-level services or applications.
A Grid user or other Grid services can access these functions through the service-oriented model.
3
Architecture
In our directory service, there are two critical components: the Basic Information Service Component (BISCO) and the Aggregate Information Service Component (AISCO). BISCO is a Web service which communicates with the Globus MDS to obtain various kinds of system information in a Grid environment. The information is then converted into XML format. The interaction between BISCO and Globus can be implemented in any language supported by Globus, such as C or Java. The relationship is depicted in Figure 1.
Figure 1: BISCO-Globus Interaction Model
AISCO is a higher-level Web service which extracts XML-format system information from BISCO components. Meanwhile, AISCO also responds to diverse queries from Information Service users or other services. As Web services, BISCO and AISCO follow the service-oriented architecture model shown in Figure 2. In this model, BISCO first publishes its interface to a registry. AISCO then looks up, or discovers, BISCO from the registry. Finally, AISCO binds to the Web service in order to use its services.
Figure 2: BISCO-AISCO Interaction Model
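Since the GT2 MDS is exposed through LDAP, a BISCO-style component could pull raw resource entries with JNDI before converting them to XML. The sketch below is ours, not code from the paper: the server URL, search base and filter are hypothetical placeholders, and error handling is minimal:

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

// Sketch: query an MDS (LDAP) server via JNDI and print the entries that a
// BISCO-like component would then map into an XML document.
public final class MdsQuery {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://mds.example.org:2135"); // hypothetical host

        DirContext ctx = new InitialDirContext(env);
        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);

        NamingEnumeration<SearchResult> results =
                ctx.search("Mds-Vo-name=local,o=grid", "(objectclass=*)", controls);
        while (results.hasMore()) {
            SearchResult entry = results.next();
            // A real BISCO would convert these attributes into XML here.
            System.out.println(entry.getNameInNamespace() + " : " + entry.getAttributes());
        }
        ctx.close();
    }
}
```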
A directory service user accesses Grid information through a single entry point called the Endpoint. The Endpoint invokes the security service to obtain authority for the user; then the directory service, which is usually a high-level AISCO, begins to deal with the various user queries.
4
Implementation Issues
Considering that Globus is the de facto standard for Grid computing, we make full use of the facilities the Globus package provides as underlying support. For instance, BISCO obtains system information from the Globus MDS, and the directory service checks security through the Globus API. The definition of the directory services follows two rules. Firstly, separate XML namespaces are adopted for the five basic parts of a WSDL definition, i.e. types, messages, portTypes, bindings, and service descriptions. Splitting up the definitions across namespaces allows us to expand easily any higher-level WSDL definition with the WSDL import construct. This brings reusability and flexibility for our further work. Secondly, the service description follows the OGSA specification [7] presented by the Globus group and IBM, as they have done a lot of fundamental work for OGSA research. This improves interoperability and scalability among the Grid services of our Science and Engineering Research Grid (SER-Grid) platform. A Web service is instantiated within a special execution environment, which is called the hosting environment. The most important decision for Web service development is to determine which hosting environment should be used. We choose a J2EE container as the execution environment of the directory services. J2EE defines not only the programming language and development tools, but also how to communicate with Globus. Java is the implementation language, and the Java CoG Kit, provided by the Globus project, is the interface with Globus.
Figure 3: The implementation of Directory Services based on J2EE
The implementation of the directory service based on J2EE is shown in Figure 3. Each Grid service is a multi-tier system: Client tier, Endpoint tier, J2EE Container, and Globus tier. A JAX-RPC servlet endpoint dispatches all user requests to the J2EE container tier, which is made up of J2EE EJBs that accomplish the business logic of the services. The Java CoG API is the glue that combines the J2EE container with the Globus tier.
5
Conclusion and Future Work
We presented a directory service based on the OGSA, which is viewed as a Web service integrated with Globus, the de facto Grid computing standard. Although its design is vendor-independent, we implement it in pure Java with a J2EE hosting environment. All the business logic is described in the information Enterprise Java Beans, which are hosted in the J2EE container. The information EJBs collect system information through the Globus CoG API. This Directory service serves as a fundamental component in our Science and Engineering Research Grid (SER-Grid) platform. In order to achieve a highly efficient and easy-to-use Grid computing environment, there is a lot of work under way. Further system refinement, such as the definition of XML vertical schemas and WSDL, is ongoing to improve reusability. A user-friendly interface for the directory service is one of our concerns. Also, we are developing an OGSA-based Scheduler Service and Execution Management Service.

References
1. Christensen E., Curbera F., Meredith G., and Weerawarana S., Web Services Description Language (WSDL) 1.1. Technical report, W3C, 2001.
2. Curbera F., Duftler M. and Khalaf R., Unraveling the Web Services Web - An Introduction to SOAP, WSDL and UDDI, IEEE Internet Computing, 86-93, March 2002.
3. Czajkowski K., Fitzgerald S., Foster I., and Kesselman C., Grid information services for distributed resource sharing. In 10th IEEE International Symposium on High Performance Distributed Computing, pages 181-184, San Francisco, CA, August 7-9, 2001. IEEE Press.
4. Foster I., Kesselman C., Nick J., and Tuecke S., The physiology of the Grid: An open Grid services architecture for distributed system integration. Technical report, Globus Project, 2002.
5. Foster I., Kesselman C., and Tuecke S., The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of High Performance Computing Applications, 15(3), 200-222, 2001.
6. Laszewski G. V., Foster I., Gawor J., Schreiber A., Pena C. J., InfoGram: A Grid Service that Supports Both Information Queries and Job Execution. High Performance Distributed Computing (HPDC-11), Edinburgh, Scotland, July 24-26, 2002.
7. Tuecke S., Czajkowski K., Foster I., Frey J., Graham S., and Kesselman C., Grid Services Specification. Technical report, Globus Project, 2002.
GRID RESOURCE MANAGEMENT INFORMATION SERVICES FOR SCIENTIFIC COMPUTING
H. N. LIM CHOI KEUNG, D. P. SPOONER, S. A. JARVIS AND G. R. NUDD
University of Warwick, Coventry CV4 7AL, England, U.K.
Email: [email protected]
LIZHE WANG
Nanyang Technological University, Nanyang Avenue, Singapore 639798
Email: [email protected]
WEI JIE
Institute of High Performance Computing, 1 Science Park Road, Singapore 117528
Email: [email protected]
Scientific applications typically have considerable memory and processor requirements. Nevertheless, even with today's fastest supercomputers, the power of available resources falls short of the demands of large, complex simulations and analysis. The solution fortunately lies in the emergence of innovative resource environments such as Grid computing. This paper addresses issues concerning the discovery and monitoring of resources across multiple administrative domains, which can be harnessed by scientific applications. Grid Information Services are an important infrastructure in computational Grid environments as they provide important information about the state and availability of resources. We will present how the Globus Monitoring and Discovery Service (MDS) architecture can be extended to offer a transparent, unified view of Grid resources. Local resource managers ultimately possess the potential to influence where scientific jobs are processed, depending on the availability and load of their scheduler. Consequently, information about the structure and the state of schedulers should be provided to Grid brokers, which will then decide where to schedule the applications. This paper will present the way in which low-level scheduling information is collated from multiple sources and is incorporated into the unified Grid information view. The existence and availability of Grid resource information also allows reliable application performance prediction to be carried out using the Warwick PACE (Performance Analysis and Characterisation Environment) system. Thus, Grid Information Services are a key component in the efficient execution of scientific applications on Grid computing architectures.
1
Introduction
The information services framework used in the global effort to create a Grid infrastructure is the Monitoring and Discovery Service (MDS) [4] from the Globus Toolkit. Its structure consists of a number of configurable information providers (Grid Resource Information Services) and configurable directory components (Grid Index Information Services) [3]. This existing framework has been installed and is being tested at the University of Warwick within the context of developing performance-oriented Grid middleware. An initial stage of the work consisted of defining how various components act individually and collaboratively to serve the execution requirements of scientific applications [5,2]. The process of scheduling scientific applications to be run on the Grid involves many steps, one of which is to obtain exact state information from local schedulers. The scheduler used in this work is Titan, a local-area workload manager [6]. Each instance of the scheduler manages a resource pool where the characteristics of the set of
schedules and those of the resources are highly dynamic. In the Grid context, it must be possible for this kind of local information to be propagated and made available to other remote administrative domains. Only then can the superscheduling of scientific applications take place, based on the local scheduling information of multiple sources. In this context, an information provider is a service that provides a subset of useful information about resources participating in the Grid. Moreover, the structure of the MDS offers a unified solution to the distributed and failure-prone nature of information providers. There is a need for information services to be as distributed and decentralised as possible, with providers located on or near the entities they describe [3]. Therefore, it is reasonable to have the scheduler act as an information provider or to have a database near the scheduler providing such services.
2
Titan Scheduler as an Information Provider
Having the Titan scheduler as an information provider increases the likelihood of obtaining dynamic and reliable information about available resources. Likewise, the role of the Grid information service is to focus only on the efficient delivery of state information from a single source, that is, the particular information provider. One of the ways in which scheduler information can be made available to the MDS is by speculative evaluation [8], where information from the scheduler is generated at a regular interval. This information is placed in a local backend database which the GRIS can access upon request, as shown in Figure 1.
Figure 1: Reading and Writing Dynamic Scheduling Information
This method has been implemented on the Grid testbed at the University of Warwick and it has both advantages and disadvantages. The benefit is that the scheduler itself is not overloaded, since a central repository is accessed for the values of scheduling
attributes. Moreover, if the scheduler fails, the latest values of the scheduling attributes are still accessible. On the other hand, the data in the backend database is very dynamic and hence the scheduler might have a very high write frequency and a comparatively low read frequency. However, since the database is local to the scheduler on the backend of the GRIS server, this is found not to affect the way its information is pulled from the aggregate directory. The backend database is relational (Postgres) [9], allowing read and write transactions to be handled efficiently. A number of information providers are then created using JDBC (Java Database Connectivity) [7] to access particular fields in the database, corresponding to specific scheduling attributes. The result is a hierarchy of virtual organisations sharing resource information.
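Such an information provider amounts to a short JDBC query against the backend database. The following sketch is our illustration only; the connection URL, credentials, table and column names are hypothetical stand-ins, not Titan's actual schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: a JDBC-based information provider reading scheduling attributes
// from the local Postgres backend. Table and column names are hypothetical.
public final class MakespanProvider {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/titan", "gris", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT makespan, queue_length FROM scheduler_state "
                 + "ORDER BY updated DESC LIMIT 1")) {
            if (rs.next()) {
                // In the real provider this would be emitted as LDIF for the GRIS.
                System.out.println("Mds-makespan: " + rs.getDouble("makespan"));
                System.out.println("Mds-queue-length: " + rs.getInt("queue_length"));
            }
        }
    }
}
```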
3
Other Alternatives for the Information Providers
A slight variant on the above implementation is to use an LDBM (LDAP Database Manager) database [10] instead of the relational one. This allows the GRIS to pull dynamic information from information providers which access this LDBM database in a speculative evaluation method. The information providers for the scheduler are then written using JNDI (Java Naming and Directory Interface). In this case, the method used to access a GIIS would be similar to that retrieving information from the scheduler, but instead of transactions, the scheduler would use commands of this type:

ldapmodify -x -h lab-68.cslab -D "Mds-Software-deployment=Titan scheduler,Mds-Voname=local,o=grid" -f modify-makespan.ldif

The advantage of using an LDBM database backend for the scheduler is that there is no conversion from LDIF to other data formats, thus keeping the data model uniform. Yet another method, which could be used on its own or concurrently with the speculative evaluation method above, is eager evaluation. This method focuses on caching data which is generated when a search request is first received. Therefore, the scheduler information which the GRIS has accessed from its backend database can be stored in a cache for a configurable amount of time. There are a number of advantages to this method: the load on the GRIS host is reduced and the time taken to service a search request is shorter. However, the drawback lies in the relative staleness of the information. At the other end of the spectrum, there is lazy evaluation, where the scheduler generates information only when a search request is received by the GRIS. This method provides the most up-to-date information but the service time increases as well; another cost is the increased load that each service request places on the GRIS host. To obtain dynamic information from the scheduler, the speculative evaluation method is the most appropriate: up-to-date dynamic information is required, making a purely eager evaluation infeasible, while a lazy evaluation would only increase the load on the GRIS host.
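Seen this way, the choice among eager, lazy and speculative evaluation is largely a question of cache time-to-live. A minimal sketch of such a cache (our illustration; the attribute fetcher is a placeholder):

```java
import java.util.function.Supplier;

// Sketch: cache a scheduler attribute for a configurable time-to-live.
// A TTL of 0 degenerates to lazy evaluation (always re-fetch); a large TTL
// approaches eager evaluation (serve possibly stale data quickly).
public final class CachedAttribute {
    private final Supplier<String> fetcher; // e.g. a database or scheduler query
    private final long ttlMillis;
    private String value;
    private long fetchedAt = Long.MIN_VALUE;

    public CachedAttribute(Supplier<String> fetcher, long ttlMillis) {
        this.fetcher = fetcher;
        this.ttlMillis = ttlMillis;
    }

    public synchronized String get() {
        long now = System.currentTimeMillis();
        if (now - fetchedAt > ttlMillis) { // stale: refresh from the source
            value = fetcher.get();
            fetchedAt = now;
        }
        return value;
    }

    public static void main(String[] args) {
        CachedAttribute queueLength = new CachedAttribute(() -> "42", 5000);
        System.out.println("Mds-queue-length: " + queueLength.get());
    }
}
```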
4
Conclusions
The work discussed in this paper addresses how information is pulled by a GRIS from a scheduler for attributes including makespan [6], queue length and whether the genetic
algorithm is switched on [6]. Since each scheduler monitors a number of resources, there could be a number of schedulers providing important resource information to the MDS. The GIIS is responsible for aggregating this information from a number of schedulers and presenting it in a unified way. Being able to access dynamic information across administrative domains eventually enables resource-sharing. The knowledge of resource status and availability within the resulting virtual organisation is crucial in determining which resources are involved in performance prediction using PACE [1]. The PACE performance model and the application deadline requirements are then used to calculate the execution time for the particular application on the matched resources. The collation of resource information from multiple geographically-dispersed sources is also very useful in scheduling scientific applications over a large number of resources, thus decreasing the cost of the processing and the total execution time.
5
Acknowledgements
This work is sponsored in part by grants from the NASA Ames Research Center (administered by USARDSG, contract no. N68171-01-C-9012) and the EPSRC (contract no. GR/R47424/01).
References
1. A.M. Alkindi, D.J. Kerbyson, and G.R. Nudd. Dynamic Instrumentation and Performance Prediction of Application Execution. Proceedings of High Performance Computing and Networking (HPCN 2001), Lecture Notes in Computer Science, Volume 2110, Springer-Verlag, Amsterdam: 313-323, June 2001.
2. J. Cao, S.A. Jarvis, S. Saini, D.J. Kerbyson, and G.R. Nudd. ARMS: An Agent-based Resource Management System for Grid Computing. In Scientific Programming, Special Issue on Grid Computing, 2002.
3. K. Czajkowski, S. Fitzgerald, I. Foster, and C. Kesselman. Grid Information Services for Distributed Resource Sharing. Proc. 10th IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), 2001. IEEE Press.
4. S. Fitzgerald, I. Foster, C. Kesselman, G. Laszewski, W. Smith, and S. Tuecke. A Directory Service for Configuring High-Performance Distributed Computations.
5. H.N. Lim Choi Keung, J. Cao, D.P. Spooner, S.A. Jarvis, and G.R. Nudd. Grid Information Services using Software Agents. Eighteenth Annual UK Performance Engineering Workshop (UKPEW'2002), pp. 187-198.
6. D.P. Spooner, J. Cao, J.D. Turner, H.N. Lim Choi Keung, S.A. Jarvis, and G.R. Nudd. Localised Workload Management using Performance Prediction and QoS Contracts. Eighteenth Annual UK Performance Engineering Workshop (UKPEW'2002), pp. 69-80.
7. http://jdbc.postgresql.org/
8. http://www.globus.org/mds/
9. http://www.postgresql.org/
10. http://www.openldap.org/
AN OPEN PRODUCER AND CONSUMER MANAGEMENT SYSTEM FOR GRID ENVIRONMENT
TIANYI ZANG, ZHOU LEI AND WEI JIE
Institute of High Performance Computing, 1 Science Park Road, Singapore 117528
E-mail: [email protected]
Grid technologies have been widely adopted and are entering the mainstream in scientific and engineering computing. Grid technologies and infrastructures support the sharing and coordinated use of diverse resources that are geographically distributed and operated by distinct organizations and individuals with different policies. In a Grid environment, a substantial amount of runtime information needs to be gathered and delivered for various Grid services such as performance analysis, performance tuning, performance prediction, fault detection, co-scheduling and re-scheduling. Because the characteristics of runtime information are fundamentally different from those of system- or program-produced data, systems that collect and distribute this information should satisfy certain requirements. Exploiting the producer/consumer model, we design an open producer and consumer management framework to fit the heterogeneous, dynamic and multi-domain nature of Grids. Security and fault-tolerance mechanisms are incorporated into the framework. Both producer and consumer are designed as Grid services and implemented using Web services technology. Based on the open framework, a management system for Grid application monitoring is instantiated in our Grid computing project.
1
Introduction
Grid technologies are evolving toward an Open Grid Services Architecture (OGSA) [1,2]. OGSA provides a uniform service-oriented architecture and integrates Grid technologies with emerging Web services standards. In a Grid environment, the management of producers and consumers is critical for enabling Grid computing. The design of a producer and consumer management system is made challenging by the characteristics of Grid performance data [4]. Furthermore, the diversity of resources involved and the dynamic nature of Virtual Organization membership also make it difficult. Several groups are developing Grid monitoring systems [5,6], and they have recognized the need for interoperability between these systems. The Global Grid Forum Performance Working Group has presented a Producer and Consumer model [4], but some crucial issues, such as component creation, control and coordination, are not addressed. In this paper, we present a producer and consumer management system for the Grid on the basis of the Producer/Consumer model. Both the Producer and the Consumer are viewed as Grid Services and implemented as Web services. A Two-stage Producer and Consumer Interaction Protocol (TPCIP) is proposed. Speaking this protocol, the Producer and the Consumer collect and distribute performance data in a Grid environment. The remainder of this paper is organized as follows: Section 2 describes the system architecture and the TPCIP protocol; the interfaces of the Producer and the Consumer are also presented. Section 3 discusses the OGSA-based implementation of the Producer and the Consumer Grid Services. Finally, Section 4 concludes the paper and outlines our future work directions.
2
Architecture
A Grid producer and consumer system is different from a general monitoring system in that it must be scalable across the different network domains of organizations and encompass a large number of heterogeneous resources. The system must be able to ensure performance data integrity and preserve the access control policies imposed by the owners of the data. Additionally, since searches in a Grid space have unpredictable latencies that may impact requests for performance information, performance information source discovery needs to be separated from the actual performance information transfer. Our producer and consumer management system consists of three types of components, as shown in Figure 1. A Directory Grid Service supports information publication and discovery and initiates the communication between the data source and sink. A Producer Grid Service makes performance event data available to other components that are part of the management system. A Consumer Grid Service receives performance event data from a Producer Grid Service.
Figure 1: The Producer and Consumer Framework for a VO Grid
An event is a typed collection of data with a specific structure that is defined by an event schema. Every event has an associated event type that uniquely identifies the structure for a particular event.
2.1
Producer/Consumer Interaction Protocol
A Two-stage Producer and Consumer Interaction Protocol (TPCIP) is presented to support the transfer of performance data between Producers and Consumers in different interaction modes.
• TPCIP-Negotiation: In order to guarantee the QoS of Grid service invocation and determine the interaction mode, some operations should be carried out in advance. Either a Producer or a Consumer may act as an initiator to activate the operations, as shown in Figure 2. The initiator looks up the Directory Service and requests the creation of instances of the Producer and Consumer service peers. These instances carry out further negotiation on behalf of the initiator. According to the specific access control policies and the level of performance detail, the Producers and the Consumers should carry out mutual authentication and authorization and check the delegation of credentials. Depending on the invocation semantics, the parameters are configured to set up the interaction mode, such as subscription, query or notification.
• TPCIP-Execution: The Producer directly sends one or more performance events to the Consumer in the interaction mode determined in the first stage of TPCIP.
The control messages and the data messages are used in the TPCIP-Negotiation stage and the TPCIP-Execution stage, respectively. The two stages of TPCIP may map onto different wire protocols.
Figure 2: TPCIP behaviors. In the negotiation stage the participants create the event type and event schema, configure the interaction model (destination, encoding or encryption of events, interval, buffer size, timeout, wire protocols, initial lifetimes, termination conditions) and check access control (authentication, delegation of credential, authorization), then ignite the participants; in the execution stage the event data is transferred.
2.2
Producer/Consumer Interfaces
Using the Producer and the Consumer interfaces to speak the TPCIP protocol, the Producer and the Consumer reach agreement on collecting and distributing the performance data. The implementation of the TPCIP protocol allows multiple wire protocol bindings. These interfaces perform the behaviors of TPCIP shown in Figure 2.
3
Implementation Issues
Using the Java CoG Kit [3], which is maintained as part of the Globus Project, we have implemented a prototype of the producer and consumer management system on top of Globus in Java. The MDS of Globus serves as the Directory Grid Service to publish the existence of the Producer and the Consumer Grid Services. Both the Producer and the Consumer are viewed as OGSA-based Grid Services and are implemented as Web services. Besides the interfaces inherited from the standard OGSA Grid Services [7], the Consumer and Producer services define the specific interfaces described in Section 2.2. These interfaces speak the TPCIP protocol to collect and distribute performance events between the Producer and the Consumer.
Extension of Entry Objects
An OGSA-based Grid Service should adhere to the standard Grid Service interfaces and behaviors specified in [7]. In addition, in order to speak the TPCIP protocol, the entry objects of the Globus MDS are expanded by adding the following three kinds of entry objects.
• Event entry object: consists of two elements, event type and event schema. An event schema is a structure that collects different typed fields.
• Producer entry object: consists of the elements related to access control, interaction model, wire protocol and so on.
• Consumer entry object: consists of elements including the consumer URL and elements that correspond to like-named elements in the Event Producer entry object.
3.2
Specific Interfaces Implementation
The fundamental pattern used to speak the TPCIP protocol is that of a server and a client communicating through a client-side proxy, which makes invocation of the remote method seem, to the client, like a local method call. Both Producers and Consumers may act as servers. The events are sent from the Producer to the Consumer using a separate object, called an EventPipe. Producers call PushEvent to send events into the pipe, and Consumers call PullEvent to receive the events. Control messages are transferred using the SOAP-HTTP binding; all control messages are SOAP messages. The wire protocol for data messages is selected dynamically depending on the specific application, for example SOAP+HTTP data messages. The event data is encoded in XML.
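The push/pull pairing can be pictured with a queue-backed pipe. The sketch below is our own simplification: a real EventPipe would marshal XML-encoded events over the negotiated wire protocol rather than pass objects through shared memory:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: an in-memory stand-in for the EventPipe. The Producer pushes
// events into the pipe and the Consumer pulls them out.
final class EventPipe<E> {
    private final BlockingQueue<E> queue;

    EventPipe(int bufferSize) { this.queue = new LinkedBlockingQueue<>(bufferSize); }

    void pushEvent(E event) throws InterruptedException { queue.put(event); }

    E pullEvent() throws InterruptedException { return queue.take(); }
}

public final class PipeDemo {
    public static void main(String[] args) throws InterruptedException {
        EventPipe<String> pipe = new EventPipe<>(16);
        pipe.pushEvent("<event type='cpuLoad'><value>0.42</value></event>");
        System.out.println("consumer received: " + pipe.pullEvent());
    }
}
```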
4
Conclusions and Future Work
We present an open producer and consumer management system that incorporates the TPCIP protocol. It provides interoperability between event Producers and Consumers across different domains and supports the collection and distribution of performance event data in the VO Grid environment. We are working to create a formal WSDL description of the interfaces between the Producer/Consumer Grid Services and the monitoring sensors/sinks, and to integrate more sophisticated monitoring sensors/sinks and performance management tools into the producer and consumer management system.

References
1. Foster I., Kesselman C., Nick J. and Tuecke S., Grid Services for Distributed System Integration, Computer 35 (2002), pp. 37-46.
2. Foster I., Kesselman C., Nick J. and Tuecke S., The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration (2002), http://www.globus.org/research/papers/ogsa.pdf.
3. Laszewski G. von, Foster I., Gawor J., Smith W. and Tuecke S., CoG Kits: A Bridge between Commodity Distributed Computing and High-Performance Grids, in the Proceedings of the ACM 2000 Java Grande Conference (2000), pp. 97-106.
4. Tierney B., Aydt R., Gunter D., Smith W., Swany M., Taylor V. and Wolski R., A Grid Monitoring Architecture, http://www-didc.lbl.gov/GGF-PERF/GMA-WG/.
5. Tierney B., Crowley B., Gunter D., Holding M., Lee J. and Thompson M., A Monitoring Sensor Management System for Grid Environments, in the Proceedings of the IEEE High Performance Distributed Computing Conference (August 2000).
6. Wolski R., Spring N. and Hayes J., The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing, Future Generation Computing Systems (1999), http://nws.npaci.edu/.
7. Tuecke S., Czajkowski K., Foster I., Frey J. et al., Grid Service Specification (2002), http://www.globus.org/ogsa.
REPLICA SELECTION FRAMEWORK FOR BIO-GRID COMPUTING
LIZHE WANG, WENTONG CAI, BERTIL SCHMIDT AND BU-SUNG LEE
Nanyang Technological University, Nanyang Avenue, Singapore 639798
E-mail: {pg02463973, aswtcai, asbschmidt, ebslee}@ntu.edu.sg
WEI JIE
Institute of High Performance Computing, 1 Science Park Road, Singapore 117528
E-mail: [email protected]
The computing grid is a promising platform that provides plenty of resources for large scientific computing applications. To achieve good application performance in the grid environment, careful resource management is needed. Protein alignment analysis, a typical bio-grid computing application, is a computing-intensive, data-parallel application; it needs data storage resources and large computing resources. This paper discusses a replica selection framework for bio-grid computing and demonstrates that the application of protein alignment analysis can benefit from the framework.
1
Introduction
The computing grid has become the most promising computing platform for large scientific applications. Computing grid infrastructure middleware, such as Globus [2], has been developed to support application users with grid services, e.g., resource management and grid information services. However, on top of these services, some higher-level support is still needed to achieve better performance. Bio-grid computing can benefit greatly from the computing grid [1]. The application of protein alignment analysis [3,4], a typical bio-grid computing application, is a common and often repeated task in the field of molecular biology. This application finds similarities between a particular query sequence and all sequences in Protein Banks. Processing different protein alignments requires different protein datasets from different Protein Banks. The Protein Banks may be large-scale and geographically distributed. Thus, the computing nodes that carry out the protein alignment analysis should select the proper Protein Banks, namely those that give the minimum communication delay for replication; this is called replica selection. In this paper, a replica selection framework for protein alignment analysis is discussed. The replica selection framework is developed to dynamically select the nearest data storage resources for protein alignment processing. Since Globus has become the de facto standard of Grid computing, our work is developed on the platform of Globus Toolkit 2.0 and several popular resource management systems. This paper is organized as follows: in Section 2, we introduce the background of the protein alignment analysis application. Section 3 describes the replica selection framework. In Section 4, we present some preliminary performance results of the framework. We conclude our work in Section 5.
2
Protein Alignment Analysis
Protein alignment analysis allows biologists to point out sequences sharing common subsequences. From a biological point of view, it leads to the identification of similar functionalities. The need for speeding up this application comes from the exponential
growth of the bio-sequence Protein Banks: every year their size scales by a factor of 1.5 to 2 [3]. The application is based on the SPMD (Single Program Multiple Data) concept and is thus implemented using the master-slave model. The master is responsible for dividing and scheduling the tasks, and the slaves execute the tasks (see Figure 1).
Fig. 1: Master-slave paradigm
According to the tasks received, a slave selects the nearest proper Protein Banks and makes a copy of the required datasets from those Protein Banks (see Figure 2). These dataset copies are used for the protein alignment comparison. When the task is finished, the results are sent back to the master.
3
Replica Selection Framework
After a task is submitted to the slave node, the slave node performs replica selection for running the task. Different types of task may require different data sets for protein alignment analysis. These data sets are stored in different Protein Banks located at geographically distributed sites. An LDAP [5] server is configured to provide the replica directory service. The replica catalog supports registering the Protein Banks as logical collections of data sets and provides mappings between data sets and Protein Banks. The replica selection in the slave node follows these steps (see also Figure 2; a sketch of the selection logic is given after this list):
(1) Get the information of the tasks submitted to the slave node and determine the data sets needed for task execution.
(2) Query the LDAP server to locate the proper Protein Banks that contain the needed data sets.
(3) For each proper Protein Bank that contains the required dataset, test the peer-to-peer communication performance between the slave and the Protein Bank.
(4) The slave node selects the Protein Bank that gives the minimum data transfer time as the replica source.
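The selection logic sketched below is our illustration of steps (1)-(4); the catalog and probe interfaces are hypothetical stand-ins for the LDAP replica catalog and the communication test, not the framework's actual API:

```java
import java.util.Comparator;
import java.util.List;

// Sketch of the replica selection steps: find the Protein Banks holding the
// required dataset, probe the transfer time to each, and pick the fastest.
public final class ReplicaSelector {

    interface ReplicaCatalog {   // stand-in for the LDAP replica catalog
        List<String> banksHolding(String dataset);
    }

    interface NetworkProbe {     // stand-in for the peer-to-peer performance test
        double estimatedTransferSeconds(String bank, String dataset);
    }

    static String selectBank(String dataset, ReplicaCatalog catalog, NetworkProbe probe) {
        return catalog.banksHolding(dataset).stream()
                .min(Comparator.comparingDouble(
                        (String bank) -> probe.estimatedTransferSeconds(bank, dataset)))
                .orElseThrow(() -> new IllegalStateException("no bank holds " + dataset));
    }
}
```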
Figure 2: Replica Selection Framework
4
Performance Evaluation
In order to evaluate the performance of the replica selection, a test bed is configured. It includes three clusters at NTU (Nanyang Technological University, Singapore), one cluster at IHPC (Institute of High Performance Computing, Singapore) and one single node at Osaka University (Japan). There is a dedicated network link between NTU and Osaka University. An LDAP server is configured at NTU for the replica service. Two Protein Banks are set up at NTU and IHPC respectively. The Globus Toolkit is installed on the head nodes of the clusters and on the single node in Japan. Globus GRAM communicates with the local job manager (e.g., Condor) on scheduling tasks within a cluster. Table 1 shows the replica placement for the Protein Banks. Taking the single node at Osaka University as an example slave node, the replica selection result is shown in Table 2. The network performance in the test bed is shown in Figure 3.
Site NTU IHPC
Protein Bank 1 2
Replica placement datasetl, dataset2, dataset5 datasetl, dataset3, dataset4
Table 1 Replica placements
[Figure: communication performance between NTU and Osaka Univ. and between IHPC and Osaka Univ., plotted over task index]
Figure 3 Communication Performance

Task index   Replica requirement   Replica decision
1            dataset1              Protein Bank 1
2            dataset2              Protein Bank 1
3            dataset4              Protein Bank 2
Table 2 Replica Requirement and Selection

5 Conclusion
The objective of the replica selection framework is to help bio-information applications achieve better performance in the grid environment: the communication delay is minimized and thus the overall performance is improved.

References
1. BioGRID project in EUROGRID project, http://biogrid.icm.edu.pl/, 2002.
2. Globus Project, http://www.globus.org.
3. Bertil Schmidt, Heiko Schroder and Thambipillai Srikanthan, A SIMD Solution to Biosequence Database Scanning, in Proceedings of PaCT'2001, Lecture Notes in Computer Science 2127, Springer 2001, pp. 498-509.
4. Lizhe Wang, Wentong Cai, Bertil Schmidt, et al., Bio-grid Computing Platform: Parallel Computing for Protein Alignment Analysis, to appear in the 6th International Conference on High Performance Computing in Asia Pacific Region (HPC Asia), 2002.
5. T. A. Howes and M. C. Smith, LDAP: Programming Directory-Enabled Applications with Lightweight Directory Access Protocol, Technology Series, MacMillan, 1997.
A GRID TESTBED SUPPORTING MPI APPLICATIONS WEI JIE, ZHOU LEI AND TIANYI ZANG Institute of High Performance Computing, 1 Science Park Road, Singapore 117528 E-mail: [email protected] LIZHE WANG Nanyang Technological University, Nanyang Avenue, Singapore 639798 E-mail: [email protected] Computational Grid is becoming a new platform for large scale science and engineering applications. We have constructed a Grid testbed using Globus toolkit and MPICH-G2 as the underlying middleware. This paper discusses the architecture of the Grid testbed and how this testbed supports the execution of MPI applications in a Grid environment. To evaluate the performance of MPI applications running on this Grid testbed, experiments were conducted and the results were analyzed.
1 Introduction
Computational Grid [1,2] involves heterogeneous collections of computers that may reside in different administrative domains, run different software, be subject to different access control policies, and be connected by networks with widely varying performance characteristics. To run MPI applications in a Grid environment, MPICH-G2 [4], a Grid-enabled implementation of the Message Passing Interface (MPI) [3], has been developed; it allows users to run such applications across multiple computers at different sites using the same commands that would be used on a parallel computer. MPICH-G2 works with the Globus toolkit [5] and uses the services provided, such as information, security, resource management and communication services. We have constructed a Grid testbed for MPI applications using the Globus toolkit and MPICH-G2 as the underlying middleware. An example MPI application was executed on this testbed and its performance was evaluated. In this paper, firstly the architecture of this Grid testbed is described in Section 2. Section 3 discusses how to run an MPI application on this Grid testbed. The performance study of an example MPI application is presented in Section 4. Finally, Section 5 concludes this paper and outlines our future work directions.

2 Grid Testbed Architecture
The architecture of the Grid testbed is depicted in Figure 1. A user submits an MPI application through the Grid Portal. MPICH-G2 receives the request and invokes the Globus Dynamically Updated Request Online Co-allocator (DUROC) service [5]. DUROC can allocate multiple resources simultaneously for the sub-jobs of an MPI application. Before resource co-allocation, DUROC needs to talk to the Globus Gatekeepers for user authentication. The gatekeepers then contact their respective Job Managers (which in turn ask the Local Resource Managers) to create and distribute the sub-jobs of the application to appropriate computing resources. DUROC also verifies correct startup of these sub-jobs and coordinates their parallel execution, which may span multiple domains and resources.
The execution of MPI applications in a Grid environment can be made more intelligent through a Resource Broker. With the help of Globus Grid information services [5], i.e. Grid Index Information Service (GIIS) and Grid Resource Information Service (GRIS), the Resource Broker tries to find appropriate resources to meet applications' resource specifications.
[Figure: a resource specification flows through the Grid Portal and Resource Broker to the Gatekeeper and Job Manager of each site, with monitoring and control at each site]
Figure 1. Grid Testbed Architecture
We have constructed a Grid testbed based on the architecture described above (so far the Grid Portal and the Resource Broker are not incorporated into this Grid testbed). Currently this testbed includes 2 clusters from IHPC (Institute of High Performance Computing), 1 cluster from NUS (National University of Singapore) and 1 cluster from NTU (Nanyang Technological University). Globus Toolkit and MPICH-G2 are deployed on the head node of each cluster. In addition, some popular local resource management systems (e.g. Condor, PBS and Sun Grid Engine) are installed in these clusters. Table 1 summarizes the configuration of this Grid testbed.
Table 1. Grid Testbed Configuration

Organization   Resource                                                    Local Resource Manager
IHPC           Linux Cluster (13 nodes): IBM xSeries 330, Intel PIII 933   Condor
IHPC           Linux Cluster (17 nodes): IBM xSeries 330, Intel PIII 933   Sun Grid Engine
NUS            Linux Cluster (35 nodes): IBM xSeries 330, Intel PIII 933   Condor
NTU            Linux Cluster (6 nodes): Intel PII 450                      PBS

3 Running MPI Applications on Grid Testbed
The execution of an MPI application on the Grid testbed using MPICH-G2 follows these steps: • Compile the application on each machine using the appropriate MPICH-G2 compiler and link to the MPICH-G2 library to generate the executables on each machine. • Construct a Globus RSL (Resource Specification Language) [4] script which specifies the sub-jobs and the resources required to run them. A sample RSL file is shown in Figure 2. • Submit the RSL file directly to the mpirun command provided by MPICH-G2 and start the execution of the MPI application.

+
( &(resourceManagerContact="tornado.ihpc.nus.edu.sg")
    (count=10)
    (label="subjob 0")
    (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
                 (GLOBUS_TCP_PORT_RANGE "1024 1050"))
    (arguments="/home/mems/demitube.inp")
    (directory=/home/mems)
    (executable=/home/mems/pbcgmg)
)
( &(resourceManagerContact="pprg21.sas.ntu.edu.sg")
    (count=5)
    (label="subjob 1")
    (environment=(GLOBUS_DUROC_SUBJOB_INDEX 1)
                 (GLOBUS_TCP_PORT_RANGE "1024 1050"))
    (arguments="/home/mems/demitube.inp")
    (directory=/home/mems)
    (executable=/home/mems/pbcgmg)
)
Figure 2. An RSL File

As each cluster is behind the firewall of its organization, a firewall issue arises when running MPI applications on the Grid testbed. Our basic solution is to open a small range of port numbers in the firewall and to specify that port range with the environment variable GLOBUS_TCP_PORT_RANGE in the RSL script (see Figure 2).

4 Performance Study
We tested an MPI application, an incompressible Navier-Stokes solver code, on the Grid testbed. This application is used to solve incompressible flow on a domain with moving boundaries. The scale of the problem can be adjusted by tuning a parameter of the application. For a certain scale of the problem, the application is executed on the Grid
testbed and the IHPC 13-node cluster, respectively. The ratio between the execution times is presented in Figure 3.
[Figure: ratio of execution times plotted against problem scale (1n to 4n)]
Figure 3. Ratio of Execution Time under Different Scales of Problem

Figure 3 reveals that the ratio of execution times decreases when the scale of the problem becomes larger. This indicates that the Grid testbed is more attractive for solving large-scale computation-intensive problems. From the trend of the curve in Figure 3, we may predict that the performance of the Grid testbed can become better than that of the local cluster used in the experiment when the scale of the problem increases beyond a certain point. This is in fact a major merit of a Grid system when it is used to solve large-scale problems for which a single computing system is usually not competitive.

5 Conclusion and Future Work
Using the Globus toolkit and MPICH-G2, we constructed a testbed supporting MPI applications in a Grid environment. Our experimental results show that a Grid system is competitive for solving large-scale computation-intensive problems. In future work, we will develop a Grid Portal which provides a Web-based problem-solving environment that integrates Grid services such as real-time resource monitoring, job submission, file transferring and so on. The Resource Broker will be incorporated into the Grid testbed to support smart resource discovery and allocation. In addition, more MPI applications with a variety of problem scales will be tested and their performance will be evaluated.

References
1. Foster I. and Kesselman C., The Grid: Blueprint for a New Computing Infrastructure (1999), Morgan Kaufmann.
2. Foster I., Kesselman C. and Tuecke S., The anatomy of the Grid: enabling scalable virtual organizations, International Journal of High Performance Computing Applications 15 (2001) pp. 200-222.
3. Gropp W., Lusk E., Doss N. and Skjellum A., A High-performance, Portable Implementation of the MPI Message Passing Interface Standard, Parallel Computing 22 (1996) pp. 789-828.
4. Foster I. and Karonis N., A Grid-Enabled MPI: Message Passing in Heterogeneous Distributed Computing Systems, in Proceedings of Supercomputing 98 (SC98).
5. Foster I. and Kesselman C., Globus: a metacomputing infrastructure toolkit, International Journal of Supercomputer Applications 11 (1997) pp. 115-128.
RUNNING MPI APPLICATION IN THE HIERARCHICAL GRID ENVIRONMENT
LIZHE WANG
Nanyang Technological University, Nanyang Avenue, Singapore 639798
E-mail: [email protected]
WEI JIE
Institute of High Performance Computing, 1 Science Park Road, Singapore 117528
E-mail: [email protected]
WEI XUE
Tsinghua University, Beijing, P.R. CHINA 100084
E-mail: [email protected]
This paper presents a real-time application in the hierarchical Grid environment. The application is transient stability computation for large-scale power systems. For this application, a new programming model is proposed. In the hierarchical Grid environment, the application runs on two clusters with the MPICH 1.2.3 implementation. On the head node of each cluster, the Globus Toolkit is installed. An MPICH-G2 based program runs on the head nodes for data exchange between the two clusters. With the proposed programming model, the application can make full use of the computing resources.
1 Introduction

1.1 Application Background
Computing the transient stability of a modern power system is highly complex, not only because a power system is a high-dimensional, hierarchical and distributed system, but also because the electromagnetic processes and strongly non-linear elements in such a system need to be modeled. Thus, the computation of power system transient stability is a computing-intensive application. Furthermore, the computation of power system transient stability needs to be finished in a short time to fulfill the on-line control requirement. Therefore, parallel computation of power system transient stability is the natural solution. This application is a transient stability computation for a large-scale power system. It was developed in C with MPI-2 on the Linux platform.

1.2 Grid Computing and MPI
Grid computing [1] has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation. The computing Grid is a hierarchical environment. Computing resources, e.g., NOWs and clusters, located at geographically distributed sites, are shared in the computing Grid. Grid infrastructure, e.g., the Globus Toolkit [2], is installed on the head nodes of these resources. Inside each site, the resource is managed by the local resource management system, e.g., PBS [3] or Condor [4]. Many parallel applications are written with MPI (Message Passing Interface). MPI is a library specification for message passing, proposed as a standard by a broadly based
committee of vendors, implementers, and users [5]. There are different MPI implementations for different architectures and environments, e.g., MPICH-P4 for the LAN environment and MPICH-G2 for the Grid environment. To make full use of resources in the hierarchical Grid environment discussed above, MPI applications need to link against different MPI libraries at different levels. For example, in the Grid environment, the MPICH-G2 libraries should be used.

2 Programming Model
The whole data of the power system to be computed is divided into two parts. Inside each cluster, an MPICH-P4 based program, say program 1, is executed with one part of the data of the whole system. An MPICH-G2 based program, say program 2, runs at the Globus level for data exchange between the clusters. The software module is detailed in Figure 1; a minimal sketch of the inter-cluster exchange is given after the figure.
[Figure: inside each of cluster 1 and cluster 2, processes 1 to n of program 1 communicate via MPICH(-P4) based message passing; the two clusters exchange data through MPICH-G2 based message passing between the head nodes]
Figure 1 Software Module
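The following C sketch illustrates the two-level model, assuming program 2 runs as a two-process MPICH-G2 job with one process per head node; the buffer size, tag and step count are illustrative and not taken from the paper.

```c
#include <mpi.h>

#define NBOUND 1024  /* illustrative size of the boundary-data block */

/* Program 2: one MPICH-G2 process per head node (ranks 0 and 1).
 * Each step it exchanges the local cluster's boundary data with the
 * head node of the other cluster. */
int main(int argc, char **argv)
{
    int rank, peer;
    double local_bound[NBOUND] = {0}, remote_bound[NBOUND];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;  /* the other cluster's head node */

    for (int step = 0; step < 100; step++) {
        /* ... gather local_bound from the cluster's program-1
         * processes (mechanism left open here) ... */

        /* Exchange boundary data between the two clusters. */
        MPI_Sendrecv(local_bound, NBOUND, MPI_DOUBLE, peer, 0,
                     remote_bound, NBOUND, MPI_DOUBLE, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* ... hand remote_bound back to the local program-1 run ... */
    }
    MPI_Finalize();
    return 0;
}
```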
3 Testbed
We use two clusters, say cluster 1 and cluster 2, in the test. On the head node of each cluster, Globus Toolkit 1.1.4 is installed. The cluster configuration information is listed in Table 1 and Table 2. The system configuration is detailed in Figure 2.

Node Number   Cluster 1   Cluster 2
Head Node     1           1
Slave Node    11          17
Table 1 Cluster Size

Processor          IBM xSeries 330, PIII 933 MHz
Cache              256KB L2 Cache
Memory             1GB
Hard Disk          18 GB SCSI HDD
Operating System   Linux
Table 2 Node Configuration
[Figure: cluster 1 (11 slave nodes) and cluster 2 (17 slave nodes), each with a head node, connected internally by 100 Mbps Fast Ethernet]
Figure 2 System Configuration
4 Results and Performance Evaluation
The results of the application are shown in Table 3. The computing time of the application is 0.093s < 0.1s, which fulfills the real-time requirement.
Processor number                    30
Running time (single processor)     1.082000 s
Running time (the whole system)     0.093000 s
Speed up                            11.63
Table 3 Running result

The system performance is shown in Table 4.
                              System idle                                       Running application
CPU (per node)                0.0% user, 1.1% system                            97.6% user, 2.3% system
Memory (per node)             1048076K available, 511444K used, 536632K free    1048076K available, 522572K used, 525504K free
Intra-cluster Communication   Average available bandwidth = 10.95 MB/sec        Average available bandwidth = 5.51 MB/sec
Inter-cluster Communication   Average available bandwidth = 9.57 MB/sec         Average available bandwidth = 6.67 MB/sec
Table 4 System performance
5 Conclusion
In this paper, we present a real-time application for transient stability computation of a large-scale power system, running in the hierarchical Grid environment. High performance computing can be obtained with low-cost clusters linked in the Grid environment. A new programming model is also proposed to run MPI applications in the hierarchical Grid environment.
References
1. I. Foster. The Grid: A New Infrastructure for 21st Century Science. Physics Today, 55(2), pages 42-47, 2002.
2. Globus Project, http://www.globus.org/.
3. OpenPBS, http://www.openpbs.org/.
4. Condor, http://www.condor.cs.wisc.edu/.
5. MPI Forum, http://www.mpi-forum.org/.
CLUSTER-BASED PARALLEL SIMULATION FOR LARGE-SCALE POWER SYSTEM DYNAMICS
YAN Jianfeng1, XUE Wei1, SHU Jiwu2, WANG Xinfeng1, JIE Wei3
1. Department of Electrical Engineering and Applied Electronic Technology, Tsinghua University, Beijing, China 100084
2. Department of Computer Science and Technology, Tsinghua University, Beijing, China 100084
3. Institute of High Performance Computing, 1 Science Park Road, Singapore 117528
Transient stability analysis computation is the most intensive computation part of large-scale power system dynamic simulation, and it is a key problem in large-scale power system real-time simulation. In order to speed up the transient stability analysis, many parallel algorithms and implementations have been developed. This paper analysed the available parallel algorithms for transient stability computing, pointed out their advantages and disadvantages, and developed a multi-layer BBDF (Block Bordered Diagonal Form) algorithm to achieve the division of the network equation computation task on a cluster system. Some optimizations to reduce the whole computation cost were also presented. Numerical results on the south China power grid were presented to show the speedup and usefulness of the multi-layer BBDF algorithm on a cluster system.
1 Introduction
Power system dynamic simulation is a powerful tool of power system analysis. Based on the time scale, it can be simply divided into two parts: transient stability analysis and middle-long term dynamic analysis. Transient stability analysis computation requires a shorter time step and more complex device models. As is well known, it is the most computation-intensive part of power system dynamic simulation. Moreover, its heavy computing task increases sharply with the expansion of the power system's scale. Transient stability analysis can currently only be used in an off-line state for large-scale power systems. With the development of power systems, it is increasingly necessary to find a useful method to perform on-line dynamic security analysis, which depends on real-time power system transient stability simulation. For now, sequential computing cannot fulfil the requirement of real-time simulation for large-scale power system transient stability analysis, for it has difficulty in satisfying both quickness and precision. Many parallel studies can be found that aim at real-time transient stability analysis simulation. In this paper, the available parallel algorithms for transient stability analysis computation were presented along with their advantages and disadvantages. Then a multi-layer BBDF (Block Bordered Diagonal Form) algorithm and some optimizations were proposed to achieve the division of the computation task on a cluster system and reduce the whole computation cost. Numerical results for the south China power grid have shown that the implementation on a cluster system can perform transient stability analysis computing much faster than the real-time dynamic process, which shows that the application can fulfil the requirements of large-scale power system real-time transient stability analysis computing.

2 Power system transient stability computing model
To describe the characteristic behaviour of the power system transient process, a set of DAE (Differential Algebraic Equations) is usually needed.
    dx_d/dt = f(x_d, V_d) = A·x_d + B·u(x_d, V_d)    (1-1)
    I_d = g(x_d, V_d) = Y_N·V_d    (1-2)

Where:
x_d - state vector of generator and exciter variables;
V_d - real and imaginary parts of the voltages at each node;
I_d - real and imaginary parts of the current injections;
A - block diagonal matrix;
B - rectangular block matrix;
f, g - nonlinear vector functions;
Y_N - complex sparse matrix.

According to the network characteristics of a power system, the rank of x_d was far lower than that of V_d. Equation (1-1) could be solved separately by blocks, which facilitated the division of the parallel computing task. The computing time cost depended basically on Equation (1-2), where Y_N was a sparse high-rank matrix. It must be mentioned that the roots of the equations form a series along the time domain t. Traditionally, the DAE were solved separately for each time step, which is called the IAI (Interlaced Alternating Implicit) algorithm. A series of roots along the time domain t could be obtained by the following periodic computation:

For (t = 0; t < total time steps; t++) {
    Obtain x_d(t), Y_N(t) from events and the system state in the previous time step;
    Obtain the initial V_d^0(t) from computing initialization or the results of the previous time step;
    k = 0;
    While (dV_d^(k+1)(t) > epsilon) {
        Obtain I_d^k(t) from g using V_d^k(t), x_d^k(t);
        Obtain the time derivatives dx_d(t)/dt from Equation (1-1);
        Calculate x_d^(k+1)(t) with the implicit integration method, by (1-1);
        Obtain V_d^(k+1)(t) by solving Equation (1-2);
        dV_d^(k+1)(t) = | V_d^(k+1)(t) - V_d^k(t) |;
        k++;
    }
}

3 Parallel transient stability algorithm and optimization schemes
The design of a parallel algorithm depends on the computer architecture. In this section, the cluster system used in our research was introduced firstly. Secondly, the available parallel algorithms for transient stability analysis computing were presented. Finally, a multi-layer BBDF algorithm and its optimizations were proposed.
3.1 Architecture of cluster system
A cluster system, consisting of commercial CPUs and network devices, can provide enough computation power to enable parallel and distributed real-time simulation for large-scale power system transient stability analysis computing. The constitution of the cluster system used in this paper is given in Table 1.

Table 1 Cluster system of Tsinghua Univ.
Number of computing nodes   36
Makeup                      SMP (Symmetric Multi-Processor)
CPU                         4 × Intel Xeon PIII 700MHz
Memory                      1G SDRAM
Hard disk                   36G SCSI Ultra2
Net                         2.56GB Myrinet / 100M Ethernet
Operating System            Linux
Programming Language        C, C++, Fortran
Parallel Environment        PVM, MPI, OPENMP
3.2 Available research on parallel transient stability simulation
Recently the research on parallel algorithms and their implementations for transient stability analysis computing has been well developed [1][2]. Spatial parallel algorithms, including the partition method and the parallel factoring algorithm, took a time-domain integration method and decomposed each time step's computation into sub-tasks among different processors. As a coarse-granularity parallel algorithm, the partition method could be realized more easily and achieve higher efficiency on distributed-architecture systems, while the parallel factoring algorithm behaved better on shared-memory systems. In order to gain better performance on more processors, simultaneous multiple-time-step solutions such as the WR (Waveform Relaxation) method and the parallel-in-time Newton algorithm were introduced into parallel algorithms. These time-domain parallel algorithms enlarged the size of transient stability analysis computing problems by solving several time steps of the DAE simultaneously and improved the speedup effectively. However, it was difficult to achieve both a great parallel computing effect and a high convergence rate. Another limiting factor hindering parallelism was that many invalid computations would be brought into the computing task when random events happened in the 'time window' of the computation. Many previous developments were accomplished on memory-shared parallel computers or Transputers, but few on cluster systems. Recently, with the wide use of scalable cluster systems, research on cluster-based parallel algorithms for transient stability simulation has become a new hotspot in this field [3].

3.3 Multi-layer BBDF algorithm and its optimizations
According to power system characteristics, the corresponding differential equations of dynamic devices such as generators were only related to one of the network nodes. The
differential part of Equation (1-1) could be sorted by network sub-area and divided among the corresponding processors for simultaneous computing. Therefore, the parallel computation of the linear network equations was the key of the transient stability analysis computation. Based on the cluster system, the linear network equations were reformed into the Block Bordered Diagonal Form [4], as follows:
    [ I_1(X_d, V_d) ]   [ Y_1                      Y_1s ] [ V_1 ]
    [ I_2(X_d, V_d) ]   [       Y_2                Y_2s ] [ V_2 ]
    [      ...      ] = [             ...          ...  ] [ ... ]    (2)
    [ I_n(X_d, V_d) ]   [                   Y_n    Y_ns ] [ V_n ]
    [ I_s(X_d, V_d) ]   [ Y_s1  Y_s2  ...   Y_sn   Y_s  ] [ V_s ]

Where: I_j(X_d, V_d), I_s(X_d, V_d) - real and imaginary parts of the current injections in sub-area j and border area s respectively; Y_j, Y_s - complex sparse matrices of sub-area j and border area s respectively; V_j, V_s - real and imaginary parts of the node voltages in sub-area j and border area s respectively; Y_js, Y_sj - complex matrices of the connection between sub-area j and border area s; and j = 1, 2, ..., n. The BBDF solution of the equations could be achieved by the following flow chart:

For Iter = 1, MaxIters
    Solve the corrected vectors and matrices in sub-areas 1, ..., n in parallel:
        dY_j = Y_sj · Y_j^(-1) · Y_js,   dI_j = Y_sj · Y_j^(-1) · I_j
    Communication: collect the corrections between processors
    Solve border area s:
        (Y_s - sum_{j=1..n} dY_j) · V_s = I_s - sum_{j=1..n} dI_j    (3)
    Communication: scatter the solution of the border area equations to the subsystems
    Solve the node voltages in sub-areas 1, ..., n in parallel:
        Y_j · V_j = I_j - Y_js · V_s

In some large-scale power systems, the Chinese power system for example, the number of sub-areas had to be large enough to make the computing task of each CPU 'light' enough for real-time simulation. Since the number of border area nodes increased markedly with the rise of the sub-area number, the computing task of equation (3) would be much heavier than that of any of the sub-area equations.
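Read as message passing, the 'collection' and 'scattering' steps of this flow chart can be realized with a single reduction when each process owns one sub-area and solves the small border system redundantly. The C sketch below illustrates this reading; it is one possible realization under those assumptions, not the authors' implementation.

```c
#include <mpi.h>

/* Communication skeleton of one BBDF iteration, assuming one MPI
 * process per sub-area j and a border area of dimension ns.
 * dY_local holds Y_sj * Y_j^(-1) * Y_js (ns*ns values) and dI_local
 * holds Y_sj * Y_j^(-1) * I_j (ns values), both computed locally. */
void bbdf_border_step(int ns,
                      const double *dY_local, const double *dI_local,
                      double *dY_sum, double *dI_sum)
{
    /* "Collection between processors": sum the corrections of all
     * sub-areas. With Allreduce every process receives the totals,
     * so each can solve the small border system
     * (Y_s - dY_sum) V_s = I_s - dI_sum redundantly, which also
     * takes the place of the explicit scattering step. */
    MPI_Allreduce(dY_local, dY_sum, ns * ns, MPI_DOUBLE,
                  MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(dI_local, dI_sum, ns, MPI_DOUBLE,
                  MPI_SUM, MPI_COMM_WORLD);
}
```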
Additionally, asymmetry caused by some events would introduce negative and zero sequence matrices into Y_N and enlarge the border area node number too. To solve the problem, a multi-layer BBDF method was proposed in this paper. To 'lighten' the computing task of the border area in the case of a large sub-area number, the equations were divided into several parts as in formula (4) and were computed with the parallel BBDF scheme, the same as above.

    [ Y_Tp                    N_Tp ]
    [        Y_Tn             N_Tn ]
    [                Y_Tz     N_Tz ]    (4)
    [ N_pT   N_nT    N_zT     Y_F  ]

In formula (4), Y_Tp, Y_Tn and Y_Tz were the positive, negative and zero sequence node matrices for the connection between different sub-areas; Y_F was the matrix of event nodes; N_pT, N_nT, N_zT, N_Tp, N_Tn and N_Tz were the connection matrices.
Because some event forms had an invalid impedance matrix, the admittance matrix was adopted. In this paper, two optimizations were developed to reduce the computing time consumed and improve the scalability of the program. • In fact, there were two iteration loops in the IAI algorithm. The inner one, for solving the nonlinear network equations, was broken in this algorithm to avoid unnecessary substitutions. It was proved that the solutions of this algorithm attained the same precision as the traditional IAI algorithm. • During the simulation, the state variables of dynamic devices, injected currents and node voltages of the positive sequence network had to be updated every iteration step, while those of the negative or zero sequence networks in the sub-areas and border area did not. So the state variables of the negative or zero sequence in the sub-areas and border area were solved only once in every iteration step. Furthermore, different event computing schemes were used for different event forms, such as cancelling the zero sequence network computation when only inner-phase faults happened.

4 Results
Some cases of the south China power grid, the largest power grid in China, as given in Table 2, were tested on the cluster system. For all the tested cases, an A-phase fault on a single branch was assumed, the time step was 0.02s, and the convergence tolerance was 10^-4 pu.

Power system                          South China Power Grid
Number of Nodes                       2098
Number of Branches                    2588
Number of Province Systems            6
Number of Regional Systems            53
Sparseness of Y_N                     0.21%
Average Number of Branches per Node   2.36
Table 2 Power system data used in tests

Figure 1 showed the speedups and simulation velocity of the parallel computing with the data of the south China power grid. Both communication devices of the cluster system, 100M Ethernet and 2.56GB Myrinet, were tested respectively.
Figure 1 Speedups and simulation velocity of the South China Power Grid (left: 100M Ethernet, right: Myrinet). Where: Sp - speedup compared with the sequential computing program; Rp - ratio of real time to computing time.
Several comments can be made from the figures above: • The transient stability analysis computation ran faster than the real-time process for the south China power grid with both the 100M Ethernet and the Myrinet communication devices. In Figure 1, the parallel computing speedup ascended with the increase of the number of sub-areas and reached its maximum between 8 and 10. The parallel computation on 8 sub-areas with Myrinet was almost 9 times faster than the sequential simulation and about twice as fast as the real-time process. This proved that the multi-layer BBDF algorithm for transient stability analysis is very efficient. • It was also found that the efficiency of this parallel algorithm exceeded 100% in some test cases. Although the communication and computation cost of solving the border area equations increased with the rising number of sub-areas, the whole computation cost of solving the network equations with the BBDF scheme decreased. The abnormally high speedups might be achieved when the overall costs of the algorithm in this paper decreased.

5 Conclusion
In this paper, a multi-layer Block Bordered Diagonal Form algorithm was proposed and implemented on a cluster system. Some optimizations, such as iteration reduction, were also presented to improve the performance of the simulation computing. Numerical results for the largest power grid in China suggested that the algorithms and optimizations were efficient and scalable. For the real-time transient stability simulation of larger power systems, even the future nationwide power grid in China, this algorithm, with adequate efficiency and scalability, is a feasible selection.
References
[1] Daniel J. Tylavsky, Anjan Bose. Parallel processing in power systems computation [J]. IEEE Trans. on PWRS, 1992, 7(2): 629-638.
[2] Chai J.S., Bose A.J. Bottlenecks in Parallel Algorithms for Power System Stability Analysis [J]. IEEE Trans. on PWRS, 1993, 8(1): 9-15.
[3] Wei Xue, Jiwu Shu, Xinfeng Wang, Weimin Zheng. Advance of parallel algorithm for power system transient stability simulation [J]. Journal of System Simulation, 2002, 14(2): 177-182.
[4] A. Torralba. Three methods for the parallel solution of a large, sparse system of linear equations by multiprocessors [J]. International Journal of Energy Systems, 1992, 12(1).
[5] I.C. Decker, D.M. Falcao. Conjugate Gradient Methods for Power System Dynamic Simulation on Parallel Computers [J]. IEEE Trans. on PWRS, 1996, 11(3): 1218-1227.
[6] Nagata, M., Uchida, N. Parallel processing of network calculations in order to speed up transient stability analysis [J]. Electrical Engineering in Japan, 2001, 135(3): 26-36.
[7] Wei Xue, Jiwu Shu, Xinfeng Wang, Weimin Zheng. Parallel algorithm of power system network equations on cluster system [J]. Journal of Nanjing University, 2001, 37: 204-210.
HARDWARE IMPACT ON COMMUNICATION PERFORMANCE OF BEOWULF LINUX CLUSTER
TANG YUAN1, ZHANG YUN-QUAN1,2, LEE YU-CHENG1
1(Research and Development Center of Parallel Software, Institute of Software, Chinese Academy of Sciences, Beijing 100080, China)
2(Key Lab of Computer Science, Chinese Academy of Sciences, Beijing 100080, China)
E-mail: [email protected], [email protected], [email protected]
There exist many models of parallel computation, amongst which LogP and LogGP are well known and suitable for describing the framework of the communication process of a Beowulf LINUX Cluster. [4] analyzed the impact of each LogP parameter on real-world applications in detail, but most researchers seem to ignore the impact of different hardware conditions on the LogP parameters. This paper, based on Beowulf LINUX Clusters, compares various hardware differences and their possible impacts, and proposes some useful suggestions at the end. We hope it will be helpful in pointing out ways to configure a Beowulf LINUX Cluster. The analysis of software impacts is our future work.
1 Introduction:
There are many models of parallel computation, such as LogP [2], LogGP [3], LogPQ, LoGPC, P-3PC and so on. When performing the tests in this paper, we adopted the parameterized LogP model proposed in [1], which characterizes a network as N = (L, Os(m), Or(m), g(m), P). The program to measure the parameters was downloaded from the web site http://www.cs.vu.nl/albatross/. This program runs 6 different combinations of MPI's send and receive modes ({MPI_Send, MPI_Isend, MPI_Ssend} × {MPI_Recv, MPI_Irecv}), amongst which only MPI_Send × MPI_Recv directly reflects the hardware impact on a real-world MPI application; the other combinations reflect software impacts more. So the following comparison and analysis in this paper will focus only on MPI_Send × MPI_Recv. The remaining contents of this paper are organized as follows: Section 2 describes our testing platforms. Section 3 discusses the impact of the computational part per node. Subsections 4.1 and 4.2 detail the impact of Ethernet versus Myrinet and 1 NIC versus 2 NIC, respectively. Section 5 gives our conclusions and future work.
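For intuition only, a crude MPI ping-pong in C that times round trips per message size is sketched below; the parameterized LogP tool from the albatross site uses a more careful protocol to separate Os(m), Or(m), g(m) and L, so this sketch merely illustrates the kind of measurement involved. It assumes exactly two MPI processes.

```c
#include <stdio.h>
#include <mpi.h>

/* Crude round-trip timing for one message size m between ranks 0
 * and 1; LogP-style parameters can be fitted from such timings. */
static double roundtrip(int m, int rank, int iters)
{
    static char buf[65536];  /* zero-initialized message buffer */
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, m, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, m, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(buf, m, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, m, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    return (MPI_Wtime() - t0) / iters;
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int m = 1; m <= 65536; m *= 4) {
        double t = roundtrip(m, rank, 100);
        if (rank == 0)
            printf("m=%d bytes, round trip = %g s\n", m, t);
    }
    MPI_Finalize();
    return 0;
}
```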
2 The testing platforms:
The testing program runs on the following different Beowulf LINUX Clusters:

Platform   CPU                 Host bus clock   L1 ICache              L1 DCache              L2 Cache                 Mem           NIC
Sv-ether   AMD 1.6GHz          266MHz           64KB (64 bytes/line)   64KB (64 bytes/line)   256KB (64 bytes/line)    1GB           3c905c
Sv-myri    AMD 1.6GHz          266MHz           64KB (64 bytes/line)   64KB (64 bytes/line)   256KB (64 bytes/line)    1GB           Myrinet GM ver 1.5
Sv-PIII    Intel PIII 500MHz   100MHz           16KB (32 bytes/line)   16KB (32 bytes/line)   512KB (32 bytes/line)    512MB SDRAM   3c905b
Sv-PIV     Intel PIV 1.6GHz    100MHz           12KB (64 bytes/line)   8KB (64 bytes/line)    512KB (128 bytes/line)   128MB DDR     3c905c
Sv-1NIC    Intel PIV 1.6GHz    100MHz           12KB (64 bytes/line)   8KB (64 bytes/line)    512KB (128 bytes/line)   128MB DDR     3c905c
Sv-2NIC    Intel PIV 1.6GHz    100MHz           12KB (64 bytes/line)   8KB (64 bytes/line)    512KB (128 bytes/line)   128MB DDR     2 × 3c905c
Table 1. The configuration of our testing platforms
3 The impact of computational part per node:
The testing platforms selected for this comparison are Sv-PIII and Sv-PIV in Table 1.
[Figures: log-log plots of g(m), Os(m) and Or(m), old versus new platform, against message size in bytes]
Fig 2. Sv-PIII.Send.Recv.g(m) vs Sv-PIV.Send.Recv.g(m)
Fig 3. Sv-PIII.Send.Recv.Os(m) vs Sv-PIV.Send.Recv.Os(m)
Fig 4. Sv-PIII.Send.Recv.Or(m) vs Sv-PIV.Send.Recv.Or(m)
From these 3 figures we can see that improving the computational part of each node reduces the overhead of transferring short and medium messages, but contributes little to long messages and g(m). Considering that g(m) determines the minimal time interval between consecutive messages, only improving the computational portion enlarges the gap between Os(m), Or(m) and g(m), which is exactly the gap between computation and
communication, and should be used for overlapping computation and communication in the non-blocking transfer of long messages. Also, improvement of the computational part of each node contributes much to the computational part of a real-world parallel application.

4 Networks:

4.1 Ethernet versus Myrinet
The testing platforms selected are Sv-ether and Sv-myri in Table 1. Running the test programs on these two platforms generates comparison figures similar to those illustrated in Section 3. From these figures we can see that the communication performance improvement from Myrinet mainly focuses on short and medium messages (< 16KB in our example). For longer messages (> 16KB in our example), the performance of Myrinet decreases significantly and even does a little worse than Ethernet at some points. Another important point to be mentioned is that Myrinet seems to suffer more than Ethernet from the "socksize" parameter, which sets the size of the underlying send/receive buffer. For short and medium messages, the improvement in g(m) of Sv-myri over Sv-ether is about 99.88% to 92.52%; the improvement in Os(m) is about 90.05% to 92.18%; and the improvement in Or(m) is about 73.02% to 59.31%.

4.2 1 NIC versus 2 NIC:
The testing platforms in this subsection are Sv-1NIC and Sv-2NIC. Note: we define 1 NIC mode here to mean that only one network adapter is available on each node of the Beowulf LINUX Cluster and is used for both MPI and OS communication; 2 NIC mode means that two network adapters are available on each node, one for MPI communication and the other for OS communication such as NFS, YPSERV and ARP. From comparison figures similar to those in Section 3, we can see that for pure communication tests, the 1 NIC mode and 2 NIC mode defined in this subsection make no significant difference. Further tests illustrate that the amount of OS communication is very small compared with MPI while the applications are running, about 0.002% to 0.003% in bytes of traffic. The running results of NPB (version 2.3, CLASS = A, NPROCS = 2) on our Sv-1NIC and Sv-2NIC confirm these results. So we may conclude that if a parallel program performs a lot of OS communication like NFS, YPSERV, ARP and so on while running, the 2 NIC mode will be useful; otherwise, the 1 NIC mode may be a good choice considering the performance/cost ratio.

5 Conclusions and future work:
g(m) is mainly influenced by network speed and software parameters; only improving the computational part of each node has no significant effect. For short or medium messages, Os(m) and Or(m) are mainly occupied by instruction execution in the protocol stacks, so it is very effective to speed up the CPU frequency and memory of each node to improve Os(m) and Or(m). If the message size keeps growing, Os(m) and Or(m) will be greatly occupied by waiting for the re-availability of the underlying send buffer or receive buffer, which depends on the
network speed. So for long messages, the improvements of Os(m) and Or(m) rely not mainly on the CPU frequency or memory per node, but on the speed of the network. The communication performance improvement of Myrinet compared with Ethernet mainly focuses on short and medium messages; for long messages, the performance of Myrinet decreases quickly and significantly. Myrinet also seems to suffer more from the 'socksize' parameter of GM, which sets the underlying send buffer and receive buffer size. If two network adapters are available per node of a Beowulf LINUX Cluster, with one used for OS communication (such as NFS, YPSERV, ARP and so on) and the other for MPI communication (the 2 NIC mode defined in subsection 4.2), the possible performance improvement will depend on the ratio of OS messages to MPI messages while the applications are running. If the ratio is large, the 2 NIC mode will be useful; otherwise, it will not. If only one network adapter is available per node, some OS parameters should be adjusted to decrease the amount of OS messages, in order to reduce the total program running time. This paper analyzed the possible impact of the computational part of each node, Ethernet versus Myrinet, and 1 NIC versus 2 NIC on the communication performance of a Beowulf LINUX Cluster. We drew some conclusions and put forward some suggestions, which may be useful when making decisions on the hardware selection of a Beowulf LINUX Cluster. We would like to analyze more hardware impacts in the future, such as the sharing of a single network adapter by multiple CPUs per node. The analysis of software impacts is also in the schedule.

References:
1. Thilo Kielmann, Henri E. Bal and Kees Verstoep, "Fast Measurement of LogP Parameters for Message Passing Platforms", http://www.cs.vu.nl/albatross/.
2. David Culler, Richard Karp, David Patterson, Abhijit Sahay, Klaus Erik Schauser, Eunice Santos, Ramesh Subramonian and Thorsten von Eicken, "LogP: Towards a Realistic Model of Parallel Computation", in Proc. Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 1-12, San Diego, CA, May 1993.
3. Albert Alexandrov, Mihai F. Ionescu, Klaus E. Schauser and Chris Scheiman, "LogGP: Incorporating Long Messages into the LogP Model: One step closer towards a realistic model for parallel computation", in Proc. Symposium on Parallel Algorithms and Architectures (SPAA), pages 95-105, Santa Barbara, CA, July 1995.
4. Richard P. Martin, Amin M. Vahdat, David E. Culler and Thomas E. Anderson, "Effects of Communication Latency, Overhead, and Bandwidth in a Cluster Architecture", http://www.berkeley.edu/~culler/papers.
5. David Culler, Lok Tin Liu, Richard P. Martin and Chad Yoshikawa, "LogP Performance Assessment of Fast Network Interfaces", http://www.berkeley.edu/~culler/papers.
D-GRIDMST: CLUSTERING LARGE DISTRIBUTED SPATIAL DATABASES
JI ZHANG, YUE CAO
Department of Computer Science, National University of Singapore, Lower Kent Ridge Road, Singapore 117543
Email: {zhangji, caoyue}@comp.nus.edu.sg
In this paper, we will propose a distributable clustering algorithm, called Distributed-GridMST (D-GridMST), that deals with large distributed spatial databases. D-GridMST employs the notions of multi-dimensional cube and grid to partition the data space involved and uses density criteria to extract representative points from the spatial databases, based on which a global MST of representatives is constructed. Such an MST is partitioned according to users' clustering specifications and is then used to label the data points in the respective distributed spatial databases. Since only compact information about the distributed spatial databases is transferred via the network, D-GridMST is characterized by a small network transfer overhead. Experimental results show that D-GridMST is effective, since it is able to produce exactly the same clustering result as that produced in the centralized paradigm, making D-GridMST a promising tool for clustering large distributed spatial databases.
1
INTRODUCTION
Spatial data clustering, aiming to identify clusters, or densely populated regions, in a large and multi-dimensional spatial dataset, serves as an important task of spatial data mining. Though a large number of spatial clustering algorithms have been proposed in the literature so far, most of them assume the data to be clustered are locally resident in a centralized scenario, making them unable to cluster inherently distributed spatial data sources. Recent effort in this field includes [1, 2, 3, 4]. In this paper, we will propose a distributable clustering algorithm, called Distributed-GridMST (D-GridMST), which deals with large distributed spatial databases. D-GridMST employs the notions of multi-dimensional cube and grid to partition the data space involved and uses density criteria to extract representative points from the spatial databases, based on which a global MST of representatives is constructed. Such an MST is partitioned according to users' clustering specifications and is then used to label the data points in the respective distributed spatial databases. Since only compact information about the distributed spatial databases is transferred via the network, D-GridMST is characterized by a small network transfer overhead. Experimental results show that D-GridMST is effective, since it is able to produce exactly the same clustering result as that produced in the centralized paradigm. These advantages are believed to make D-GridMST a promising tool for clustering large distributed spatial databases.

2 MST (MINIMUM SPANNING TREE) CLUSTERING
MST clustering is the best-known graph-theoretical divisive clustering algorithm. Given n points, an MST is a set of edges that connects all the points and has minimum total length. Deleting the edges with the largest lengths will subsequently generate a specific
number of clusters. The advantages of using an MST for clustering are that the MST is very suitable for dealing with datasets featuring arbitrary shapes, and that the clustering result of the MST is stable and not sensitive to the input order of the data points. In addition, the MST method can achieve very promising accuracy, especially when the size of the data to be clustered is small and the data is clean (free of outliers or noise). Figure 1 shows an example of the MST of a number of points.
Figure 1: A set of points and its corresponding MST
Example 1. Given a set of points (Figure 1(a)), the corresponding MST is shown in Figure 1(b).
3 GRIDMST

GridMST clusters spatial databases in a number of steps as follows: (1) construction of the MST of representative points of the spatial database; (2) clustering the representative points using the MST clustering method; (3) labeling the points in the spatial database.
3.1 Construction of MST of representative points of the spatial database
To construct the MST of representative points, a hyper-cube data structure is built whereby each point in the dataset is assigned to one and only one cell of the hyper-rectangle. The density of each hyper-cube cell is then computed. If the density of a hyper-cube cell exceeds some user-specified threshold, the cell is considered to be dense, and the median of the points belonging to the dense cell is selected as the representative point of this cell. Once the representative points have been selected, a graph-theoretic algorithm is used to build the MST.
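A compact C sketch of the dense-cell test, restricted to 2-D points on a uniform grid over [0,1]^2 for brevity; GridMST itself works on a d-dimensional hyper-cube and takes the median point of each dense cell as its representative.

```c
#include <stdlib.h>

typedef struct { double x, y; } Point;

/* Count how many points fall into each cell of a g-by-g grid over
 * [0,1]^2. The caller checks for NULL and frees the returned array. */
int *cell_counts(const Point *pts, int n, int g)
{
    int *count = calloc((size_t)g * g, sizeof *count);
    if (count == NULL)
        return NULL;
    for (int i = 0; i < n; i++) {
        int cx = (int)(pts[i].x * g); if (cx == g) cx--;
        int cy = (int)(pts[i].y * g); if (cy == g) cy--;
        count[cy * g + cx]++;
    }
    return count;
}

/* A cell is dense when its density exceeds the user threshold;
 * GridMST then picks the median point of each dense cell as its
 * representative (not shown here). */
int is_dense(const int *count, int g, int cx, int cy, int threshold)
{
    return count[cy * g + cx] > threshold;
}
```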
3.2 Clustering representative points using MST clustering method
When the MST of representative points has been constructed, it can easily be partitioned according to the user's clustering requirements. For instance, if the user wants to cluster the spatial database into k clusters (k is a user-specified parameter), the MST will be partitioned by cutting its k-1 longest edges.
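For the partition step, a minimal C sketch under the assumption that the MST is available as an edge list: sort the edges by length and drop the k-1 longest, leaving k connected components.

```c
#include <stdlib.h>

/* An MST edge between representative points u and v. */
typedef struct { int u, v; double length; } Edge;

/* Sort longest-first so the k-1 longest edges come first. */
static int by_length_desc(const void *a, const void *b)
{
    double la = ((const Edge *)a)->length;
    double lb = ((const Edge *)b)->length;
    return (la < lb) - (la > lb);
}

/* Cut the k-1 longest edges of an MST with nedges edges, leaving
 * k connected components (the clusters). The surviving edges are
 * compacted to the front of the array; the new count is returned. */
int cut_longest_edges(Edge *mst, int nedges, int k)
{
    qsort(mst, (size_t)nedges, sizeof *mst, by_length_desc);
    int keep = nedges - (k - 1);
    if (keep < 0) keep = 0;
    for (int i = 0; i < keep; i++)
        mst[i] = mst[i + (k - 1)];  /* drop the k-1 longest edges */
    return keep;
}
```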
3.3 Label the points in the spatial database
When the representative points of the spatial database have been clustered, cluster labeling is performed in this last step to cluster the whole spatial database. Two labeling strategies are used: 1) a point falling into one of the dense cells shares the cluster label of the representative point of that dense cell; 2) a point falling into one of the non-dense cells receives the cluster label of its nearest representative point.
4 D-GRIDMST (DISTRIBUTED-GRIDMST)
4.1 Local data model vs. global data model
To enable D-GridMST to produce a clustering result for the distributed databases that is comparable to the result on a single database, globalization of the local data models is required to capture the cluster features of the whole spatial database. (1) Globalize the range of every dimension of the data in each distributed site. The global range of every dimension of the data is required to construct a global hypercube structure, making sure that the hypercube constructed is able to encapsulate all the data points stored in the distributed sites. To this end, all distributed sites are required to provide the central site with information regarding the maximum and minimum values, i.e. the range, of every dimension of the local data points. (2) Globalize the local occupied cells in each distributed site. Here, the occupied cells refer to the cells that are occupied by data points in the database. The global occupied cells serve as the potential pool for the selection of dense cells: the dense cells are only those occupied cells whose neighborhood density exceeds some threshold. The global occupied cells are the union of the local occupied cells.
4.2 D-GridMST Algorithms
Table 1 presents the detailed algorithm of D-GridMST.

Step   Transfer/Location   Operation
1      DS → CS             Transfer the local range of every dimension of the data
2      CS                  Globalize the local ranges to the global range
3      CS → DS             Transfer the global range (global hypercube)
4      DS                  Assign local points into the hypercube and compute the density
5      DS → CS             Transfer the local occupied cells and their densities
6      CS                  Globalize the local occupied cells of the hypercube
7      CS                  Generate representative points and construct the MST
8      CS                  Perform clustering using the MST
9      CS → DS             Transfer the clustering result of the representative points
10     DS                  Label the local data points
Table 1. Algorithm of D-GridMST
Value     Meaning
CS        Clustering operations in the centralized site
DS        Clustering operations in all the distributed sites
CS → DS   Data are transferred from the centralized site to all the distributed sites
DS → CS   Data are transferred from all distributed sites to the centralized site
Table 2. Annotations of the Transfer/Location field in Table 1
5 EXPERIMENT AND DISCUSSION
Experiments have been conducted to evaluate the effectiveness of D-GridMST in clustering distributed spatial databases. Synthetic datasets are generated using a dataset generator. The main focus of our experiments is to see whether D-GridMST is able to, with only small data transfer via the network, yield the same clustering as that produced in a centralized scenario. The results of centralized clustering using GridMST and distributed clustering using D-GridMST on the synthetic datasets are illustrated in Figures 2 and 3. Experimental results show that D-GridMST is effective, since it is able to produce exactly the same clustering result as that produced in the centralized paradigm. These results verify that D-GridMST is a promising clustering algorithm that achieves very good performance when working on distributed spatial databases.
Figure 2. Result of Dataset 1
Figure 3. Result of Dataset 2
(a): the result of centralized clustering using GridMST; (b): the result of distributed clustering using D-GridMST
Finally, a spatial database is subject to frequent changes and updates; thus a future research direction will be making D-GridMST allow for dynamic distributed spatial databases. Instead of performing naive re-clustering, D-GridMST is expected to deal with these updates dynamically and perform clustering efficiently.

REFERENCES
[1] I.S. Dhillon and D.S. Modha. "A Data-Clustering Algorithm On Distributed Memory Multiprocessors", Large-scale Parallel KDD Systems, eds. M. Zaki and C. Ho, (1999), pp. 245-260.
[2] G. Forman and B. Zhang. "Distributed Data Clustering can be Efficient and Exact", HPL Technical Report HPL-2000-158. Also appears in SIGKDD Explorations, (2001).
[3] K. Johnson and H. Kargupta. "Collective, Hierarchical Clustering from Distributed, Heterogeneous Data", Large-scale Parallel KDD Systems, eds. M. Zaki and C. Ho, (1999), pp. 221-244.
[4] H. Kargupta, W. Huang, S. Krishnamoorthy and E. Johnson. "Distributed Clustering Using Collective Principal Component Analysis", Knowledge and Information Systems Journal: Special Issue on Distributed and Parallel Knowledge Discovery, (2000).
MASSIVELY PARALLEL SEQUENCE ANALYSIS WITH HIDDEN MARKOV MODELS
BERTIL SCHMIDT
School of Computer Engineering, Nanyang Technological University, Singapore 639798
E-mail: [email protected]
HEIKO SCHRODER
School of Computer Science and Information Technology, RMIT, Melbourne, Australia
E-mail: [email protected]
Molecular biologists use Hidden Markov Models (HMMs) as a popular tool to statistically describe protein sequence families. This statistical description can then be used for sensitive and selective database scanning. Even though efficient dynamic programming algorithms exist for the problem, the required scanning time is still very high, and because of the exponential database growth finding fast solutions is of high importance to research in this area. In this paper we illustrate how massive parallelism can be used for efficient sequence analysis using HMMs. We present two new techniques to parallelize the dynamic programming calculation: "diagonal-by-diagonal" and "row-by-row". This leads to significant runtime savings on our hybrid parallel system based on commodity components to gain high performance at low cost. The architecture is built around a coarse-grained PC-cluster linked by a high-speed network and fine-grained SIMD processor arrays connected to each node.
1 Introduction
Scanning sequence databases is a common and often repeated task in molecular biology. The need for speeding up this treatment comes from recent developments in genome-sequencing projects, which are generating an enormous amount of data. This results in an exponential growth of the bio-sequence banks: every year their size scales by a factor of 1.5 to 2. The scan operation consists in finding similarities between a particular query sequence and all the sequences of a bank. This operation allows biologists to point out sequences sharing common subsequences. From a biological point of view, it leads to the identification of similar functionality. However, identifying distantly related homologues is still a difficult problem. Because of sparse sequence similarity, commonly used comparison algorithms like BLAST or Smith-Waterman often fail to recognize their homology. Therefore, Hidden Markov Models (HMMs) have become a powerful tool for high-sensitivity database scanning, because they can provide a position-specific description of protein families. HMMs can identify that a new protein sequence belongs to the modeled family, even with low sequence identity [2]. An HMM can be compared with a protein sequence by dynamic programming based alignment algorithms, such as the Viterbi algorithm, whose complexities are quadratic with respect to the sequence and model lengths. Basically, there are two methods to parallelize HMM database scanning: one is based on the parallelization of the dynamic programming calculation, the other on the distribution of the pairwise comparisons. Fine-grained parallel architectures, like linear SIMD arrays, have been proven to be good candidate structures for the first approach, while more coarse-grained networks of workstations are suitable architectures for the second [1]. This paper presents a new approach to high performance HMM database scanning that combines both strategies in order to achieve even higher speed. We have designed
massively parallel versions of the Viterbi algorithm that are tailored to fit the characteristics of our hybrid parallel architecture. Their implementation is described on our hybrid system consisting of Systola 1024 cards within the 16 PCs of a Beowulf cluster [4]. The rest of this paper is organised as follows. In Section 2, we introduce the Viterbi algorithm used to align an HMM with a protein sequence. Section 3 provides a description of our hybrid system. The new parallel algorithms and their mapping onto the hybrid architecture are explained and evaluated in Section 4. Section 5 concludes the paper with an outlook on further research topics.

2 Viterbi algorithm
The structure of an HMM to model a set of biologically similar protein sequences (a protein family) is shown in Figure 1. It consists of a linear sequence of nodes. Each node has a match (M), insert (I) and delete state (D). Between the nodes are transitions with associated probabilities. Each match state and insert state also contains a position-specific table with probabilities for emitting a particular amino acid. Both transition and emission probabilities can be generated from a multiple sequence alignment of a protein family.
Figure 1. The transition structure of an HMM of length 4. Squares represent match states, circles represent delete states and diamonds represent insertions.
An HMM can be compared (aligned) with a new protein sequence to determine the probability that the sequence belongs to the modeled family. The most probable path through the HMM that generates a sequence similar to the new sequence determines the similarity score. The well-known Viterbi algorithm computes this score by dynamic programming. The computation is given by the following recurrences:

M(i,j) = e(M_j, s_i) + max{ M(i-1,j-1) + tr(M_{j-1}, M_j),
                            I(i-1,j-1) + tr(I_{j-1}, M_j),
                            D(i-1,j-1) + tr(D_{j-1}, M_j) }

I(i,j) = e(I_j, s_i) + max{ M(i-1,j) + tr(M_j, I_j),
                            I(i-1,j) + tr(I_j, I_j),
                            D(i-1,j) + tr(D_j, I_j) }

D(i,j) = max{ M(i,j-1) + tr(M_{j-1}, D_j),
              I(i,j-1) + tr(I_{j-1}, D_j),
              D(i,j-1) + tr(D_{j-1}, D_j) }

where tr(state1, state2) is the transition cost from state1 to state2 and e(M_j, s_i) is the emission cost of amino acid s_i at state M_j. M(i,j) denotes the score of the best path matching subsequence s_1...s_i to the submodel up to state j, ending with s_i being emitted by state M_j. Similarly, I(i,j) is the score of the best path ending in s_i being emitted by I_j, and D(i,j) is the score of the best path ending in state D_j. Initialization and termination are given by M(0,0) = 0 and M(n, m+1) for a sequence of length n and an HMM of length m.
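A direct sequential rendering of these recurrences in C, before any parallelization, is sketched below. The table layout (emission scores eM/eI indexed by model position and amino acid, and nine per-node transition scores) is illustrative, and boundary cells are assumed to be pre-initialized by the caller (M(0,0) = 0, everything else on the borders set to a large negative value).

```c
#include <float.h>

#define NEG_INF (-DBL_MAX / 2)  /* stands in for log(0) */

/* Indices of the nine transition scores at model node j; e.g.
 * t[j][MM] = tr(M_{j-1}, M_j) and t[j][MI] = tr(M_j, I_j). */
enum { MM, IM, DM, MI, II, DI, MD, ID, DD };

static double max3(double a, double b, double c)
{
    double m = a > b ? a : b;
    return m > c ? m : c;
}

/* Fill the Viterbi matrices for a sequence s[1..n] (amino acid
 * codes) and an HMM of length m, in log space. */
void viterbi_fill(int n, int m, const int *s,
                  double **M, double **I, double **D,
                  double **eM, double **eI, double **t)
{
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            M[i][j] = eM[j][s[i]] +
                max3(M[i-1][j-1] + t[j][MM],
                     I[i-1][j-1] + t[j][IM],
                     D[i-1][j-1] + t[j][DM]);
            I[i][j] = eI[j][s[i]] +
                max3(M[i-1][j] + t[j][MI],
                     I[i-1][j] + t[j][II],
                     D[i-1][j] + t[j][DI]);
            D[i][j] = max3(M[i][j-1] + t[j][MD],
                           I[i][j-1] + t[j][ID],
                           D[i][j-1] + t[j][DD]);
        }
    }
}
```

The cell (i,j) depends only on cells (i-1,j-1), (i-1,j) and (i,j-1), which is what makes the diagonal-by-diagonal and row-by-row parallelizations discussed in Section 4 possible.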
3 The hybrid architecture
We have built a hybrid MIMD-SIMD architecture from generally available components (see Fig. 2). The MIMD part of the system is a cluster of 16 PCs (Pentium II, 450 MHz) running Linux. The machines are connected via a Gigabit-per-second LAN (using Myrinet M2F-PCI32 as network interface cards and Myrinet M2L-SW16 as a switch). For application development we use the MPI library MPICH v. 1.1.2.
Figure 2. Architecture of our hybrid system: a cluster of 16 PCs with 16 Systola PCI boards (left). The data paths in Systola 1024 are depicted on the right.
For the SIMD part we plugged a Systola 1024 PCI board into each PC. Systola 1024 contains an Instruction Systolic Array (ISA) of size 32x32. The ISA [3] is a mesh-connected processor grid, where the processors are controlled by three streams of control information: instructions, row selectors, and column selectors. The instructions are input in the upper left corner of the processor array, and from there they move step by step in horizontal and vertical direction through the array. Every processor has read and write access to its own memory. Besides that, it has a designated communication register (C-register) that can also be read by the four neighbor processors. The ISA combines the advantages of fine-grained SIMD machines with the capability of efficiently performing so-called aggregate functions. These are associative and commutative functions to which each processor provides an argument value. Examples of aggregate functions are broadcast, ringshift, sum and maximum along the rows or columns of the processor array. These are the key operations within the algorithm presented in the next section.

4 Mapping onto the hybrid architecture
Mapping the database scanning application onto our hybrid computer combines two forms of parallelism: fine-grained parallelism on Systola 1024 and coarse-grained parallelism on the PC cluster. While the Systola implementation parallelises the dynamic programming computation in the Viterbi algorithm, the cluster implementation splits the database into pieces and distributes them among the PCs using a suitable load balancing strategy. Fig. 3 presents two ways to map the dynamic programming calculation to a linear array of processing elements (PEs): "diagonal-by-diagonal" and "row-by-row"; a sketch of the first schedule follows.
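The "diagonal-by-diagonal" schedule exploits the fact that all cells on an anti-diagonal i + j = d of the dynamic programming matrices depend only on diagonals d-1 and d-2 and are therefore mutually independent. The following Java fragment only illustrates this dependence structure (with a parallel stream standing in for the PEs of the linear array, and relaxCell() as a placeholder for the Viterbi updates); it is not the authors' ISA implementation.

import java.util.stream.IntStream;

final class Wavefront {
    // Process the n x m dynamic programming matrix anti-diagonal by anti-diagonal.
    static void computeDiagonalByDiagonal(int n, int m) {
        for (int d = 2; d <= n + m; d++) {
            final int diag = d;
            IntStream.rangeClosed(Math.max(1, diag - m), Math.min(n, diag - 1))
                     .parallel()                            // cells on one diagonal are independent
                     .forEach(i -> relaxCell(i, diag - i)); // j = d - i
        }
    }

    static void relaxCell(int i, int j) {
        // placeholder: apply the M(i,j), I(i,j), D(i,j) recurrences of Section 2
    }
}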
Both mappings can be efficiently implemented on the ISA, taking advantage of the broadcast, ringshift and maximum aggregate functions described in Section 3 (see the full version of this paper for details). Table 1 reports times for scanning the SwissProt databank for HMMs of various lengths with the Viterbi algorithm. A single Systola 1024 board is 3 to 4 times faster than a single Pentium III, 1 GHz. However, a board re-design based on state-of-the-art technology (Systola was built in 1994) would make this factor significantly higher.
Figure 3. Parallelization of the Viterbi algorithm on a linear processor array. Each node of the HMM is assigned to one PE. In (a) the sequence is shifted systolically through the PEs. In each step, the computation of a single diagonal in the dynamic programming matrices M(i,j), I(i,j) and D(i,j) is performed in parallel ("diagonal-by-diagonal"). In (b) a character of the sequence is broadcast through the array in each step. The computation of a single row in the dynamic programming matrices is performed in parallel ("row-by-row").

Table 1. Scan times in seconds of SwissProt (release 40, 113997 protein sequences) for various HMM lengths with the Viterbi algorithm on Systola 1024, a PC cluster with 16 Systolas, and a Pentium III 1 GHz.
HMM length                      -        222        -
Systola 1024 (speedup)          - (3)    288 (3.5)  546 (4)
Cluster of Systolas (speedup)   - (37)   22 (45)    40 (56)
Pentium III 1 GHz               478      994        2243

5 Conclusions and Future Work
In this paper we have demonstrated that hybrid computing is very suitable for scanning bio-sequence databases with HMMs. By combining the fine-grained ISA parallelism with a coarse-grained distribution within a PC cluster, our hybrid architecture achieves high performance at low cost. We have presented the design of two ISA algorithms that lead to a high-speed Viterbi implementation on Systola 1024. The exponential growth of genomic databases demands even more powerful parallel solutions in the future. Because the comparison and alignment algorithms favored by biologists are not fixed, programmable parallel solutions are required to speed up these tasks. As an alternative to special-purpose systems, hard-to-program reconfigurable systems, and expensive supercomputers, we advocate the use of specialized yet programmable hardware whose development is tuned to system speed.

References
1. Grate L., Diekhans M., Dahle D., Hughey R., Sequence analysis with the Kestrel SIMD parallel processor, Pacific Symposium on Biocomputing (2001) pp. 263-274.
2. Krogh A., et al., Hidden Markov models in computational biology: applications to protein modeling, JMB 235 (1994) pp. 1501-1531.
3. Kunde M., Lang H.-W., Schimmler M., Schroder H., Schmeck H., The ISA and its relation to other models of parallel computers, Parallel Comp. 7 (1988) pp. 25-39.
4. Schmidt B., Schroder H., Schimmler M., A hybrid architecture for bioinformatics, Future Generation Computer Systems 18 (2002) pp. 855-862.
TABU SEARCH AND SIMULATED ANNEALING ON THE SCHEDULING OF PIPELINED MULTIPROCESSOR TASKS

M. FIKRET ERCAN
Singapore Polytechnic, School of Electrical and Electronic Engineering, 500 Dover Rd., Singapore
E-mail: [email protected]
YU FAI FUNG
The Hong Kong Polytechnic University, Department of Electrical Engineering, Hung Hom, Kowloon, Hong Kong S.A.R.
E-mail: [email protected]
Parallel computers that can run multiple parallel algorithms simultaneously are generally targeted at applications where operations are periodic. The performance of such systems, however, depends considerably on the efficient scheduling of tasks. This paper evaluates the solution quality of two metaheuristic algorithms developed for scheduling multiprocessor tasks on these systems. The reduction achieved by the metaheuristics in the completion time of these tasks has been studied for various task parameters and machine configurations.
1 Introduction
In many real-time applications, such as machine vision, robotics, and power system simulation, two main characteristics of parallelism co-exist: spatial parallelism and temporal parallelism [1]. In order to speed up operations, these algorithms are performed on multiprocessor machines. However, this unique computing structure can be better exploited if it is executed on a multiprocessor environment that can execute multiple parallel programs simultaneously. This class of computers can be termed multi-programmable systems (MPSs); they are made of either a pool of processors that can be partitioned into processor clusters or processor arrays prearranged in multiple layers [1,2]. An MPS platform can simultaneously execute a number of parallel algorithms on independent processor arrays and provides data exchange between them. Hence, by making use of spatial parallelism, algorithms can be split into smaller grains, and when computations are repetitive, temporal parallelism can be exploited. This results in a set of pipelined multiprocessor tasks (MPTs) to be performed on an MPS. That is, the algorithms at each level of the precedence-relation tree can be mapped onto a processing layer of an MPS and executed simultaneously to create a pipelining effect. A single pipeline, made of MPTs, will be called a job, and we assume that there is no precedence relation between jobs. This paper deals with the problem of efficiently scheduling multiple jobs on an MPS. The objective of
the scheduling is to find a sequence of jobs that can be processed on the system in minimum time. Two metaheuristic algorithms were introduced for the solution and their performance is evaluated.

2 Simulated Annealing and Tabu Search algorithms
A frequently used algorithm is Simulated Annealing (SA), which simulates annealing during the search process [4]. Tabu search (TS) is another local search method, which is guided by the use of adaptive memory structures [3]. In order to apply SA and TS to a practical problem, several decisions have to be made; they are briefly described in the following. For SA, an initial solution is generated by setting all jobs in ascending order of their indices. A neighbourhood of the current solution must also be defined. We employed the interchange neighbourhood, which swaps two randomly chosen jobs in the job list; we found that this method performs well compared to the others we experimented with. A simple cooling strategy is employed in the algorithm: the temperature is decreased in an exponential manner with $T_t = \lambda T_{t-1}$, where $\lambda < 1$. In our implementation a $\lambda$ value of 0.998 was selected following a series of experiments. The initial value of the temperature is selected using $T_0 = \Delta E_{avg} / \ln(\chi_0^{-1})$, where $\Delta E_{avg}$ is the average increase in the cost for a number of random transitions. The initial acceptance ratio $\chi_0$ is defined as the number of accepted transitions divided by the number of proposed transitions. The initial temperature is estimated from 50 randomly permuted neighbourhood solutions of the initial solution. For the TS algorithm, we also employed the interchange neighbourhood. In the tabu list, we keep a fixed number of the last visited solutions; the list is updated by eliminating the oldest solution stored in it. We employed a fixed number of iterations as the stopping criterion for both algorithms. The objective function is the completion time of all jobs, which both algorithms seek to minimize. A sketch of the SA loop is given below.
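The following compact Java sketch shows the SA loop just described: interchange moves, Metropolis acceptance and exponential cooling with lambda = 0.998. The cost() function (the completion time of all jobs) is stubbed, and the structure is our own illustration rather than the authors' exact implementation.

import java.util.Random;

final class SaScheduler {
    private static final Random RNG = new Random();
    private static final double LAMBDA = 0.998;          // cooling factor

    static int[] anneal(int[] jobs, double t0, int maxIterations) {
        int[] current = jobs.clone();                    // jobs in ascending index order
        int[] best = current.clone();
        double t = t0;                                   // initial temperature
        for (int iter = 0; iter < maxIterations; iter++) {
            int[] neighbour = current.clone();           // interchange neighbourhood:
            int a = RNG.nextInt(neighbour.length);       // swap two randomly chosen jobs
            int b = RNG.nextInt(neighbour.length);
            int tmp = neighbour[a]; neighbour[a] = neighbour[b]; neighbour[b] = tmp;
            double delta = cost(neighbour) - cost(current);
            if (delta < 0 || RNG.nextDouble() < Math.exp(-delta / t)) {
                current = neighbour;                     // Metropolis acceptance
                if (cost(current) < cost(best)) best = current.clone();
            }
            t *= LAMBDA;                                 // T_t = lambda * T_{t-1}
        }
        return best;
    }

    static double cost(int[] schedule) {
        return 0.0; // stub: completion time of all jobs for this sequence
    }
}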
3 Computational study
During the computational experiments, the stopping criterion for the heuristics was defined as a fixed number of solutions visited, to ensure that a comparable computational effort is committed by each heuristic. This number was set at 5300. Problem sets were generated randomly for various processing time ratios and different processor configurations. For each combination of
processing time ratio and processor configuration of the architecture, 25 problems were generated. Results are presented in terms of the Average Percentage Deviation (APD) of the solution from the lower bound, which was derived in our earlier study [2]. For comparison, a list-based heuristic (LH), which sorts jobs in ascending order of their indices, is also considered. The APD of each heuristic is presented in Table 1. From the computational study, we can conclude that in most of the cases SA and TS found a reasonable solution; the completion time of jobs was reduced by as much as 81 percent and by at least 14.5 percent. Figures 1 and 2 illustrate the convergence curves of both algorithms for some selected problems. The performance achieved by SA and TS was quite similar, though in most of the cases SA delivered a slightly better result. In addition, when the convergence curves of both algorithms are analysed, it can be seen that TS converges more slowly than SA. In most of the cases, SA converged to a reasonable solution within 500 iterations for the m1 = m2 processor configuration, while TS required about 1000 iterations.
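For reference, the deviation measure can be written as follows for the 25 instances of each cell, where $C_k$ is the completion time found by a heuristic on instance $k$ and $LB_k$ is the lower bound of [2]; this formula is our reading of the standard APD definition, not an equation reproduced from the paper:

$$\mathrm{APD} = \frac{1}{25}\sum_{k=1}^{25}\frac{C_k - LB_k}{LB_k}\times 100\,\%.$$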
Figure 1. Convergence of Simulated Annealing.
Figure 2. Convergence of Tabu Search.
Table 1. Average percentage deviation of each heuristic algorithm.
Job   Machines 2:1            Machines 4:2            Machines 8:4
      LH     SA     TS        LH     SA     TS        LH     SA     TS
10    13.93  0.94   1.14      24.62  4.21   5.06      35.87  10.1   10.7
30    0.52   0.47   4.05      12.29  2.64   3.41      16.96  4.65   5.82
50    0.44   0.42   2.89      8.98   2.3    1.71      13.19  4.53   5.59

Job   Machines 2:2            Machines 4:4            Machines 8:8
      LH     SA     TS        LH     SA     TS        LH     SA     TS
10    21.04  2.62   3.28      27.7   7.07   8.49      32.03  8.88   9.21
30    13.95  2.49   3.25      16.51  3.5    4.69      20.66  7.81   9.32
50    9.9    2.69   3.06      11.86  4.2    4.79      16.2   7.0    7.96
4 Summary
In this paper, the job-scheduling problem on an MPS is considered. A job is made of interrelated MPTs and is modelled with its processor requirement and processing time. Two metaheuristics have been implemented for the solution of this problem, and their performance was evaluated based on their capacity to shorten the overall schedule. The results show that the metaheuristics provided a significant improvement. In our further studies, a more generic instance of the problem will be explored. In addition, a comparison with another well-known metaheuristic, the genetic algorithm, will be conducted.

References
1. Cantoni V. and Ferretti M., Pyramidal Architectures for Computer Vision (Plenum Press, New York, 1994).
2. Ercan M. F., Oguz C. and Fung Y. F., Scheduling image processing tasks in a multi-layer system, Computers and Electrical Engineering 27/6 (2001) pp. 429-443.
3. Glover F., Taillard E. and de Werra D., A user's guide to tabu search, Annals of Operations Research 41 (1993) pp. 3-28.
4. Kirkpatrick S., Optimization by simulated annealing - Quantitative studies, J. Stat. Phys. 34 (1984) pp. 975-986.
TME - A DISTRIBUTED RESOURCE HANDLING TOOL
TOSHIYUKI IMAMURA, YUKIHIRO HASEGAWA AND NOBUHIRO YAMAGISHI
Center for Promotion of Computational Science and Engineering, Japan Atomic Energy Research Institute, 6-9-3 Higashi-Ueno, Taitoh-ku, Tokyo 110-0015, Japan
E-mail: {imamura, hasegawa, yama}@koma.jaeri.go.jp

HIROSHI TAKEMIYA
Hitachi Tohoku Software, Ltd., 2-16-10 Honcho, Aoba-ku, Sendai, Miyagi 980-0014, Japan
E-mail: [email protected]

TME, the Task Mapping Editor, is developed both for handling distributed resources and for supporting the design of distributed applications. On the TME console, a user can design a workflow diagram of a distributed application as in a drawing tool. Since all resources are represented as icons in the TME naming space and data dependency is defined by directed arrows linking icons, TME realizes a higher-level view for schematizing the application structure. Furthermore, it has an importing mechanism for user-defined applets and some valuable built-in applications. TME provides users great flexibility in distributed computing.
1 Introduction
Distributed processing is steadily spreading with the growth of the Grid [1] and the development of the Globus toolkit [2]. In a Grid environment, geographically dispersed resources, machines, databases and experimental devices can be handled uniformly with higher abstraction. Simplification of the interface, handling of arbitrary resources, and construction of coupled services become important issues for the next step. The visual support that a GUI offers is more efficient than a script: it supports not only intuitive understanding but also the design of combinations of multiple resources. In addition, it enables users to detect structural bottlenecks and errors via real-time monitoring of the execution circumstances. Some projects that aim at GUI-based steering and monitoring in the Grid environment are well known. WebFlow [3] is one of the pioneering visual design tools for distributed computing. GridMapper [4] is another GUI tool, which focuses on the monitoring of geographically distributed networks. The UNICORE project [5] aims to connect multiple computer centers and supports illustrating the dependencies among data and jobs in order to realize a meta-computing facility. From these trends in related work, the common features of visual design, automatic execution, monitoring, and a flexible integration framework can be considered the key technologies for increasing the usability of the Grid.
2 TME (Task Mapping Editor)
The Task Mapping Editor, TME, was originally developed as a component of the STA, Seamless Thinking Aid, project [6]. STA consists of three components: the development environment, the communication infrastructure, and TME. The goal of TME is to support combining and allocating distributed resources, and to provide users with a higher-level view of the structure of applications. The basic architecture of TME comprises four layers: GUI, control, communication, and adaptor layers. i) The GUI layer faces the user, and some components for design and
monitoring are provided. ii) The control layer comprises five components, the Proxy Server (PS), Tool Manager (TM), File Manager (FM), Exec Manager (EM) and Resource Manager (RM); they perform, respectively, the communication control between distributed machines, the management of any components on TME, the handling of distributed files, the execution and monitoring of applications on local/remote machines, and the monitoring of the state of batch systems and the load of interactive nodes. iii) The communication layer relays the RPC-based requests from the control layer to the adaptor layer deployed on distributed machines via PSs. iv) The adaptors play the role of an outlet between RPCs called from the EM and the existing applications. The rest of this section presents the features of TME corresponding to the four key issues identified above.

2.1 Visual Design of a Distributed Application
TME is an editor that supports drawing the specification of a distributed application graphically, as shown in Figures 1 and 2. On the TME console, any resources, for example data files, programs and devices, are expressed as icons with higher abstraction in the TME naming space. By linking program icons and data icons, users can specify data dependencies and execution order. Thus, TME requires little knowledge of the distributed architecture and environment from users, and they can handle the resources as on a single computer. Parallelism and concurrency can also be exploited owing to the nature of the data-flow diagram. Consequently, TME supports an intuitive understanding of the structure of distributed applications.

2.2 Supporting Automatic Job Submission
Users can interactively choose the computers on which they submit jobs. According to the execution order of programs and the required data files as defined on the TME console, the RM and EM assign and invoke programs, and transfer data onto the specified machines. Complex applications that comprise multiple executables and a lot of data transfers can be encapsulated and automated on the TME system.

2.3 Job Monitoring and Scheduling
The EM watches the state of the application and sends reports to the TME control system; thus the user can monitor the application virtually in real time. On the monitoring console, all resources and dependencies are represented as icons and lines, just as on the design console. Thus, it helps detect structural bottlenecks. The EM and RM also coordinate semi-automatic resource mapping. When users register their scheduling policy and priority, they can take advantage of efficient allocation and reservation of the resources. This function helps optimize economic constraints in circumstances where accounting is a severe problem.

2.4 Flexible Integration Framework
In the TME environment, various kinds of built-in components are integrated. The main built-in components are 'blocking IO', 'pipelining IO', 'tee (duplicate the output stream)', 'container (sequential list of input/output files)', 'text browser', 'GUI layout composer', 'simple data viewer', and so forth. This mechanism is realized by the development framework of the STA. An applet developed by a user can also be registered as a new built-in component and shared among other users. In addition, the GUI layout composer separates the design phase and the execution phase from the task of the application user. This means that TME facilitates a common frame, or a collaborative space, between the developer and the application user.

3 Examples
We designed several applications using TME. In this section, three applications are presented as examples. 1. The radioactive source estimation system requires high response capability, speed and accuracy [7]. The core simulations should be carried out on as many calculation servers and sites as available in order to minimize the calculation time. This application is designed as in Figure 1 (left) on TME. Users can easily specify the spawning of slave processes over distributed sites by linking master and slave icons. 2. Data analysis for nuclear experiments also requires the acquisition of lightly loaded computing resources. In this case, huge amounts of observed data are recorded to the DB server at every shot, and physicists analyze the data to find specific phenomena. This work is also formulated on TME as in Figure 1 (right). 3. In the bioinformatics field, one typical analysis method is to search for a specified genome pattern in a huge amount of DNA data. It is realized by a combination of many applications and databases. On the TME console, such a complex procedure can be defined as in Figure 2, and the user may simplify the whole process. The integration of the applications and databases may further reduce the burden of bioinformatics analyses.
Figure 1. Data-flow of the radioactive source estimation system and the data analysis for nuclear experiments
Figure 2. Data-flow of the genome sequence analysis system and a snapshot of a visualization tool
4 Discussions and Current Status
Cooperation with the Grid middleware is also of great interest for developers and users. Currently TME uses the Nexus library developed by ANL, the predecessor of globus-nexus in the Globus toolkit (GTK). Since from GTK version 2 onward the communication and security frameworks are shifting to an OGSA basis, future releases of TME will adopt a Globus-based communication infrastructure and collaborate with many Grid services. The main feature of TME is the description by data-flow; however, a control-flow mechanism has been introduced in the latest version of TME for expert developers. This extension supports the structuring of conditional branches and loops, and it enables the development of more advanced applications. The authors believe that it can contribute to the development of an advanced PSE (Problem Solving Environment) on a distributed environment, which is one of the ultimate goals of Grid computing. For the definition of a distributed application, TME adopts a subset of XML. XML is used for the description of universal documents; however, it is also a powerful tool for the definition of a distributed application. Using the XML format suggests that other tools, such as an emulator, a debugger, and a scheduler, can share the TME components. This extension is a significant issue for the next stage of TME.
5 Conclusions
TME supports the design and handling of computational resources distributed over several sites. It offers a higher-level view for schematizing the application structure and helps the intuitive understanding of applications. Automatic submission and monitoring can improve the efficiency of jobs. In addition, the framework for integration and co-operation with various user-defined functions suggests the possibility of a collaboration environment and PSE. TME will be further improved in scalability and reliability. The authors would like to contribute to the advancement of Grid computing through the development of TME. Finally, the authors would like to thank Dr. Kei Yura and Prof. Dr. Hironobu Go for their support in the construction of the bioinformatics applications and databases.

References
1. Foster I. and Kesselman C. eds., The Grid: Blueprint for a Future Computing Infrastructure (Morgan, 1999), and activities of the GGF at http://www.globalgridforum.org/
2. The Globus home page, http://www.globus.org/
3. Bhatia D., et al., "WebFlow - a visual programming paradigm for Web/Java based coarse grain distributed computing", Concurrency - Practice and Experience 9(6) (Wiley, 1997) pp. 555-577.
4. Allcock W., et al., "GridMapper: A Tool for Visualizing the Behaviour of Large-Scale Distributed Systems", IEEE HPDC-11, Edinburgh (2002) pp. 179-187.
5. Erwin D. W., "UNICORE - A Grid Computing Environment", Concurrency and Computation: Practice and Experience (Wiley, to appear 2002).
6. Takemiya H., et al., "Software Environment for Local Area Metacomputing", Proc. Intl. Conf. SNA2000, Tokyo (2000).
7. Kitabata and Chino, "Development of source term estimation method during nuclear emergency", Proc. Intl. Conf. M&C99, Madrid (1999).
PROTECTING INTEGRITY IN A DISTRIBUTED COMPUTING PLATFORM

TAY TENG TIOW, CHU YINGYI
Department of Electrical and Computer Engineering, National University of Singapore
E-mail: [email protected], [email protected]
This paper proposes a software detection scheme based on a Self-Signature technique to address the integrity protection problem in an open Internet distributed computing platform. While it is not possible to guarantee complete protection under the scenario that a malicious host has full access to an executing program, we analyze our proposed Self-Signature method in terms of the detection probability for such malicious behaviors. The strength of the protection is discussed in detail.
1 Introduction
The distributed computing systems of [1], [2], [3] provide platforms that support Internet distributed computing. As these designs operate in the open and heterogeneous environment of the Internet, security is a key issue and a significant hindrance to their applicability. There are two separate issues. The first is the protection of the servers from malicious programs. The second issue, the protection of the distributed program, can be divided into two parts. The first part is secrecy, which requires that the content of data and the semantics of codes in the program be shielded from the remote server. This risk is reduced somewhat by ensuring that no single server receives the complete set of private information; traditional encrypted and authenticated channels are also utilized to enforce this kind of protection. The second part is integrity, which requires the un-tampered execution of the distributed program by the un-trusted servers. Violation of integrity must be detected as early as possible to recognize malicious servers and avoid using wrong results. In this paper we focus on this aspect of the security problem. The programs to be distributed to un-trusted servers are incorporated with a Self-Signature algorithm before distribution, so that the recast program is able to perform interactive self-checking throughout its execution on un-trusted servers. The proposed integrity protection scheme effectively reduces the risks of both purposeful and random tampering.

2 Integrity Protection in a Distributed Computing Platform
The computation procedure involves the originating client host and the
server hosts, which are recruited to perform its tasks. The task is divided into several parallel objects, each of which is sent to a server to execute. During task execution, the client periodically requests the servers to send back an intermediate image of the program. The image includes the state of the task, that is, the current status of the program and all intermediate results. This information allows the host to restart the execution of the program from that point. To deal with the integrity problem, integrity protection methods are introduced to assure the un-tampered execution of the distributed programs. There are three general approaches: organizational, hardware-based, and software-based solutions. Our proposed scheme is based on software detection of integrity violations. When an originating host receives a result returned from a server, it decides whether the server has behaved maliciously on the assigned program and whether the returned results are trustable. The detection of tampering is based on the Self-Signature method discussed next.

3 Detection of Tampering
This is done through a Self-Signature technique. Every distributed bytecode is incorporated with a Self-Signature algorithm. The bytecode uses its own codes as input to calculate the signature. Any distributed bytecode executes the Self-Signature algorithm in every phase of its execution, in addition to the ordinary computations. The computed signatures are returned to the client host in the intermediate image at the end of each phase. The client host checks these signatures for correctness. If tampered parts of the bytecode are reflected in the signatures, an inconsistency will be observed. Self-Signature algorithms are inserted into the distributed bytecode in a pre-processing of the original bytecode; thus the calculation of the signatures becomes a part of the bytecode distributed to server hosts. Let the tampering behavior of a certain potentially malicious server be characterized by a probability vector $p_t = [p_t(1), \ldots, p_t(n)]$, where $p_t(i)$ is the probability that the $i$th bit of the bytecode is changed. Essentially, there is an independent binary random variable $t_i$ associated with each bit $i$ of the program, $i = 1, \ldots, n$. The two values of $t_i$, 0 and 1, represent that the bit is not tampered and tampered, respectively. Let us define the tampering
intensity of the server as $T = \sum_{i=1}^{n} p_t(i)$. Let us represent the property of a Self-Signature algorithm $S$ by a vector $p_s = [p_s(1), \ldots, p_s(n)]$, where $p_s(i)$ is the probability that the $i$th bit of the bytecode, if changed, will be reflected in the signature using the algorithm $S$. Similar to the probabilistic property of the tampering behavior, there is an independent binary random variable $s_i$ associated with each bit $i$ of the program, $i = 1, \ldots, n$; the probability $p_s(i)$ is the probability that $s_i = 1$. Let us assume that $p_s(1) = \cdots = p_s(n) = p_s$. We further assume that the random variables $t_i$ are independent of the self-signature property $s_i$. We define the Detection Probability ($P_d$) of a Self-Signature algorithm as the probability that the algorithm detects the tampering of the program, given the probabilistic property of the tampering behavior. For the Self-Signature method, the detection probability is the probability that the tampering of a program is reflected in the signature.

Lemma 3.1: If the probability that each bit is tampered is $p_t = [p_t(1), \ldots, p_t(n)]$, and the probability of each bit to be reflected in the signature is $p_s$, then the Detection Probability $P_d$ is:

$$P_d = \frac{1 - \prod_{i=1}^{n}\left[1 - p_s\,p_t(i)\right]}{1 - \prod_{i=1}^{n}\left[1 - p_t(i)\right]} \qquad (3.1)$$
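The lemma follows from a short conditioning argument; the derivation below is our reconstruction under the stated independence assumptions:

$$P_d = P(\text{reflected} \mid \text{tampered}) = \frac{P\big(\exists\, i:\; t_i = 1 \wedge s_i = 1\big)}{P\big(\exists\, i:\; t_i = 1\big)} = \frac{1 - \prod_{i=1}^{n}\left[1 - p_s\,p_t(i)\right]}{1 - \prod_{i=1}^{n}\left[1 - p_t(i)\right]},$$

since bit $i$ is both tampered and reflected with probability $p_s\,p_t(i)$, independently across bits.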
Theorem 3.1: Given the probability property of the signature procedure $p_s$ and the length of the program $n$, for any tampering behavior of intensity $T$, the lower bound of the Detection Probability $P_d$ is $1 - \left(1 - \frac{p_s T}{n}\right)^n$. For example, for a program of 5 kB, if $p_s = 0.5$ and the tampering intensity $T$ is 10, namely, the expectation of the number of tampered bits is 10, then the lower bound is greater than 0.993.
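The worked example can be checked numerically; the small program below is our own verification, not code from the paper:

public class DetectionBound {
    public static void main(String[] args) {
        int n = 5 * 1024 * 8;                 // 5 kB program, measured in bits
        double ps = 0.5, T = 10.0;            // signature property and tampering intensity
        double lowerBound = 1.0 - Math.pow(1.0 - ps * T / n, n);
        System.out.println(lowerBound);       // prints approx. 0.99326, i.e. > 0.993
    }
}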
4 Integrity Protection Scheme
The key of the protection scheme is the insertion of a Self-Signature algorithm into the distributed bytecode. A procedure that implements a Self-Signature algorithm is inserted into every phase of the program. This
is done in a pre-processing of the distributed bytecode. In each phase of the execution, the transformed bytecode performs both the workload computation and the signature calculation. At the end of each phase, the client host requests the server to return the current status of the execution. The signature computed by the Self-Signature algorithm is included in the image among the other intermediate results of the phase. All the signatures can be re-calculated at the client host to check for correctness. The Self-Signature algorithm is a function $f$ from the bytecode to a signature value. We use the Crypto-Computing technique of [9] to encrypt the Self-Signature function such that the function cannot be determined in polynomial time. Crypto-Computing theories deal with the problem of "non-interactive evaluation of encrypted functions (EEF)". Security is ensured under the constraint that the function $f$ is a polynomial in $Z/NZ$, as follows: "Let $E$ be an additively homomorphic encryption scheme on $Z/NZ$." Then the above method "realizes non-interactive EEF for polynomials $f \in Z/NZ[X_1, \ldots, X_s]$. Assume further that the used encryption scheme ($E$) is polynomial time indistinguishable. Then no information about $f$ is leaked except its skeleton." (The skeleton of a polynomial $f$ is the set of monomials of $f$ with non-zero coefficients.) The above scheme achieves the Detection Probability of Theorem 3.1 only if the two assumptions of Section 3 are satisfied. The first assumption, that the probabilities $p_s(i)$ for each bit to be reflected in the signature are equal, can be satisfied if the Self-Signature algorithms are chosen randomly. Note that for each Self-Signature function $f(b_1, \ldots, b_n)$, there is a counterpart function $f(b_{\sigma(1)}, \ldots, b_{\sigma(n)})$ for each possible permutation $\sigma$ of the input bits. In our scheme the client host is able to satisfy this assumption by randomly selecting the Self-Signature functions. The second assumption, that the tampering behavior (random variables $t_i$) is independent of the Self-Signature property (random variables $s_i$), is satisfied if we have $P(t_i = 1 \mid s_i = 1) = P(t_i = 1) = p_t(i)$. When a Self-Signature function has acquired all the necessary parameters and is ready to run, whether each bit is to be reflected in the signature or not is deterministic, that is, $s_i$ equals either 0 or 1. However, if the server does not know the values of $s_i$ when it is tampering with the program, then the above assumption $P(t_i = 1 \mid s_i = 1) = P(t_i = 1)$ is satisfied.
A malicious server host can acquire the above information either before execution or at runtime. Firstly, the server host is able to analyze the distributed program or conduct pre-computation of the program before running it for the client host. Our scheme prevents the server from obtaining any information about $s_i$ prior to execution because the Self-Signature function parameters are sent to the server only at runtime. Secondly, the server host may analyze the Self-Signature function at runtime. Our scheme prevents this runtime analysis using a phase deadline for each phase of the execution. The function parameters are sent to the server at the beginning of each phase in the execution of the distributed program. With this information, the server host can determine the $s_i$ of the Self-Signature function to be computed in this phase. However, since the Self-Signature function is encrypted and cannot be determined in polynomial time, a phase completion deadline can be set, which requires the server to complete the computation of each phase within the deadline. If the intermediate image for a phase is returned after the phase deadline, it will be considered unsafe. The phase deadline is determined to be longer than the ordinary execution time of the phase, but shorter than the time needed to break the Self-Signature function.
p = !*>..<#-KReference: 1 Nisan, N.; et al, (1998), Proc 18th Int Conf DCS. 2 Cappello, P.; et al, (1998) Proc 3rd Conf MPPM. 3 P. Liu, (2001), Master Thesis, National University of Singapore. 4 Ronald, E.M.A.; Sipper, M., (2000) Computer. 5 Sander, T., (1998), Proc 9th Int Sym on Software Reliability Eng. 6 Farmer, W; Guttmann, J; Swarup, V, (1996), Proc ESORICS. 7 http://www.genmagic.com/ Telescript/Documentation/TRM/. 8 Palmer, E, (1994), Proc M P SEC'94 Conference. 9 Sander, T.; Tschudin, C.F.,(1998), IEEE Sym on Security and Privacy. 10 Low, D., (1998), Master Thesis, University of Auckland. 11 Collberg, C ; Thomborson, C ; Low, D., (1998) IEEE ICCL. 12 Aguilar, J.; Hernandez, M., (2000), Proc 8th ISMASCT.
AN INTEGRATED DISTRIBUTED COMPUTING PLATFORM ON A DECENTRALIZED ARCHITECTURE

TAY TENG TIOW, CHU YINGYI
Department of Electrical and Computer Engineering, National University of Singapore, Singapore 119260
E-mail: [email protected], [email protected]
This paper proposes an alternative distributed computing scheme. The proposed scheme has a fully decentralized architecture and addresses the scalability problem at the fundamental level by removing all dedicated components from the system. Every host participating in the scheme is identical in function. The input to the system, via a graphical user interface, is assumed to be a standard, single-machine-oriented Java program. This paper describes each layer of the platform that allows the said input program to be executed without any further active user intervention.
1 Introduction
Distributed computing, or grid computing, is the emerging platform for many computationally intensive applications. Most previous proposals, such as [1], [2], use dedicated components to organize the participating machines. Our proposal removes all dedicated components from the system; every host participating in the scheme is identical in function. To support our scheme, a set of network protocols based on group communication is proposed to acquire resources. This may be viewed as a self-broking scheme, in the language of [5]. A key consideration in such a scheme is the reliability of the recruited nodes in getting the assigned tasks done. To address this concern, we use a similar design philosophy as in TCP/IP, in that the performance of the recruited hosts is on a best-effort basis. The runtime layer then implements a progress monitoring and migrating protocol to determine if the minimum performance measure is met and, if not, to migrate the tasks assigned to an errant host to another host. The application layer of any distributed computing platform provides two basic functions. Firstly, it provides the method to map computing requirements to available hosts. Secondly, it provides a method to produce distributed applications that run on the specific platform. In our proposal, the assignment of computations is implemented in the distributed application itself based on the communication affinities. For the second aspect, support for the development of distributed applications is an important issue. In our system, the distributed code generation function automatically ports standard concurrent programs running on a single machine to a form that runs on the network. The method is proposed in our previous work [10] and is summarized here.
2 System Architecture
Every host participating in our scheme is identical in function: each host can act as a requesting host and/or a contributing host, possibly both at the same time. Computing hosts can join or leave the scheme as and when desired, without the need to register their presence or absence with any controller. Any distributed application running under the scheme involves a group of coordinated nodes taking on roles as servers and/or clients. As a client, a host first recruits available hosts in the system using a group communication protocol. After it receives responses from enough contributing hosts, the client submits the application to run on the recruited hosts, which take the roles of compute servers. To participate in the computing scheme, a node launches the JDGC software described in this paper. The JDGC software integrates the functions of a client, namely requesting resources and submitting applications, and of a server, namely responding to recruiting requests. The JDGC software system consists of three layers, namely the network layer, the middle layer and the application layer (see Figure 2.1).
Fig. 2.1. Components of the Java Distributed Code Generating and Computing (JDGC) Platform.
3 The Network Layer
The recruitment protocol provides the facility for a client application to acquire available server hosts in the system. This protocol is based on group communication and is implemented in the Recruit() API function on the client side and in the server daemon on the server side. The state transition diagrams for the two sides are depicted in Fig. 3.1 and Fig. 3.2. The other set of API functions in the network layer provides the facilities to create objects and invoke their methods in a distributed environment. The RemoteCreate() function enables the application code to create an object on a specified networked machine. The RemoteInvoke() function enables the application code to invoke a method of an object on a different
machine. These functions will be incorporated into the distributed code that runs on the system; they only involve uni-cast communication. To facilitate object backup and crash recovery, the network layer provides the Checkpoint() function. The function sends a message to the destination virtual machine, which in turn uses the Object Serialization interface [11] to transform the object into an array of bytes. The array records the current information of the object and is then returned to the code that invokes the Checkpoint() function. A hypothetical sketch of this API is shown below.
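The paper names the functions Recruit(), RemoteCreate(), RemoteInvoke() and Checkpoint() but does not give their signatures, so everything in the following Java facade is an illustrative assumption:

// Hypothetical facade over the JDGC network layer; all signatures are assumed.
interface NetworkLayer {
    String[] recruit(int hostsWanted);                  // group-communication recruitment protocol
    Object remoteCreate(String host, String className); // create an object on a networked machine
    Object remoteInvoke(Object ref, String method,
                        Object[] args);                 // uni-cast remote method invocation
    byte[] checkpoint(Object ref);                      // serialized image for backup and crash recovery
}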
Fig.3.1 Recruit() State Transition Diagram: The Client
Fig.3.2 Recruit() State Transition Diagram: The Server Daemon
4 The Middle Layer

All user applications in the JDGC platform are object-oriented Java applications. The manager is the runtime environment in the software system that supports the running of these applications. For every participating host, all the applications and recruited hosts are recorded and organized in the runtime application manager of the local JDGC platform; it always contains the current status of the dynamic system. The data structure for runtime application management contains three kinds of nodes, namely application nodes, object nodes and host nodes. Every application node has a reference to a vector of object nodes, which represent the objects existing in the application. Every object node has a reference to a host node, which represents the host where the object resides. All the host nodes are linked in a global host list. Thus all these nodes are organized in two global cross-linked lists; a sketch of one possible shape is given below. An important function provided in the runtime application manager is backup and crash recovery. The system backs up the objects on the local host during the lifetime of the objects; the backup image information is held in each object node. During an invocation, if the remote host crashes, does not respond or raises other exceptions, the system automatically retries the invocation n times. After that, the remote host or server daemon is considered crashed. The system then follows a procedure similar to the object creation procedure to select a host and migrate the object using the backup image information in the
corresponding local object node. After the object is created, the system re-performs the invocation. All of this is done by the system automatically, without the user's intervention.
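One plausible Java shape for this bookkeeping is sketched below; the paper specifies only the node kinds and their links, so the field names and types are our own assumptions:

import java.util.ArrayList;
import java.util.List;

class HostNode   { String address; }                     // linked in the global host list
class ObjectNode { HostNode host; byte[] backupImage; }  // object location and checkpointed state
class AppNode    { List<ObjectNode> objects = new ArrayList<>(); } // objects of one application

class RuntimeApplicationManager {
    final List<AppNode>  applications = new ArrayList<>(); // first global list
    final List<HostNode> hosts        = new ArrayList<>(); // second list, cross-linked via ObjectNode
}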
5 The Application Layer
The application layer provides an integrated environment for the operations of both client and server hosts. As a server host, the user can start the server daemon from the user interface to respond to requests. As a client, the user can open and submit applications to run on the network, utilizing idle computation resources in the system. The applications submitted from the user interface are standard single-machine-oriented programs. The code that runs in the distributed environment is, however, distributed code, which is able to place the objects on networked machines and to handle communications among these objects. Therefore, the application layer of the JDGC platform integrates a function for distributed code generation using object-level analysis. This code generation is a preprocessing step for the submitted standard concurrent application. The objective of the method is to efficiently map the concurrent application to the system's computation resources, and to make this process automatic. The method groups heavily communicating objects on the same machine so that they can use lightweight communication mechanisms. The method first extracts detailed object-level communication affinity metrics between the runtime objects. This is done with two levels of analyzers and an intermediate three-dimensional affinity metric. Then the method automatically ports the standard application to the form that runs on the JDGC platform with the aid of an Abstract Syntax Tree (AST) transformation. The details of this method are given in our previous work [10]. The output of the process is bytecode incorporating the network layer API; it can be submitted to the distributed environment and managed by the runtime application manager.

6 Applications
The proposed JDGC platform is an integrated distributed computing system supporting Java applications. In this section we present two sample applications that run on the JDGC platform:
- MultT: multiplication of two matrices of floating point numbers.
- Curve: curve fitting for data using the least-square-error criterion.
The two applications are tested in different situations where one or more single-threaded and multi-threaded applications are submitted to the system. In the single-threaded case, all the computation is done within one object. Multiple such applications can be submitted, and their computations are distributed by the system to available hosts. In the tests for multi-threaded applications, the computation of an application is done in multiple concurrent objects that can be distributed to available hosts. The applications' average execution times are recorded for these situations; the units of execution time in the following tables are seconds. Results are summarized in Tables 6.1 and 6.2.

Table 6.1. The Average Execution Time of Multiple Threaded Applications

MultT (2400)
Threads          Single    Multiple (4)  Multiple (4)  Multiple (4)
Num of Hosts     1         1             2             4
Execution time   1401.336  1582.071      809.992       426.507
Speed up         1         0.886         1.730         3.286

Curve (3200, 10^-6)
Threads          Single    Multiple (4)  Multiple (4)  Multiple (4)
Num of Hosts     1         1             2             4
Execution time   1164.755  1255.500      652.834       335.265
Speed up         1         0.928         1.784         3.474

Table 6.2. The Overheads in Creation and Communication

MultT (2400)
Threads                  Multiple (4)  Multiple (4)  Multiple (4)
Num of Hosts             1             2             4
Creation Overhead        11.827        13.197        13.252
Communication Overhead   2.205         15.678        20.067

Curve (3200, 10^-6)
Threads                  Multiple (4)  Multiple (4)  Multiple (4)
Num of Hosts             1             2             4
Creation Overhead        10.619        11.702        11.969
Communication Overhead   0.366         11.347        25.874
REFERENCES
[1] Christiansen, B.O., et al., (1997), Concurrency: Practice and Experience, Vol. 9.
[2] Lewis, M., Grimshaw, A., (1996), 5th IEEE ISHPDC.
[3] Michael O. N., et al., (2000), Euro-Par 2000.
[4] Noam Nisan, (1998), IC. Distributed Computing Systems.
[5] Neary, M.O., et al., (2000), Concurrency: Practice and Experience, Vol. 12.
[6] Brecht, T., et al., (1996), 7th ACM SIGOPS.
[7] Shmulik L., (1998), MSc Thesis, Hebrew University of Jerusalem.
[8] Fahringer, T., (2000), IEEE IC Cluster Computing.
[9] Baratloo, P., et al., (1999), HCW'99, IPPS/SPDP 1999.
[10] Yingyi Chu, (2002), Master's Thesis, National University of Singapore.
[11] Darby, C., (1997), WEB Techniques, Vol. 2, Issue 9.
GRID BASED PROBLEM SOLVING ENVIRONMENT FOR SCIENTISTS

EMILDA SINDHU, UVARAJ PERIATHAMPY, MURALI KANTHARAJ
Institute of High Performance Computing, 1 Science Park Road, Singapore
E-mail: {emilda, uvaraj, murali}@ihpc.a-star.edu.sg
Building Grid-based Problem Solving Environments (PSEs) for scientists is a challenging task. Providing access to various Grid services or to scientific applications on the grid is not usually a simple matter for PSE developers. Using commodity grid toolkits (CoG Kits) and other open-source software can overcome part of the difficulty. A convenient interface to a PSE is a web portal designed for a scientific domain or for a particular scientific problem. In this paper we discuss the PSE we have developed for solving large-scale, computationally intensive problems in science and engineering, and the nature and implementation details of the services provided. The objective of the portal is to facilitate access to large-scale, multi-institutional, dynamic, distributed application environments for scientific research.
1 Introduction
High-performance distributed computing is rapidly becoming a reality. Nationwide high-speed networks are becoming widely available to interconnect high-speed computers at different sites. Projects such as Globus [5] are developing software infrastructure for computations that integrate distributed computational and informational resources. The development of next-generation problem solving environments (PSEs) [2] is influenced by the rapid advances in distributed computing and the emerging national-scale Computational Grid [1]. The explosive growth of the Internet and a broad spectrum of distributed computing technologies, like RMI, CORBA, Jini and DCOM, have led to significant technology improvements that are important for the development of PSEs accessing large-scale computational resources. Simultaneously, the high-performance computing community has taken big steps toward the creation of the Grid [1,4]. Computational Grids have become an important asset in large-scale computing and are emerging as a popular paradigm for solving large-scale compute- and data-intensive problems in science and engineering. The Globus toolkit, which has emerged as a de facto standard for Grid computing, is a community-based set of services and software libraries for security, information infrastructure, resource management, data management, communication, fault detection and portability. The Globus toolkit is now central to distributed computing. The task of building Grid applications remains extremely difficult because there are only a few tools available to support developers. The Commodity Grid (CoG) project is working to overcome this difficulty by creating what we call Commodity Grid Toolkits (CoG Kits) [3] that define mappings and interfaces between the Grid and particular commodity frameworks familiar to developers. In this paper we discuss the PSE we have developed for solving large-scale, computationally intensive problems in science and engineering, together with the nature and implementation details of the services provided.
2 Problem Solving Environment for Scientists
The Java-based Commodity Grid Toolkit (Java CoG Kit) defines and implements a set of general components mapping Grid functionality into a Java framework. This kit is very useful for developing PSEs, in particular for providing access to sophisticated remote compute services/servers using lightweight Web interfaces or portals. The definition of a PSE as given by [10] is: "A problem solving environment is a computational system that provides all tools required for solving problems from a specific domain, interact with and visualize and analyze results". In the PSE we have developed, the primary domain we concentrate on is problems requiring large-scale computation. Our primary goal is to build a PSE for scientists for solving large-scale, computationally intensive problems in science and engineering. For solving such problems, scientists may need to access remote resources using a secure connection. The process of solving the problem is steered by the scientist, and the progress may be monitored, analyzed and visualized through the portal. Computational portals are emerging as the interface for performing operations on the Grid. Computational Grids have emerged as a distributed computing infrastructure for providing pervasive, ubiquitous access to a diverse set of resources, ranging from high-performance computers (HPC) to tertiary storage systems to large-scale visualization systems. One of the primary motivations for building Grids is to enable large-scale scientific research projects to better utilize distributed, heterogeneous resources to solve a particular problem or set of problems. A Grid portal provides application scientists with a customized view of software and hardware resources specific to their problem domain and provides a single point of access to Grid resources. It is a web-based application server enhanced with the necessary software to communicate with Grid services and resources.

3 Implementation Details
The portal is developed using commodity off-the-shelf software and toolkits. The portal development leverages existing Globus/Grid middleware infrastructure as well as web technology, including Java Server Pages and servlets. The development of the portal is also based on the Grid Portal Development Kit (GPDK) [6] and MyProxy [7] toolkits. Based on the Model-View-Controller design paradigm, the GPDK has proven to be an extensible and flexible software package. The GPDK integrates nicely with other commodity technologies, including the open-source servlet container Tomcat and the Apache web server. Our portal development effort also makes use of the APIs of the Java CoG toolkit for its Java implementation of client-side Globus Grid services. The Grid information services are provided via the Lightweight Directory Access Protocol (LDAP), using an open-source LDAP server [8,9]. The basic architecture of a Grid Application Portal is illustrated in Figure 1. The user makes a secure connection from the web browser to the portal web server. The portal server then obtains a certificate from the MyProxy server (a proxy certificate server) and uses that to authenticate the user with the Grid. By taking advantage of the MyProxy package, users can use the portal to gain access to remote resources from anywhere, without requiring that their certificate and private key be located on the same machine/device running the web browser. The functionalities provided by the SER Grid portal include remote
program submission, file transfer, and querying of information services from a single, secure gateway. In addition, profiles are created and stored for portal users, allowing them to track and monitor submitted jobs and view results. A snapshot of the Science and Engineering Research (SER) Grid Portal at the Institute of High Performance Computing is shown in Figure 2. A sketch of an information-service query follows.
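As an illustration, a resource-information query against the portal's LDAP service can be issued with standard JNDI as below; the URL and attribute names (cpucount, cpuload5) are taken from the snapshot in Figure 2 and are assumptions rather than a documented schema:

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class ResourceInfoQuery {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL,
                "ldap://192.168.11.18:2135/dc=ihpc,dc=nus,dc=edu,dc=sg,o=Grid");
        DirContext ctx = new InitialDirContext(env);
        // read CPU count and 5-minute load average of the resource entry
        Attributes attrs = ctx.getAttributes("", new String[] { "cpucount", "cpuload5" });
        System.out.println(attrs);
        ctx.close();
    }
}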
Figure 1. Grid Portal Setup.
Figure 2. Science & Engineering Research Portal Interface - A snapshot.
4 Conclusion and Future Work
The goal of our PSE for scientists is to provide users in the area with a central place offering seamless and efficient access to a virtual workbench with the needed tools and infrastructure. The SER grid portal takes advantage of existing commodity software to provide a Web-based services portal for scientists to securely access a range of networked resources. Currently the SER grid portal provides remote job submission, file transfer, and querying of information services. In the future, the PSE will also provide facilities to visualize and analyze results; this task involves dynamic calculation, rendering and display. Another area of future development is to combine all the tasks of scientists in the problem domain under consideration (tasks related to the grid as well as tasks related to data staging and result visualization) and to provide a single point of access by creating a workflow system.

References
1. Foster, I. and Kesselman, C., editors, The Grid: Blueprint for a Future Computing Infrastructure, Morgan Kaufmann, 1999.
2. Rice, J.R. and Boisvert, R.F., "From scientific software libraries to problem-solving environments", IEEE Computational Science and Engineering, Fall: 44-53, 1996.
3. http://www.globus.org/cog
4. Fox, G.C., Gannon, D., "Computational Grids", IEEE Computer Science Eng., Vol. 3, No. 4, pp. 74-77, 2001.
5. Foster, I. and Kesselman, C., "Globus: A Metacomputing Infrastructure Toolkit", International Journal of Supercomputing Applications, 11(2): 115-128, 1997.
6. Novotny, J., "The Grid Portal Development Kit", Concurrency and Computation: Practice and Experience, 2000.
7. http://www.ncsa.uiuc.edu/Divisions/ACES/MyProxy/
8. http://www.openldap.org
9. Netscape Directory and LDAP Developer Central, http://developer.netscape.com/tech/directory/index.html
10. Gallopoulos, E., Houstis, E., and Rice, J.R., "Problem Solving Environments for Computational Science", IEEE Computational Science and Engineering, 1:11-23, 1994.
MANAGEMENT OF EJB APPLICATIONS USING JAVA MANAGEMENT EXTENSIONS JOONG-KI PARK, JOONG-BAE KIM, AND DUCK-JU SOHN 161 Gajeong-dong, Yuseong-gu, Daejeon, 305-350, KOREA E-mail: [email protected], [email protected], [email protected] Object middleware such as EJB is rapidly being accepted as a means for the cost-effective and rapid development of a wide range of distributed applications. The distributed applications built using these technologies include many objects and can become rather complex. Therefore, the development of such complex distributed applications requires a significant improvement of management methods and pertinent tools. In this paper, first, EJB and JMX, two extensions of Java technology in the context of Telecom Management Systems and Application Management, are analyzed. Second, an integrated distributed Telecom Network Management framework based on EJB middleware components is provided. The EJB-based framework is assumed to be deployed on J2EE servers, and some management issues that are not covered by J2EE servers are also investigated. Third, based on this framework, an architectural model for managing EJB components using JMX technology is proposed.
1
Introduction
The rapid growth of networking technologies in recent years has created much more complex and heterogeneous network environments. To manage network devices and services in such complex environments, various organizations have developed management platforms (e.g. the Internet management framework, the OSI management framework, TMN [3]) using different kinds of management protocols (e.g. SNMP [4], CMIP [5]). Since there are many different network management schemes, which cannot be easily integrated, there have been many research efforts attempting to harmonize them with new technologies. For example, Joint Inter Domain Management (JIDM) is one of the organizations doing such integration; the goal of JIDM is to integrate CMIP, SNMP and CORBA technologies [3]. These integrations are vendor-oriented, and currently there is no de facto standard for integrating different network management solutions. Sun Microsystems attempts to standardize such integration by proposing EJBs as a common framework for integrating different network management solutions. Sun Microsystems' EJB architecture seems to answer common challenges of distributed integrated network management, such as scalability, security, reliability and availability, by delegating these low-level tasks to the application server (i.e. a J2EE server) in which the EJB components are executed [1]. However, the spread of EJB component-based distributed
applications has raised the need to analyze which management techniques are suitable to control and monitor the resources involved in their operation [7][9]. These management techniques aim to improve the performance and reliability of EJB applications and to ease configuration and security tasks, which are key issues in maximizing the quality of service offered. When dealing with component-based server-side applications (i.e. EJB), several management issues should be addressed, such as management of the component-based platform, management of the application-independent functionalities, management of the application-dependent functionalities, and management of the underlying system and network resources [6]. Currently, the J2EE platform specification does not define how to solve the management problems mentioned above. As a result, each management development team has to develop its own approach to the management of EJB applications [2]. To this end, the adopted management architecture should be based on existing management standards.
2
Management of EJB-Based application using JMX
A Network Element management application only exposes management information of network elements; there is no application management capability defined in the EJB application platform. This section describes some of the main management-related aspects of an EJB-based application conceptualized within the scope of a Network Element Management system. The Network Element Management system is based on deploying EJB components to the J2EE platform provided by Sun Microsystems. As mentioned earlier in this paper, the J2EE platform does not provide all the management functionality required for the management of EJB components. For example, consider an EJB application implemented to collect alarm logs from network elements and store them in an alarm database. Indeed, network management developers do this implementation. Then assume that the IT department proposes a new requirement: an EJB application must collect summary information from the alarm database and send it to a predefined e-mail address once a day. Since the J2EE model does not define a timer, and does not allow components to manage system resources such as threads, a programmer must include this functionality in custom classes, which are attached to the J2EE server. This implies a need
to separate application management from network management. This separation lets network management developers focus on developing only network management solutions, and IT management developers focus on developing management systems and tools for managing the systems developed by the network management developers. The following sections describe how two aspects of the EJB management problem have been addressed in the management of the conceptualized Network Element Management application:
• The design of the management architecture for an EJB application.
• The design of the management instrumentation of service-oriented management aspects within the managed application.
2.1
Architecture of JMX Management to Instrument EJB-Based Application
JMX is chosen as the basis for the management architecture of the proposed Network Element Management application due to its straightforward integration with EJB [10]. Management information and operations of Java applications managed with JMX are made available through MBeans. According to the JMX specification, MBeans should be plugged into an MBean server that resides in the same Java Virtual Machine as the application. Nevertheless, this scenario covered by the JMX specification is not possible in most cases. In general, the J2EE server and the JMX MBean server have to run in different JVMs. Thus, management instrumentation of the EJBs of the Network Element Management application in the form of MBeans seems to be useless, as they cannot be connected to the corresponding MBean server. To overcome this limitation, a double workaround is adopted [11]. First, for obtaining management information from the EJBs or for invoking management operations on them, JMX Model MBeans are used. These Model MBeans are dynamic MBeans that can be used to instrument Java code at run-time. A Model MBean reads at run-time an XML description file that describes the management capabilities of a Java resource in terms of management attributes and operations. That Java resource must be registered within the Model MBean; therefore, when a Model MBean receives an invocation of a particular management operation from the MBean server, it redirects the invocation to the registered Java resource.
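A minimal, self-contained sketch of this Model MBean mechanism using the standard javax.management API is shown below. The AlarmCollector class is a hypothetical stand-in for the registered EJB remote reference, and for brevity the management metadata is built in code rather than read from an XML descriptor as the paper describes.

```java
// Sketch: instrumenting a managed resource through a JMX Model MBean.
import java.lang.reflect.Method;
import javax.management.*;
import javax.management.modelmbean.*;

public class ModelMBeanDemo {
    // Hypothetical stand-in for the remote EJB reference registered with the MBean.
    public static class AlarmCollector {
        public int getAlarmCount() { return 42; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = MBeanServerFactory.createMBeanServer();

        Method op = AlarmCollector.class.getMethod("getAlarmCount");
        ModelMBeanOperationInfo opInfo =
            new ModelMBeanOperationInfo("number of collected alarms", op);
        ModelMBeanInfo info = new ModelMBeanInfoSupport(
            AlarmCollector.class.getName(), "Alarm collector management",
            new ModelMBeanAttributeInfo[0], new ModelMBeanConstructorInfo[0],
            new ModelMBeanOperationInfo[] { opInfo },
            new ModelMBeanNotificationInfo[0]);

        // The Model MBean redirects invocations to the registered resource.
        RequiredModelMBean mbean = new RequiredModelMBean(info);
        mbean.setManagedResource(new AlarmCollector(), "ObjectReference");
        ObjectName name = new ObjectName("nem:type=AlarmCollector");
        server.registerMBean(mbean, name);

        // A management client invokes the operation through the MBean server.
        Object count = server.invoke(name, "getAlarmCount",
                                     new Object[0], new String[0]);
        System.out.println("alarms = " + count);
    }
}
```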
Figure 1. Architecture of JMX Management to instrument an EJB-based application. [Diagram: Model MBeans in the JMX agent connected to the managed EJBs, each deployed with an EJB Wrapper and a Mgmt Wrapper inside its EJB container.]
A Model MBean is instantiated within the JMX MBean Server for each EJB that has to be managed. For each Model MBean an XML management descriptor is created for describing management information of the remote EJB. As well, a local reference to the Remote Interface of the corresponding EJB is registered into the Model MBean. Thus, each time a management operation is invoked on the Model MBean, a remote management operation is transparently invoked on the remote EJB. Using this approach avoids developing customized MBeans for each EJB. The JMX-based agent only has to instantiate the appropriate number of Model MBeans, locate the remote EJBs, obtain the references to their remote interfaces and register them within the Model MBeans [11]. Second, for collecting notifications generated by the EJBs, JMX specifies a complete notification model that includes the definition of Notifications, Notifications Filters, Notifications Broadcasters and Notification Listeners. This notification model only covers the transmission of events between MBeans within the same JMX agent and does not cover the problem of collecting notifications submitted by remote EJBs.
In order to solve this limitation, the JMX-based agent for the Network Element Management application includes a "Notification Server" that receives JMX notifications remotely from the managed EJBs using RMI-IIOP distributed computing capabilities. This RMI-IIOP "Notification Server" implements the "Notification Broadcaster" JMX interface so that different "Notification Listeners" can be registered with the "Notification Server" to receive the management information they are interested in. The "Notification Server" registers itself at the JNDI server used by the different EJB containers under a predefined name that is known by all the EJBs. Figure 1 shows the architecture of the Network Element Management application with the extended application management capabilities applied to each EJB.
2.2
Management Instrumentation
There are two requirements that should be considered when deciding what management instrumentation scheme to apply to the EJBs of the Network Element Management application. First, the instrumentation approach adopted should be generic enough that it can be applied not only to the Network Element Management application but also to any other EJB-based application; this requirement implies that the instrumentation approach should be flexible enough to obtain management information of different types. Second, the instrumentation should be as transparent as possible to the developers of the EJBs [11]. EJB-based applications have characteristics that have to be taken into account when considering instrumentation possibilities. After EJB-based applications have been developed, they have to be deployed over the EJB container of a J2EE server [11]. This deployment step implies that the EJB container generates all the necessary stubs and skeletons for RMI-IIOP communications as well as a wrapper for each EJB (shown in Figure 1 as the EJB Wrapper) containing the support needed for transaction, security and persistence management related issues. This naturally suggests that a wrapping approach is most appropriate for the instrumentation of EJB components. In this case, the wrappers for each EJB have to be coded manually. Using Java class inheritance, the management wrappers (shown in Figure 1 as the Mgmt Wrapper) can be integrated with the functional code of the application, such as the EJB Alarm Management subsystem and the EJB Topology Management subsystem, without modifying the source code of these subsystems.
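As an illustration of this wrapping approach, the sketch below shows a management wrapper extending a functional class by inheritance and adding instrumentation without touching its source. The class and method names are hypothetical, not taken from the paper.

```java
// Functional code of the managed subsystem (hypothetical name).
class AlarmManagementBean {
    public void storeAlarm(String alarmLog) {
        // ... write the alarm log to the alarm database ...
    }
}

// Management wrapper ("Mgmt Wrapper" in Figure 1), integrated by inheritance.
public class ManagedAlarmBean extends AlarmManagementBean {
    private long invocations;   // management attributes collected here could
    private long totalMillis;   // later be exposed through a Model MBean

    @Override
    public void storeAlarm(String alarmLog) {
        long start = System.currentTimeMillis();
        super.storeAlarm(alarmLog);                 // delegate to functional code
        totalMillis += System.currentTimeMillis() - start;
        invocations++;
    }
}
```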
3
Conclusions and Future Related Work
In this paper, the conceptual architecture of a Network Element Management system based on the EJB architecture is analyzed. In the future, EJB-based management systems can be analyzed in the areas of wireless network management, alarm management, event correlation, policy administration and quality of service enforcement [8]. Furthermore, a conceptual architecture for a JMX-based management system for the management of EJB middleware components is proposed. A wrapping technique is used to provide application manageability of EJB components. This technique is introductory and is assumed to be performed manually by developers. In the future, the wrappers can be generated automatically by enhancing EJB deployment descriptors with information about the characteristics of the management wrappers and then enhancing J2EE servers to generate the management wrappers automatically. In the future, interoperability between JMX management solutions and existing management applications through other standard protocols such as SNMP, TMN, CORBA/IIOP, and CIM/WBEM can also be investigated. References
1. Gottfried Luderer, Hosson Ku, Baranitharan Subbiah and Anad Narayanan, "Network Management Agents Supported by a Java Environment," International Symposium on Integrated Network Management (ISINM '97).
2. Matjaz B. Juric, Ivan Rozman and Simon Nash, "Java 2 Distributed Object Middleware Performance Analysis and Optimization", ACM SIGPLAN Notices, August 2000, No. 8, pages 31-40.
3. Jae-Young Kim, Hong-Taek Ju, J. Won-Ki Hong, Seong-Beom Kim and Chan-Kyu Hwang, "Towards TMN-based Integrated Network Management Using CORBA and Java Technologies," Special Issue on New Paradigms in Network Management, IEICE Transactions on Communications, Vol. E82-B, No. 11, November 1999, pp. 1729-1741.
4. J. Case, M. Fedor, M. Schoffstall and C. Davin, "The Simple Network Management Protocol (SNMP)," RFC 1157, May 1990.
5. ITU-T Recommendation X.711, "Common Management Information Protocol (CMIP)," Specification, 1991.
6. Jorge E. Lopez de Vergara, Victor A. Villagra, Juan I. Asensio, Jose I. Moreno and Julio J. Berrocal, "Management of E-Commerce Brokerage Services," Department of Telematic Systems Engineering, Technical University of Madrid (DIT-UPM), Spain.
7. Heather Kreger, "Java Management Extensions for application management," IBM Systems Journal, Vol. 40, No. 1, 2001.
8. Sun Microsystems, Inc., "Telecom Network Management With Enterprise JavaBeans (EJB) Technology," Technical White Paper, Sun Microsystems, May 2001.
9. Sun Microsystems, Inc., "Dynamic Management for the Service Age," Java Management Extensions White Paper, Sun Microsystems, June 1999.
10. Sun Microsystems, Inc., "Java Management Extensions Instrumentation and Agent Specification," technical report, Sun Microsystems, July 2000.
11. Jorge E. Lopez de Vergara, Victor A. Villagra, Juan I. Asensio, Jose I. Moreno and Julio J. Berrocal, "Experience in the management of an EJB-based E-commerce application," Department of Telematic Systems Engineering, Technical University of Madrid (DIT-UPM), Spain.
ON APPLICATIONS OF DATA MINING TO HUMAN RESOURCES DATA V. KAMALESH AND V. KURALMANI Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Science Park II, Singapore 117528 Tel: 64191551, 64191442 Fax: 64191230 Email: {vkamal, manivk}@ihpc.a-star.edu.sg This paper gives a brief introduction to data mining and its applications to human resources data. Three well-known data mining techniques, namely predictive modeling, association rule mining and clustering, are suggested for extracting useful knowledge from the HR database. These techniques can be used to analyze attrition, performance, compensation, demographics, skill development, training, career management, retention and so on. Key Words: data mining, association rule mining, clustering, predictive modeling, knowledge discovery, human resources data.
1
Introduction
Data mining is an advanced method and process of exploring and extracting information from large databases to reveal hidden patterns, trends and correlations. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools. Data mining answers business questions that traditionally were too time-consuming to resolve. Data mining tools scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Such information enables organisations to solve business problems and make strategic decisions. Data mining is being applied to all sectors of industry, in particular banking, insurance, transportation, telecommunications, retail, and hospitality. It is also increasingly becoming an important tool in the fields of healthcare, genomics and drug discovery. One may refer to Adriaans (1996), Fayyad et al. (1996), Deslesie and Croes (2000) and Braha (2001) for specific uses of data mining. The application of data mining to human resources (HR) data has not received much attention in the past and is therefore the motivation for this paper. HR data in any business is an excellent source of information that can easily be tapped to profile employees and extract useful knowledge from masses of unleveraged data. In general, each employee has a complete human resources profile with many attributes. These attributes include all the standard human resources descriptions, including date of hire, job grade, salary, review dates, review outcomes, vacation entitlement, organization, education, address, insurance plan and many others. Analyzing these data requires an enormous statistical effort. As this may not be feasible in reality, this paper suggests certain data mining methodologies and procedures that can be used on HR data in any industry to identify the patterns and trends that exist in the database. For example, a traditional question is: "How many employees left in the last six months compared to the same period last year?", whereas a data mining question could be: "Is there any correlation between specific employees' competencies and project success measured in revenue terms?" or "Do teams with particular competency sets show greater ability to succeed than other teams?" Efficient data mining normally relies extensively on data preparation. Thus, one must have thorough data preparation, which has a few stages. Some of the main stages are: (i) data integration, where multiple, heterogeneous data sources may be integrated into one; (ii) data selection, where data relevant to the analysis task are retrieved from the database; (iii) data cleaning, which handles noisy, erroneous, missing, or irrelevant data; (iv) data transformation, where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.
2
Mining the HR Data
Generally, data mining tasks can be classified into two categories: descriptive data mining and predictive data mining. The former describes the data set in a concise and summary manner and presents interesting general properties of the data, whereas the latter constructs one or a set of models, performs inference on the available set of data, and attempts to predict the behavior of new data sets. Clustering can be considered descriptive data mining, whereas predictive modeling and association rule mining are considered predictive data mining. (i) Clustering: Clustering analysis is used to identify clusters embedded in the data, where a cluster is a collection of data objects that are similar to one another. Similarity can be expressed by distance functions, specified by users or experts. A good clustering method produces high quality clusters, ensuring that the inter-cluster similarity is low while the intra-cluster similarity is high. This means one may be able to cluster the employee database based on certain specified criteria, for instance: (a) to study the profiles of high performing and low performing employees, whose performance can then easily be analyzed further; (b) to identify which cluster makes the best employees under specific conditions, and to focus on a particular cluster in order to fully evaluate and understand it; (c) to discover which particular groups of employees, for example, are surprisingly effective. (ii) Prediction: Prediction in general refers to either classification or regression. Classification analyses a set of training data (i.e., a set of objects whose class label is known) and constructs a model for each class based on the features in the data. A decision tree or a set of classification rules is generated by such a classification process, which can be used for better understanding of each class in the database and for classification of future data. A regression function predicts the possible values of some missing data or the value distribution of certain attributes in a set of objects. It involves finding the set of attributes relevant to the attribute of interest (e.g., by some statistical analysis) and predicting the value distribution based on the set of data similar to the selected objects. Usually, regression analysis, generalised linear models, correlation analysis and decision trees are useful tools in quality prediction. Genetic algorithms and neural network models are also popular in prediction. In respect of HR data, both techniques can be used effectively, for instance: (a) an employee can be classified as a well performing or poorly performing employee (classification); (b) whether an employee will stay with the company for long or leave (classification); (c) how long an employee can possibly stay with the company (regression); (d) what the loss is because of his leaving (regression); (e) whether a particular staff member will be successful in a new project (classification); (f) staffing for a project or for training (classification). (iii) Association: One of the reasons behind maintaining any database is to enable the user to find interesting patterns and trends in the data. For example, in a supermarket, the user can figure out which items are being sold most frequently. But this is not the only type of trend that one can possibly think of. The goal of database mining is to automate this process of finding interesting patterns and trends in any environment.
Once this information is available, we can perhaps get rid of the original database. The output of the data mining process should be a summary of the database. This goal is difficult to achieve due to the vagueness associated with the term 'interesting'. The solution is to define various types of trends and to look for only those trends in the database. One such type constitutes the association rule. This analysis is quite useful in the HR environment to unearth the patterns that exist therein. For instance, it can indicate: (a) whether an employee can achieve a certain designation based on other merits; (b) the characteristics that are associated with achieving a particular grade of work; (c) whether an employee has any diseases related to his other medical conditions; (d) rules that imply certain association relationships among a set of attributes.
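A worked toy example of the support, confidence and lift arithmetic behind such association rules is sketched below; the employee attributes and counts are invented for illustration.

```java
// Toy computation of support, confidence and lift for a rule A => B
// on hypothetical boolean employee attributes.
public class RuleStats {
    public static void main(String[] args) {
        // Each row: {hasPostgraduateDegree, reachedSeniorGrade}
        boolean[][] staff = {
            {true, true}, {true, true}, {true, false},
            {false, false}, {false, true}, {false, false}
        };
        int n = staff.length, a = 0, b = 0, ab = 0;
        for (boolean[] s : staff) {
            if (s[0]) a++;
            if (s[1]) b++;
            if (s[0] && s[1]) ab++;
        }
        double support = (double) ab / n;              // P(A and B)
        double confidence = (double) ab / a;           // P(B | A)
        double lift = confidence / ((double) b / n);   // P(B|A) / P(B)
        System.out.printf("support=%.2f confidence=%.2f lift=%.2f%n",
                          support, confidence, lift);
    }
}
```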
3
Data and Transformation
Most of the HR data can be classified into four major categories: personal, medical, educational and service data. For example, (i) Personal Information: employee ID, name, date of birth, gender, race, religion, marital status, etc. (ii) Medical Information: physical characteristics, height, weight, BMI, hearing, eyesight, mental state, principal disease, additional diseases, etc. (iii) Educational Information: academic qualifications, titles of degrees, subject majors, year of completion, extra-curricular activities, etc. (iv) Service Information (of previous and current jobs): date of joining, date of confirmation, number of years of experience, number of previous appointments, division, designation, occupation, salary, etc. Transformations of the data are also very important for the data mining process. For example: (a) aggregating multilevel categorical variables into fewer categories based on similarity between categories [e.g. educational degrees, subjects, etc.]; (b) disaggregating multivalued categorical variables into a set of binary variables that represent the presence (or absence) of each of the possible values of the original variable [e.g. disease, extra-curricular activity]; (c) binning numerical data into categories (such as high, medium, and low), particularly if the data is 'soft' (not measured with much precision); binning can be a convenient way to handle differences in magnitude or skew in numerical data [e.g. salary, height, etc.]. Other than the above, certain new variables should be derived, if necessary, from the available attributes. Similarly, outliers should be taken care of: outliers or spurious values should be identified, or they may lead to bias in the estimation. In addition, in the case of association analysis, the most common obstacle to a good association analysis is the presence of low-support variables. There are two ways to deal with this problem. One way is to create a support threshold: any combination with support below a certain percentage is dropped from the analysis. Unfortunately, the support threshold method has the major disadvantage that it eliminates some potentially valuable data from consideration. This brings us to the better way to deal with low-support variables: the creation of a taxonomy. A taxonomy is an orderly hierarchy of variables and variable categories, dividing things such that each variable put into the association analysis occurs with a similar level of support. This is done by aggregating low-support variables into groups (for example, combining all religions that have low frequencies) and then analysing them as a group, while breaking down the high-support variables into smaller units (for example, race can be subdivided into smaller groups by linguistic group). This eliminates the uneven support in the analysis, thereby ensuring the production of association rules whose confidence and lift can be meaningfully used for comparison. Attention should also be paid when mining the medical data. Understanding medical data in the HR environment can be a particularly important aspect, especially where it pertains to the disposition and overall health of the employees. As a matter of fact, a high percentage of illnesses and incapacitation may result not from physical injuries but from infections and other diseases. Planning for the treatment of these conditions ahead of time can facilitate medical response and treatment during critical situations.
For example, we may discover a relatively high incidence of chicken pox among young employees between the ages of 17 and 19. As we may know, chicken pox in adults can be quite a serious health matter, and the identification of problematic subgroups within a population can facilitate the establishment of policies and procedures aimed at minimising this health threat.
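Returning to the transformations listed above, a minimal sketch of the binning step (c) follows; the salary cut-offs are illustrative assumptions, not values from the paper.

```java
// Sketch: binning a numeric salary into three categories (cut-offs assumed).
public class SalaryBinning {
    static String bin(double salary) {
        if (salary < 2500) return "low";
        if (salary < 6000) return "medium";
        return "high";
    }
    public static void main(String[] args) {
        double[] salaries = {1800, 3200, 9500};
        for (double s : salaries) {
            System.out.println(s + " -> " + bin(s));
        }
    }
}
```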
4
Limitations
Data mining depends to a large extent on the availability of the relevant attributes and on the accuracy of the data. It requires exponentially more computational effort as the problem size grows. Efficient analyses require familiarity with the domain knowledge of the attributes and their relationships. Sometimes the presented results may actually be due to the success of previous strategic decisions; therefore one should pay close attention when interpreting the results. 5
Summary
Data mining on HR data can easily be implemented in any large or medium scale organisation, such as banking, ports, airlines, the police force, civil defence, the military, the public service and so on. The usefulness of data mining mainly lies in the areas of recruiting and retention, staffing, training, employee evaluation, payment, compensation, time and attendance, safety, fraud, claims and so on. The ultimate aim of this application is to save time and cost, and also to improve organisational effectiveness. As mentioned in Stewart et al. (2000), 'knowledge is a corporate asset that shall be managed'. Does data mining do this job more effectively? Of course, yes. References
Adriaans, P. and D. Zantinge, Data Mining, Harlow, Addison-Wesley, 1996.
Braha, D., Data Mining for Design and Manufacturing: Methods and Applications, Kluwer Academic Publishers, 2001.
Deslesie, L. and C. Croes, "Operations research and knowledge discovery: a data mining method applied to health care management", International Transactions in Operational Research, 7, 159-170, 2000.
Fayyad, U., D. Madigan, G. Piatetsky-Shapiro and P. Smyth, "From data mining to knowledge discovery in databases", AI Magazine, 17, 37-54, 1996.
Stewart, K., R. Baskerville, V. C. Storey, J. A. Senn, A. Raven and C. Long, "Confronting the assumptions underlying the management: An agenda for understanding and investigating knowledge management", The Data Base for Advances in Information Systems, 31, 41-50, 2000.
Yiming, M., B. Liu, C. K. Wong, P. S. Yu, and S. M. Lee, "Targeting the right students using data mining", ACM 2000, 457-464.
LINGUISTIC RULE EXTRACTION BY GA COMBINING DDR AND RBF NEURAL NETWORKS
XIUJU FU
Institute of High Performance Computing, Science Park II, Singapore 117528
E-mail: [email protected]
LIPO WANG
School of EEE, Nanyang Technological University, Singapore 639798
E-mail: [email protected]
We propose a novel method to extract rules by using a data dimensionality reduction (DDR) technique combined with genetic algorithms (GA), based on radial basis function (RBF) neural networks. Firstly, the data are preprocessed by removing irrelevant or redundant attributes based on attribute ranking results according to the separability-correlation measure (SCM). Secondly, the preprocessed data are classified by an RBF classifier. Initial conditions for the premises of rules are obtained from the trained RBF classifiers. The interval for each attribute in the condition part of each rule is tuned by the GA. The fitness of a chromosome is determined by the accuracy of the extracted rules. Our method leads to rules with hyper-rectangular decision boundaries directly, without the need for an intermediate step to transform continuous attributes into discrete ones, unlike some existing methods based on the multilayer perceptron (MLP). Simulations demonstrate that our approach results in more accurate and concise rules compared to some other related methods.
1
Introduction
As an important tool of data mining, neural networks are widely used in tasks such as classification, prediction and estimation. However, the black-box nature of neural networks impedes data miners' understanding of data concepts. Extracting rules from data sets has attracted more attention in recent years because it helps people break the black-box curse of neural networks and reveal data concepts. In a rule extraction task, compact rules with high accuracy are desirable. In order to extract compact rules, we can reduce the number of inputs or simplify the architecture of the neural network. Data dimensionality reduction (DDR) algorithms have been widely explored as data preprocessing in data mining tasks for removing irrelevant or redundant attributes. In this paper, we use a novel separability-correlation measure (SCM) [4] for determining the importance of the original attributes. Then different attribute subsets, chosen according to the attribute ranking queue, are input to RBF neural networks to select the best feature subset, i.e., the one that leads to the lowest classification error rate with the smallest subset size. Rules are then extracted based on the selected feature subsets. Due to their explicit form and perceptibility, hyper-rectangular decision boundaries are often employed in rule extraction, such as rules extracted from MLPs [1,6,7] and from RBF neural networks [8-10]. In order to obtain symbolic rules with hyper-rectangular decision boundaries, a special interpretable MLP (IMLP) was constructed in [1]. In an IMLP network, each hidden neuron receives a connection from only one input unit, and the activation function used for the first hidden layer
neurons is the threshold function. In [7], the range of each input attribute was divided into intervals. The attribute was then encoded as a binary string accordingly. Rules with hyper-rectangular decision boundaries were thus obtained. In [3], we proposed to extract rules using GA from RBF neural networks. However, irrelevant or redundant attributes may be included in the rule sets. The paper is organized as follows. In Section 2, the SCM measure is briefly reviewed first; feature subsets are then selected based on the SCM measure by removing irrelevant or redundant attributes. We introduce our rule extraction method in Section 3. The interval of each attribute in the premise part of each rule is encoded into a GA chromosome. Experimental results are shown in Section 4. Finally, Section 5 presents the conclusions of this paper. 2
Separability-Correlation Measure
Class separability and the correlation between attributes and class labels are used to measure the importance of each attribute. The probability of correct classification is large when the distances between different classes are large. Therefore, the subset of features which maximizes the separability between classes is a desirable objective of feature selection. Class separability may be measured by the intraclass distance S_w and the interclass distance S_b [2]. The greater S_b is and the smaller S_w is, the better the separability of the data set. The ratio of S_w and S_b is calculated and used to measure the separability of the classes: the smaller the ratio, the better the separability. If omitting attribute k1 from the data set leads to less class separability, i.e., a greater S_w/S_b, compared to the case where attribute k2 is removed, then attribute k1 is more important for classification of the data set than attribute k2, and vice versa. Hence the importance of the attributes can be ranked by computing the intraclass-to-interclass distance ratio with each attribute omitted in turn. In addition, we propose to use the correlation C_k between the changes in attributes and the corresponding changes in class labels as another indication of the importance of attribute k in classifying the patterns. Hence we propose the separability-correlation measure (SCM) [4] R_k as the sum of the class separability measure S_wk/S_bk and the correlation measure C_k (k refers to the k-th attribute), where S_wk and S_bk are the intraclass and interclass distances calculated with the k-th attribute omitted from each pattern, respectively. The importance of the attributes is ranked through the value of R_k: the greater the magnitude of R_k, the more important the k-th attribute. Based on the attribute ranking results, n subsets of attributes (n is the total number of attributes) are input to RBF classifiers. As the number of attributes used increases, the validation error first decreases, reaches a minimum when a certain attribute subset is used, and then increases; that attribute subset is selected.
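A minimal sketch of this SCM-style ranking loop is given below, under simplifying assumptions: Euclidean scatter around class means for S_w and S_b (the exact distance definitions of the paper's reference [2] may differ in detail) and a precomputed correlation term C_k supplied by the caller.

```java
// Sketch: rank attributes by R_k = S_wk/S_bk + C_k, with attribute k omitted.
public class ScmRanking {

    // Intraclass scatter: mean squared distance of samples to their class mean.
    static double intraclass(double[][] x, int[] y, int nClass) {
        double[][] mean = classMeans(x, y, nClass);
        double s = 0;
        for (int i = 0; i < x.length; i++) s += sqDist(x[i], mean[y[i]]);
        return s / x.length;
    }

    // Interclass scatter: mean squared distance between class means.
    static double interclass(double[][] x, int[] y, int nClass) {
        double[][] mean = classMeans(x, y, nClass);
        double s = 0; int pairs = 0;
        for (int a = 0; a < nClass; a++)
            for (int b = a + 1; b < nClass; b++) { s += sqDist(mean[a], mean[b]); pairs++; }
        return s / pairs;
    }

    static double[][] classMeans(double[][] x, int[] y, int nClass) {
        int d = x[0].length;
        double[][] m = new double[nClass][d];
        int[] cnt = new int[nClass];
        for (int i = 0; i < x.length; i++) {
            cnt[y[i]]++;
            for (int j = 0; j < d; j++) m[y[i]][j] += x[i][j];
        }
        for (int c = 0; c < nClass; c++)
            for (int j = 0; j < d; j++) m[c][j] /= cnt[c];
        return m;
    }

    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
        return s;
    }

    static double[][] dropColumn(double[][] x, int k) {
        double[][] r = new double[x.length][x[0].length - 1];
        for (int i = 0; i < x.length; i++)
            for (int j = 0, t = 0; j < x[0].length; j++)
                if (j != k) r[i][t++] = x[i][j];
        return r;
    }

    /** Larger R_k means attribute k matters more for separability. */
    public static double[] scm(double[][] x, int[] y, int nClass, double[] corr) {
        double[] rk = new double[x[0].length];
        for (int k = 0; k < rk.length; k++) {
            double[][] red = dropColumn(x, k);
            rk[k] = intraclass(red, y, nClass) / interclass(red, y, nClass) + corr[k];
        }
        return rk;
    }
}
```
3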
Rule Extraction
After RBF neural networks are trained based on selected feature subsets, rules are extracted by using GA.
3.1
Encoding Rule Premises Using GA
$U_{ji}$ and $L_{ji}$ are the upper limit and the lower limit of interval $j$ in rule $i$, respectively. $U_{ji}$ and $L_{ji}$ are set according to the trained RBF classifier. Initially, $U_{ji}$ is randomly generated according to the uniform distribution within the range $[\mu_{ji}, 1]$, and $L_{ji}$ is randomly generated according to the uniform distribution within the range $[0, \mu_{ji}]$. We encode the real value $p_{ji}$ ($p = U, L$) using $k$ binary bits:

$$G^{(p)}_{ji} = \{g_k, g_{k-1}, \ldots, g_i, \ldots, g_2, g_1\}, \quad g_i = 0, 1, \quad i = 1, 2, \ldots, k \qquad (1)$$

The relationship between $p_{ji}$ and $G^{(p)}_{ji}$ is as follows:

$$p_{ji} = B^{(p)}_{ji} / (2^k - 1) \qquad (2)$$

where $B^{(p)}_{ji}$ is the decimal value corresponding to $G^{(p)}_{ji}$:

$$B^{(p)}_{ji} = g_k \cdot 2^{k-1} + g_{k-1} \cdot 2^{k-2} + \cdots + g_2 \cdot 2^1 + g_1 \cdot 2^0 \qquad (3)$$

A chromosome in the population pool can be represented as a one-dimensional binary string:

$$(G^{(U)}_{11}, G^{(L)}_{11}, \ldots, G^{(U)}_{n1}, G^{(L)}_{n1}, \ldots, G^{(U)}_{1m}, G^{(L)}_{1m}, \ldots, G^{(U)}_{nm}, G^{(L)}_{nm}) \qquad (4)$$

3.2
Crossover and Mutation
The roulette wheel selection is used to select chromosomes in each generation. Each chromosome in the population pool corresponds to a rule set. The accuracy of the rule set is calculated to evaluate the fitness level of each chromosome. Two-point crossover is used in our algorithm. The probability of crossover is usually around 80%. Mutation can prevent fixation at particular loci. The mutation rate is 1/1000. "Elitism" is used to retain the best members in the population pool.
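A minimal sketch of the decoding in Eqs. (1)-(3), mapping a k-bit gene to a real value in [0, 1] (the subsequent scaling of U_ji and L_ji into [mu_ji, 1] and [0, mu_ji] is left to the caller):

```java
// Sketch: decode a k-bit gene via p = B / (2^k - 1), per Eqs. (2) and (3).
public class GeneDecoder {
    /** bits[0] is g_k (most significant), bits[k-1] is g_1. */
    static double decode(int[] bits) {
        long b = 0;
        for (int bit : bits) b = (b << 1) | bit;        // Eq. (3): binary to decimal
        return (double) b / ((1L << bits.length) - 1);  // Eq. (2)
    }
    public static void main(String[] args) {
        System.out.println(decode(new int[]{1, 0, 1})); // 5/7, roughly 0.714
    }
}
```
4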
Experimental Results
Iris and Thyroid data sets are used for testing our algorithm. There are 4 attributes in the Iris data set, and 5 attributes in the Thyroid data set. For the Iris and Thyroid data sets, the attribute ranking queues obtained according to the SCM are {4,3,1,2} and {2,3,5,4,1}, respectively. For the Thyroid data set, the classification error rates of RBF classifiers with different subsets of attributes, input in the order of importance, are calculated. As the number of attributes used increases, the validation error first decreases, reaches a minimum when attributes 2, 3 and 5 are used, and then increases. Hence all the other attributes are considered unimportant for the data concept and are removed, which decreases the classification error rate from 0.0465 to 0.0233 and the number of inputs from 5 to 3. For the Iris data set, attributes 1 and 2 are removed from the data set, which decreases the classification error rate from 0.0467 to 0.0333 and the number of inputs from 4 to 2. Based on the rule-extraction algorithm described in Section 3, 3 rules (each with only 2 premises) are extracted for the Iris data set. The accuracy of
the obtained rules is 97.33%, which is the same as the result in [3]. However, the rule set in this paper is more compact since there are only 2 premises in each rule. There are 5 rules with accuracy 88% in the rule set for the Thyroid data set; this rule accuracy is higher than the result (85%) in [3], with a smaller number of premises per rule. 5
Conclusion
In this paper, we propose to extract rules from RBF neural networks using GA based on the SCM method. Irrelevant attributes are deleted from the original attribute set according to the ranking results from the SCM. A genetic algorithm is used for tuning the premises of the rules. Experimental results show that the proposed method is effective in reducing the size of data sets, reducing the number of rules, and improving the accuracy of rules. References
1. G. Bologna and C. Pellegrini, "Constraining the MLP power of expression to facilitate symbolic rule extraction", Proc. IEEE World Congress on Computational Intelligence, Vol. 1, pp. 146-151, 1998.
2. P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall International, London, 1982.
3. X. J. Fu and L. P. Wang, "Rule extraction by genetic algorithms based on a simplified RBF neural network", Proceedings of the 2001 Congress on Evolutionary Computation, Vol. 2, pp. 753-758, 2001.
4. X. J. Fu and L. P. Wang, "Rule extraction using a novel gradient-based method and data dimensionality reduction", Proceedings of the 2002 International Joint Conference on Neural Networks, Vol. 2, pp. 1275-1280.
5. E. R. Hruschka and N. F. F. Ebecken, "Rule extraction from neural networks: modified RX algorithm", Proc. International Joint Conference on Neural Networks, Vol. 4, pp. 2504-2508, 1999.
6. H. Ishibuchi and M. Nii, "Generating fuzzy if-then rules from trained neural networks: linguistic analysis of neural networks", IEEE International Conference on Neural Networks, Vol. 2, pp. 1133-1138, 1996.
7. H. J. Lu, R. Setiono and H. Liu, "Effective data mining using neural networks", IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, Dec. 1996, pp. 957-961.
8. K. J. McGarry, S. Wermter and J. MacIntyre, "Knowledge extraction from radial basis function networks and multilayer perceptrons", Proc. International Joint Conference on Neural Networks, Vol. 4, pp. 2494-2497, 1999.
9. K. J. McGarry, J. Tait, S. Wermter and J. MacIntyre, "Rule-extraction from radial basis function networks", Proc. Ninth International Conference on Artificial Neural Networks, Vol. 2, pp. 613-618, 1999.
10. K. J. McGarry and J. MacIntyre, "Knowledge extraction and insertion from radial basis function networks", IEE Colloquium on Applied Statistical Pattern Recognition (Ref. No. 1999/063), pp. 15/1-15/6, 1999.
WEB-BASED CONFIGURATION AND CONTROL OF HLA-BASED DISTRIBUTED SIMULATIONS
NIRUPAM JULKA, DAN CHEN, BOON PING GAN
Production and Logistics Planning Group, Singapore Institute of Manufacturing Technology, 71 Nanyang Drive, SINGAPORE 638075
STEPHEN JOHN TURNER, WENTONG CAI
School of Computer Engineering, Nanyang Technological University, SINGAPORE 639798
The need for better understanding, control and optimization of supply chains is being recognized more than ever in the new economy. Simulation of a supply chain holds a great potential in providing the much needed visibility and control of the supply chain operations. The High Level Architecture (HLA), an IEEE standard for interoperable simulations, enables distributed simulations of this type. This paper discusses a framework to enable configuration and control of HLA-based distributed simulations through a web interface. The implementation of a prototype based on the framework is also presented.
1
Introduction
Simulation has become one of the standard technologies used for operational optimization by organizations. In the last decade enterprises have strived to unlock benefits from their supply chains using various cutting-edge simulation technologies. The lack of visibility into the execution details of their upstream and downstream partners has meant that the studies performed for operational improvement by these organizations have not given accurate results. Important reasons for this include the secrecy maintained regarding execution rules in a company, the ever-changing business environment and correspondingly changing operational conditions, and the inherent geographical distribution of the various nodes. The need for secure supply chain simulation in a distributed environment is felt more than ever. Web-based modeling and simulation obviously provide a solution to the above problem. But a close examination of the earlier work on the topic reveals that its objectives differ from those at hand and hence its solutions are not directly applicable. The solutions provided include exposing the model building and execution facility through a web interface [7] and providing simulation services to customers through an Application Service Provider (ASP) model [5,8]. Kuljis and Paul [4] provide a critical analysis of work in web-based simulation modeling and execution. The High Level Architecture (HLA) is an IEEE standard (1516-2000) for interoperability of distributed simulations. This holds great potential for supply chain study and optimization through simulation. A group of companies (supply chain partners) can build HLA-compliant simulation models (federates) and can jointly perform supply chain experiments through distributed simulations communicating through the web. 2
Motivation
In our earlier work we proposed alternative structures for HLA-based federation communities [1,2]. These structures include Hierarchical HLA (HHLA) and Set-based HLA (SHLA). The motivation for these alternative topologies is to enable selective
information shielding between different entities in the distributed simulation. The user thus has more flexibility to run configured simulations in a distributed environment with greater control over information sharing with other partners. The facility to configure and control simulations of various scenarios using the above-mentioned topologies, as well as a flat HLA, is presented in this paper. The framework allows users to register, authenticate, configure, modify, invoke and terminate supply chain simulations through a web browser. The choice of Java as the platform for the implementation of this framework allows portability, complements the interoperability promised by HLA, and provides a facility to critically compare supply chain scenarios by configuring, executing and analyzing the corresponding distributed simulations through a web interface. The overall motivation is to enable companies to provide the use of simulation models of their enterprises to their partners in a manner similar to web services. Each of the partners can then run customized supply chain scenarios in a grid (distributed computing) fashion. The framework is not specific to supply chains and can be ported to other domains with minimal effort. This provision of secure simulation over the internet bridges the gap between in-house simulations and collaborative simulations between trading partners based on legacy networks.
3
Overall Framework
The framework for web-based control of simulations is illustrated in Figure 1. There are two components in the software, the authentication server component and the company server component. The authentication server component (AS) has five modules handling various different services provided by the server and the company server component (CS) has two modules handling the services provided by the company. The prototype developed based on this framework uses the Model-View-Controller (MVC) architecture with beans providing the Model, Java Server Pages (JSPs) providing the View and servlets providing the Control. All information pertaining to the various entities in the simulation are stored in a central database on the AS. The specific modules, services offered by them and implementation details are as follows.
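As context for the module descriptions that follow, a minimal sketch of this Model-View-Controller arrangement is shown below; the servlet, bean and JSP names are illustrative, not taken from the prototype.

```java
// Sketch: a controller servlet updates a bean (Model) and forwards to a JSP (View).
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ConfigureScenarioServlet extends HttpServlet {
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Model: a bean holding the scenario configuration.
        ScenarioBean scenario = new ScenarioBean();
        scenario.setFederateName(req.getParameter("federate"));
        req.setAttribute("scenario", scenario);
        // View: forward to a JSP that renders the configuration page.
        req.getRequestDispatcher("/scenario.jsp").forward(req, resp);
    }
}

// Simple JavaBean playing the Model role.
class ScenarioBean {
    private String federateName;
    public String getFederateName() { return federateName; }
    public void setFederateName(String n) { federateName = n; }
}
```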
3.1
Federate Information and Management (FIM)
The Federate Information and Management (FIM) module provides the user with the facility to modify information regarding the various federates hosted at CS in the database on the AS. The right to modify each piece of information is linked to the role of the user (AS Administrator, CS Administrator, User, Guest). Signatures of the executable federate programs which are hosted on the various CS are also stored in the same database. These signatures are specific to the executable code of a federate. Upon modification of any federate code, the owner of the federate needs to log the changes made to the federate code and apply for a new signature through FIM. The information regarding the configurable parameters of the federates is also managed through the FIM. These parameters are used to configure different scenarios in a supply chain simulation. FIM is also used for the account management of the various users.
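As an aside, the sketch below shows one way such an executable signature could be computed; the paper does not specify the signature scheme, so the use of a SHA-1 digest over the executable bytes is an assumption.

```java
// Sketch: compute a code-specific "signature" for a federate executable.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FederateSignature {
    public static String sign(String executablePath)
            throws IOException, NoSuchAlgorithmException {
        byte[] code = Files.readAllBytes(Paths.get(executablePath));
        byte[] digest = MessageDigest.getInstance("SHA-1").digest(code);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString(); // stored on the AS, compared by AM at invocation
    }
}
```
3.2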
Authentication Module (AM)
There are three types of authentication in the framework: user authentication, company authentication and federate authentication. The need for federate authentication stems from the fact that all changes made by an organization to their federates need to be transparent and clear. Also, unauthorized or rogue federates should not be permitted to run in the federation community. This module is available on the AS (AM-S) and the CS (AM-C). Upon invocation of a simulation, the AM-S queries the AM-C to send the signature of the federate. This signature is compared with the one in the database and the federate is authenticated. 3.3
Simulation Configuration Module (SCM)
The user is able to configure a supply chain simulation through a series of JSPs displaying the available federates and the configurable parameters. The user may choose to configure a fresh scenario, browse through stored scenarios to select an appropriate scenario or modify an earlier configured scenario. The federation community structure is generated by SCM based on the information sharing requirements. Details of this will be made available in subsequent publications. 3.4
Invocation and Termination Module (IM)
This module enables invocation of the federates and gateways based on the user defined configuration, and the termination of the entire simulation. The federates provided by a particular company and the supporting federation execution run at CS. The gateways running at the AS link the user federations together to create a federation community. Once the user completes the configuration of the simulation, the IM on the AS (IM-S) sends invocation signals which include all configuration information to the respective IM modules on the CS (IM-C). The invocation signals are sent through the AM-S and AM-C. The initiation of the termination process is done by a 'Terminator' federate. The user invokes this federate when he wishes to terminate a particular simulation. The federate then joins all the higher level federations and sends out a termination message (modeled as an interaction) to all the gateways. Any federate that receives the interaction forwards the interaction to the lower level federations, resigns from all federations it is part of and then attempts to destroy them. The execution of the federation community is thus terminated by all federates resigning and all federations destroyed.
3.5
Simulation Information Module (SIM)
This module provides users and guests with the information on simulations of the various scenarios running in the distributed environment including their configurations and results. It also allows the user to view previously run scenarios and information pertaining to it. This module can also be connected to a visualization tool to observe an ongoing simulation [3]. 4
Conclusions and Future Work
A solution for the management and configuration of distributed simulations has been presented in this paper. With more and more adoption of HLA in the civil domain [6], the need for such tools will surely increase. Future work in the field includes the development of appropriate tools for the configuration of various input files for remote models, the gathering of data from the different hierarchical levels, and the analysis of simulations involving hierarchical federation communities. The adoption of industrial standards for data exchange like XML and information exchange like RosettaNet into simulation systems involving federation communities is also under investigation. Future work is also planned on the development of strategies and policies to enable such collaboration between supply chain partners. This is necessary to smoothen the transition of this novel methodology of decision-making, along with the enabling technology, into the industrial domain of supply chain management. REFERENCES
1. Cai, W., S. J. Turner, and B. P. Gan. 2001. Hierarchical Federations: An Architecture for Information Hiding. In Proceedings of the 15th International Workshop on Parallel and Distributed Simulation, pp. 67-74.
2. Gan, B. P., D. Chen, N. Julka, S. J. Turner, W. Cai and G. Li. Benchmarking Alternative Topologies for Multi-Level Federations. Submitted to Simulation Interoperability Workshop (SIW) 2003.
3. Julka, N., B. P. Gan, D. Chen, P. Lendermann, L. F. McGinnis and J. P. McGinnis. Framework for Distributed Supply Chain Simulation: Application as a Decision-Making Tool for the Semiconductors Industry. In Proceedings of the International Conference on Modeling and Analysis of Semiconductor Manufacturing (MASM) 2002, pp. 376-381.
4. Kuljis, J. and R. J. Paul. A Review of Web Based Simulation: Whither We Wander? In Proceedings of the 2000 Winter Simulation Conference, pp. 1872-1881.
5. Marr, C., C. Storey, W. E. Biles and J. P. C. Kleijnen. A Java-based Simulation Manager for Web-based Simulation. In Proceedings of the 2000 Winter Simulation Conference, pp. 1815-1822.
6. Straßburger, S. Distributed Simulation Based on the High Level Architecture in Civilian Application Domains. Doctoral Dissertation 2001. Otto-von-Guericke University, Magdeburg, Germany.
7. Wiedemann, T. VisualSLX - An Open User Shell for High-Performance Modeling and Simulation. In Proceedings of the 2000 Winter Simulation Conference, pp. 1865-1871.
8. Wiedemann, T. Simulation Application Service Providing (SIM-ASP). In Proceedings of the 2001 Winter Simulation Conference, pp. 623-628.
COMPETING RISKS WITH CENSORED DATA: A SIMULATION STUDY
IING LUKMAN 1, NOOR AKMA IBRAHIM 3, FAUZIAH MAAROF 2, ISA BIN DAUD 2, MOHD NASIR HASSAN 1
1 Dept. of Environmental Science, 2 Dept. of Mathematics, Faculty of Science and Environmental Studies
3 Institute for Mathematical Research
Universiti Putra Malaysia, Serdang, Selangor D.E.
E-mail: [email protected]
1 Introduction Competing risks is a survival model dealing with more than one possible causes of death or failure to a subject in the population [3]. Cox regression is a useful method for survival or failure data analysis [8], Lunn and McNeil [6] analyzed the competing risks in the survival model using Cox proportional hazards model with censored data as an effort to overcome the complexity in the comparison of parameter estimates corresponding to different failure types. Kalbfleisch and Prentice [4] method for competing risks, involved fitting models separately for each type of failure, treating other failure types as censored. We will propose that the modified LunnMcNeil based on duplicated data technique of Lunn and McNeil [6] can work properly. 2 Modification of Lunn-McNeil Method Data Handling. The assumption and data entries are the same as Lunn and McNeil [6]. Suppose that types I and II are given by 8 and 1-8 where 8= 0 or 1. If subject i fails at time t, and the first failure type Si (or 1- 5j) then the second failure type is 1- 8i (or 8;). By providing a column for the second failure type, two entries are made as follows
826
827
Table 1. Covariates
Type Subj Failure time Status 1st Fail
2nd Fail
i t ;
1-5;
i(rep)
ti
1
5i
0
l-8j
8;
1st Fail
2nd Fail
Xi,8iXi
x; , ( 1 - 8 ;
)XJ
Xi,(l-8i)xi xj,
8j Xj
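A minimal sketch of this duplication step in plain Java (the array layout is an illustrative assumption; in practice the augmented rows would be written to the data set passed to the Cox fit):

```java
// Sketch: Lunn-McNeil data duplication following Table 1. Each subject i
// contributes two rows; the covariates carry the type interaction delta*x.
public class LunnMcNeilAugment {

    /** One augmented row: failure time, status, first-fail type, x, delta*x. */
    static double[] row(double t, int status, int delta, double x) {
        return new double[] { t, status, delta, x, delta * x };
    }

    /** Duplicate subject i with failure time t, observed type delta, covariate x. */
    static double[][] duplicate(double t, int delta, double x) {
        return new double[][] {
            row(t, 1, delta, x),      // entry i: observed failure type
            row(t, 0, 1 - delta, x)   // entry i(rep): the other type, censored
        };
    }
}
```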
The augmentation of the second failure type in Table 1 is useful for seeking the joint estimation of the parameters.
Modification of Method. Within the competing risks framework, Kay [5] notes that for a patient with covariate values $x_1, x_2, \ldots, x_p$ the estimated cause-specific hazard for failure type $j$ is

$$\lambda_j(t; x) = \lambda_{0j}(t)\,\exp(\beta_{j1} x_1)\,\exp(\beta_{j2} x_2 + \cdots + \beta_{jp} x_p), \quad j = 1, 2, \ldots, m \qquad (1)$$

where $x_1$ is the binary treatment indicator and $x_2, \ldots, x_p$ are the background covariates. The parameters $\beta$ are estimated separately for each failure type $j$ by considering all failures of types other than $j$ as censored. The regression model introduced by Cox [1] specifies the hazard rate $\lambda_j(t; x)$ for every individual in terms of a vector of covariates $x_i$ and a vector of regression parameters $\beta$; then (1) can be written as

$$\lambda_j(t; x_i) = \lambda_j^*(t)\,\exp\Big(\sum_{r=1}^{p} \beta_r x_{ir}\Big) \qquad (2)$$

The contribution of (2) to the partial likelihood corresponding to the $i$-th risk set is

$$\exp\Big(\sum_{r=1}^{p} \beta_r x_{ir}\Big) \Big/ \sum_{j \in R_i} \exp\Big(\sum_{r=1}^{p} \beta_r x_{jr}\Big) \qquad (3)$$

By introducing the censoring indicator $\Omega_i$ [7], where $\Omega_i = 1$ if $t_i$ is a failure time and $\Omega_i = 0$ if $t_i$ is censored, (3) can be written as

$$\exp\Big(\Omega_i \sum_{r=1}^{p} \beta_r x_{ir}\Big) \Big/ \Big[\sum_{j \in R_i} \exp\Big(\sum_{r=1}^{p} \beta_r x_{jr}\Big)\Big]^{\Omega_i} \qquad (4)$$

Hence the required conditional log likelihood becomes

$$L(\beta) = \sum_{i=1}^{n} \Omega_i \Big\{ \sum_{r=1}^{p} \beta_r x_{ir} - \ln\Big[\sum_{j \in R_i} \exp\Big(\sum_{r=1}^{p} \beta_r x_{jr}\Big)\Big] \Big\} \qquad (5)$$
which is similar to the ordinary Cox regression.
3 Simulation Study
The study compares the two methods, namely the cause-specific hazards ordinary Cox model and the cause-specific hazards model based on the modified Lunn-McNeil method. The simulated data are based on [2], the Stanford Heart Transplant Data, with five covariates: type of failure, age, mismatch score, age by failure type, and mismatch score by failure type. The exponential distribution is used for the covariates age and mismatch score, while the binomial distribution is used for the indicator variable related to type of failure. The data for the interaction between type of failure (δ_i) and a covariate (x_i), such as age by failure type or mismatch score by failure type, are simply the product of δ_i and x_i, as indicated by Lunn and McNeil [6]. The censoring indicator variable is fixed to comply with the percentage of censoring that we imposed on the data set. In generating the failure times we imposed two values of λ: λ = 1.25 for the first failure type and λ = .75 for the second, in order to obtain proportionality. The generated data are simulated 1000 times for every sample size and designated percentage of censoring. The true (initial) parameters of the covariates are β1 = .99, β2 = .06605, β3 = .200, β4 = .0654 and β5 = .198.
4 Results
Table 2. Maximum likelihood estimates of the cause-specific hazards: Mean, Bias and RMSE of the estimates β̂1-β̂5 under each method.

Sample size 15, censoring 25% (Modified Lunn-McNeil method / Ordinary Cox method); Mean, Bias and RMSE values as printed:
-.008 -.038 .003 .019 -.286 1.405 .134 37.398 .318 .801 .601 36.408 -.252 -.351 1.207 -.856 -.074 -.238 -.062 -.179 .117 612.95 15.49 19.46 32.75 40.31 1.97 .104 .333 .375

Sample size 15, censoring 50% (Ordinary Cox method / Modified Lunn-McNeil method); Mean, Bias and RMSE values as printed:
15.586 -.831 .124 .622 -.058 -.161 .000 .049 .036 14.596 -.149 .059 -.065 -.149 -.368 -.124 -.361 -.102 222.48 3.034 10.16 .078 3.077 1.627 106.3 13.13 .859 -.195 -.393 11.27

Sample size 45, censoring 25% (Modified Lunn-McNeil method / Ordinary Cox method); Mean, Bias and RMSE values as printed:
0 2.942 .063 -2.94 -.005 .001 -.002 -.004 23.252 .005 -.198 -.070 2.744 -.927 -.068 -.204 -.065 22.262 -.061 -3.14 .22 .213 .072 47.23 1.105 .071 47.32 6.71 201.45 6.71

Sample size 45, censoring 50%:
Par                        Mean     Bias     RMSE
Ordinary Cox β̂1           19.58    18.59    138.06
Ordinary Cox β̂2           -.075    -.141    5.163
Ordinary Cox β̂3           -1.883   -2.083   38.207
Ordinary Cox β̂4           .073     .008     5.16
Ordinary Cox β̂5           1.881    1.683    38.19
Modified Lunn-McNeil β̂1   .087     -.903    1.197
Modified Lunn-McNeil β̂2   -.004    -.070    .071
Modified Lunn-McNeil β̂3   -.013    -.213    .23
Modified Lunn-McNeil β̂4   0        -.065    .078
Modified Lunn-McNeil β̂5   .004     -.194    .231
5 Conclusion

In the cause-specific proportional hazards model, the modified Lunn-McNeil method is better (its root mean square error is smaller) than ordinary Cox regression across the sample sizes and censoring percentages considered.

References
1. Cox, D. R. (1972). "Regression Models and Life Tables (with discussion)". J. R. Statist. Soc. B, 34, pp. 187-220.
2. Crowley, J. and Hu, M. (1977). "Covariance Analysis of Heart Transplant Survival Data". J. Amer. Statist. Assoc. 72, pp. 27-36.
3. David, H. A. and Moeschberger, M. L. (1978). The Theory of Competing Risks. London: Griffin.
4. Kalbfleisch, J. and Prentice, R. (1980). The Statistical Analysis of Failure Time Data. New York: Wiley.
5. Kay, R. (1986). "Treatment Effects in Competing Risks Analysis of Prostate Cancer Data". Biometrics 42, pp. 203-211.
6. Lunn, M. and McNeil, D. (1995). "Applying Cox Regression to Competing Risks". Biometrics 51, pp. 524-532.
7. Noor Akma Ibrahim and Isa Daud. (1995). "Estimating Parameters of Proportional Hazards Model with Censored Data Using SAS". Proceedings of the Annual SAS User's Group Malaysia, pp. 19-20.
8. Pettitt, A. N. and Bin Daud, I. (1990). "Investigating Time Dependence in Cox's Proportional Hazards Model". Applied Statistics 39, pp. 313-329.
COMBINING SUPPORT VECTOR REGRESSION (SVR) WITH GENETIC ALGORITHM (GA) TO OPTIMIZE THE INITIAL POSITIONS OF AGENTS IN THE LAND COMBAT SIMULATION

L.J. CAO, K.S. CHUA, W.K. CHONG, H.P. LEE AND L. QIAN
Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, 117528 Singapore. Email: [email protected]
This paper proposes a methodology combining support vector regression (SVR) with a genetic algorithm (GA) to optimize the initial positions of agents in a multi-agent based land combat simulation, a game in which two teams of agents compete to capture their opponents' flag. Specifically, given one team of agents fixed in terms of their initial positions, the proposed methodology is used to optimize the initial positions of the other team so that the optimized agents capture their opponents' flag as fast as possible. Firstly, a large data set is collected by running the land combat model to record the initial positions of the team of agents to be optimized and the corresponding time difference between the two teams of agents in capturing their opponents' flag. Then, SVR is used to estimate the relationship between the initial positions and the time difference. Finally, GA is used to search for the optimal initial positions, where the fitness function is evaluated using the developed SVR. The simulation shows that the team of agents using the optimal initial positions captures the opponents' flag much faster than the other team of agents without optimization. The result also demonstrates that the initial positions of agents play an important role in the land combat simulation.
1 Introduction
Recently, multi-agent based simulation (MAS), which uses a bottom-up approach to model complex systems, has been receiving increasing interest. The reason is that MAS can provide a more accurate model than traditional techniques through the use of a larger number of spatial degrees of freedom. MAS also emphasizes the adaptability of the elements in the model, so that human decision capability can be incorporated to make the model more realistic, unlike traditional methods, which treat the modeling as a deterministic process. There are also fewer assumptions in MAS compared to the traditional techniques. One of the successful applications of MAS is in the area of military affairs. Based on MAS, the land combat model called irreducible semi-autonomous adaptive combat (ISAAC) was developed by Ilachinski in 1997 [1]. As illustrated in Figure 1, in ISAAC there are two teams of agents of the same size, represented by red and blue respectively, along with a red flag and a blue flag belonging to the red team and the blue team. The two teams of agents, together with their flags, are initially located at diagonally opposite corners of the two-dimensional battlefield. The agents then compete to capture the opponents' flag, and the team whose agents first reach the opposing flag wins the combat. In the framework of MAS, learning and adaptation is one of the most important capabilities for agents because of the complexity and dynamics of the environments in the system. Artificial intelligence techniques appear to be the most attractive approaches. Recently, in the enhanced version of ISAAC named EINSTein [2], GA was used to evolve the personalities of agents. That is, given one team of agents fixed in terms of their personalities, GA is used to optimize the personalities of the other team so that it is "best able" to capture the opponents' flag, where "best" is measured in terms of either the time to capture the target flag or the casualties incurred in performing the task.
Motivated by Ilachinski's work, this paper proposes a methodology combining SVR with GA to optimize the initial positions of agents. In all the previous land combat models, the initial positions of both teams of agents were randomly generated. In this paper, given one team of agents fixed in terms of their initial positions, the proposed methodology is used to optimize the initial positions of the other team so that the optimized agents capture their opponents' flag as quickly as possible. Our simulation shows that the team of agents using the optimal initial positions captures the opponents' flag much faster than the other team without optimization. The result also demonstrates that the initial positions of agents play an important role in the land combat simulation.
Figure 1. A pictorial illustration of the land combat simulation (red agents with the red flag; blue agents with the blue flag).
2 The Proposed Methodology

Figure 2. A flowchart of the proposed methodology: data collection, support vector regression (SVR), genetic algorithm (GA) optimizer, then running the simulation using the optimal initial positions.
As illustrated in Figure 2, the proposed methodology consists of three major components: a data collection component, an SVR component, and a GA optimizer component.
1. Data collection component: by repeatedly running the land combat model, the data collection component records the initial positions of the team of agents to be optimized and the corresponding time difference between the team of agents without optimization and the team of agents to be optimized in capturing their opponents' flag; these records are later analyzed by the SVR component. For recording the initial positions, all possible initial positions that could be occupied by agents form one input vector, with a value of 1 denoting the presence of an agent and a value of 0 denoting its absence.

2. Support vector regression (SVR) component: based on the collected data set, SVR is used to estimate the relationship between the initial positions and the time difference between the two teams in capturing their opponents' flag. Compared to other neural network regressors, SVMs have three distinct characteristics when used to estimate the regression function. Firstly, SVMs estimate the regression by a set of linear functions defined in a high-dimensional space. Secondly, SVMs define the regression estimation as a problem of risk minimization, where the risk is measured using Vapnik's ε-insensitive loss function. Thirdly, SVMs use a risk function consisting of the empirical error and a regularization term, which is derived from the Structural Risk Minimization principle [3].

3. GA optimizer component: the basic idea of GA is to search for the optimal solution in a manner inspired by Darwin's theory of evolution, where potential solutions to a problem compete and mate with each other to produce increasingly stronger solutions [4]. The potential solutions are called chromosomes in GA. The solutions in one generation constitute a population, and the initial population is generated randomly. A fitness function is required to evaluate the quality of solutions, thus determining the probability of a solution surviving into the successive generations; usually the solutions with larger fitness values have a higher probability of surviving than those with smaller fitness values. The three basic operators are then repeated to update the old population until a predetermined number of generations is reached or a satisfactory solution is found: selection, which copies chromosomes from the current population for the next generation; crossover, where parts of paired solutions are combined to create more adaptive solutions; and mutation, where one or more components of a solution are randomly changed. The procedure of GA is outlined below:

t = 0
Initialize P(t)
Evaluate P(t)
While (t < MAXGEN)
    t = t + 1
    P(t) := Select P(t-1)
    P(t) := Crossover P(t)
    P(t) := Mutate P(t)
    Evaluate P(t)
End
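A self-contained Python sketch of this loop is given below (our own illustration; the population size, generation count and the stand-in fitness function are assumptions, and in the real system the fitness call would be replaced by the trained SVR's prediction):

import random

random.seed(0)
N_BITS, POP_SIZE, MAXGEN = 49, 20, 50   # 7x7 position grid; sizes assumed
P_CROSS, P_MUT = 0.1, 0.01              # probabilities reported in Section 3

def fitness(chrom):
    # Stand-in scoring function: the real system would return the
    # SVR-predicted time difference for this 49-bit position vector.
    return sum(b * (i % 7 + i // 7) for i, b in enumerate(chrom))

def select(pop):
    # Fitness-proportionate (roulette-wheel) selection; +1 avoids zero weights.
    weights = [fitness(c) + 1 for c in pop]
    return [random.choices(pop, weights=weights, k=1)[0][:] for _ in pop]

def crossover(pop):
    # One-point crossover on adjacent pairs with probability P_CROSS.
    for i in range(0, len(pop) - 1, 2):
        if random.random() < P_CROSS:
            cut = random.randrange(1, N_BITS)
            pop[i][cut:], pop[i + 1][cut:] = pop[i + 1][cut:], pop[i][cut:]
    return pop

def mutate(pop):
    # Flip each position bit independently with probability P_MUT.
    for chrom in pop:
        for j in range(N_BITS):
            if random.random() < P_MUT:
                chrom[j] ^= 1
    return pop

pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]
for t in range(MAXGEN):
    pop = mutate(crossover(select(pop)))
best = max(pop, key=fitness)
print("best fitness:", fitness(best))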
3 Experiment
The JACOB program, developed based on the RELATE architecture [5], is used in the data collection component. Each team is composed of 30 agents. Each agent in a team occupies a different position in its team's initial position field, a 7-by-7 square. The initial position fields for the red team and the blue team are located in the lower-left and upper-right regions, respectively, of the 80-by-80 battlefield. In the collection of data, the initial positions of the red agents are randomly generated in the first run and then kept fixed in the later runs.
The initial positions of the blue agents are randomly generated for each run. The JACOB program is run on the high-end computing resources of the Institute of High Performance Computing [6]. A total of 3291 data patterns are collected in the data collection component. Each data pattern consists of 50 values, with the first 49 bits corresponding to the initial positions of the blue agents and the last value corresponding to the time difference, in seconds, between the red agents and the blue agents in capturing their opponents' flag. The whole data set is randomly partitioned into two smaller sets: a training set used for training the SVR, and a testing set used for selecting the optimal parameters of the SVR. There are 2800 data patterns in the training set and 491 in the testing set. The Gaussian function is used as the kernel function of the SVR. The values of δ, C, and ε are chosen based on the testing set and are determined as 100, 1.0, and 0.5 respectively. Finally, GA is used to search for the optimal initial positions of the blue agents, where the fitness function is evaluated based on the SVR. The initial probability of 0.5, the crossover probability of 0.1, and the mutation probability of 0.01 are chosen because these values produce the largest fitness value in the converged chromosome. Finally, the JACOB program is run again, with the converged chromosome with the largest fitness value used as the initial positions of the blue agents and the red agents' initial positions kept at the same values as in the previous experiment. The time difference between the red agents and the blue agents using the optimal initial positions in capturing their opponents' flag is 6.53.
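For illustration, the SVR component could be set up as in the following sketch (our assumption, using scikit-learn's SVR with synthetic stand-in data; mapping the kernel width δ = 100 to scikit-learn's gamma as 1/(2δ²) is one common convention, not stated by the paper):

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Synthetic stand-in for the collected data: 49-bit initial-position vectors
# and a time difference (the real set has 3291 patterns: 2800 train, 491 test).
X = rng.integers(0, 2, size=(300, 49)).astype(float)
y = rng.normal(size=300)

# RBF ("Gaussian") kernel with C = 1.0 and epsilon = 0.5 as selected on the
# testing set; the gamma value encodes the assumed delta-to-gamma mapping.
model = SVR(kernel="rbf", gamma=1.0 / (2 * 100.0**2), C=1.0, epsilon=0.5)
model.fit(X[:250], y[:250])
print("held-out R^2:", model.score(X[250:], y[250:]))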
4 Conclusion
This paper proposes a methodology combining SVR with GA to optimize the initial positions of agents in the MAS-based land combat simulation. The simulation shows that the team of agents using the optimal initial positions captures the opponents' flag much faster than the other team without optimization. The result also demonstrates the effectiveness of the proposed methodology for optimizing the initial positions of agents.

References
1. Ilachinski A. Enhanced ISAAC neural simulation toolkit (EINSTein): an artificial-life laboratory for exploring self-organized emergence in land combat (U), Center for Naval Analyses, 1999.
2. Ilachinski A. Genetic algorithm evolutions of ISAACA personalities. In Irreducible Semi-Autonomous Adaptive Combat (ISAAC): An Artificial-Life Approach to Land Combat. Center for Naval Analyses Research Memorandum CRM 97-61.10, 1997.
3. Vapnik V. N. The Nature of Statistical Learning Theory, New York, Springer-Verlag, 1995.
4. Goldberg D. E. Genetic Algorithms in Search, Optimization, and Machine Learning, Reading, MA: Addison-Wesley, 1989.
5. Roddy, K. A. and Dickson, M. R., Modeling human and organizational behavior using a relation-centric multi-agent system design paradigm, Ph.D. thesis, Naval Postgraduate School, Monterey, California, 2000.
6. www.ihpc.a-star.edu.sg
ACQUISITION OF BACKGROUND COEFFICIENT

X. Y. QI AND C. LU
Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528
E-mail: qixy,luchun@ ihpc.nus.edu.sg

Z. G. LIU
Motorola Co Ltd, 3231 North Wilke Road, Arlington Heights, IL 60004, United States
E-mail: [email protected]
An important issue in the data mining research area is forecasting association rules: deriving a set of association rules corresponding to a future circumstance from the current dataset. For a new situation, the correct set of rules can be quite different from those derived from the dataset corresponding to the current situation. To derive the set of rules for a new situation, the existing technique of Combination Dataset and Foundation Groups can be used. In this paper, research is focused on the core of this technique: the acquisition of Background Coefficients by the information gain measure.
1 Introduction
Data mining techniques are well accepted for discovering previously unknown, potentially useful and interesting knowledge from past datasets. Association rule mining [1] is one of the most important data mining techniques. Much research has concentrated on improving the performance of rule generation [2]; however, little attention has been paid to predicting association rules for a future circumstance from the current dataset. For a future circumstance, because the data source is unavailable, the only rules available for decision making are those mined from an earlier dataset. For example, when a supermarket manager plans to set up a new store in a new location, decisions have to be made on, among other things, what kinds of goods are likely to be in greater demand and therefore need to be stocked beforehand. Unfortunately, the only information available to the manager is the past dataset from the existing, operational store that he manages. Is there any method to obtain a set of association rules for the new store before it starts to run?

1.1 The Concepts of Background Coefficient, Foundation Group and Construction Dataset

Consider the above supermarket example. When setting up a new store, because the new data is not yet obtainable, the manager has no choice but to use the association rules discovered from the past dataset to make decisions. But if the customer profiles of the two stores are quite different, many of the rules are likely to be inapplicable to the new store. Figure 1 shows the association rules discovered from two stores under one manager with the same supermarket environment. The rule soap => electric shaver is valid for the first store but not for the second, while soap => lipstick is valid only for the second store. The supermarket environment, such as all kinds of resources and managerial methodology, is the same for stores 1 and 2.
Further study shows that 85% of the customers of store 1 are men while 75% of the customers of store 2 are women. Generally speaking, men are more likely to be customers for electric shavers, and women for lipsticks. Hence, the above rules become understandable.
Confidence threshold = 20%.
STORE 1. Identified rule: soap => electric shaver (confidence = 21%).
STORE 2. Identified rule: soap => lipstick (confidence = 23%).
However, the rule soap => electric shaver is not true for store 2: 100 transactions contain soap, but only 14 of them contain an electric shaver, so the confidence is below the threshold.
Figure 1: An example showing the variation of rules in two stores
From the above example, it is seen that the gender of the customer plays an important role in mining the two rules, even though it does not directly appear in the rules as an item. Such a background attribute (one that does not appear in the rules) is called a Background Coefficient: it can influence the generation of the rules that relate the foreground attributes (the items that do appear in the rules). A set of Background Coefficients with associated values or value ranges identifies a Foundation Group, which has two properties regardless of the circumstance in which it resides. 1) The members of a Foundation Group share the same characteristics, which result in the same behavior. In the above example, for instance, the Background Coefficient "gender" with the associated value "male" identifies a male group whose members are more likely to be customers for electric shavers whichever store they go to.
2) Every Foundation Group corresponds to a set of association rules. Still in the above example, the rule that corresponds to the male group is: its members, in general, will buy the electric shaver when they buy soap. With N Background Coefficients, an N-dimensional space can be obtained for specifying the Foundation Groups. For example, if gender and degree are two Background Coefficients, a two-dimensional space is obtained. The Foundation Groups derive from the combinations of the values of these N Background Coefficients. If the values of the attribute "gender" are {male, female} and of the attribute "age" are {<25, 25-45, >45}, then six Foundation Groups need to be formed to address all of the possibilities: {1: Male <25} {2: Male 25-45} {3: Male >45} {4: Female <25} {5: Female 25-45} {6: Female >45}. Generally, if there are N Background Coefficients for a circumstance and there are n_i possible values for the i-th Background Coefficient, for i = 1, ..., N, then to cover all of the possibilities for a circumstance the total number of Foundation Groups is

N_c = n_1 × n_2 × ... × n_N
These N_c Foundation Groups constitute a set called the Complete Group Set. Each circumstance consists of the same Complete Group Set, but every individual Foundation Group in the Complete Group Set appears in a different proportion. In the above example, both the future store's customer group and the current store's customer group consist of those six types of customers. If the current store has more women aged less than 25, the proportion of Foundation Group 4 is larger. If, for the future circumstance, there are no men aged greater than 45, the proportion of Foundation Group 3 is simply zero. Thus, the dataset for each circumstance is the sum of all of the Foundation Groups in the same Complete Group Set, but with a different proportion for each individual Foundation Group. Let the number of Foundation Groups in the Complete Group Set for a circumstance be N, the number of tuples of Foundation Group i be F_i and the proportion of Foundation Group i be P_i, for i = 1, ..., N. The number of tuples of the dataset for this circumstance is

Σ_{i=1}^{N} F_i P_i   (1)

Since the dataset for each circumstance is the sum of all of the Foundation Groups in the same Complete Group Set, with a different proportion for each individual Foundation Group, this also holds for the future circumstance. Thus, if all of the Foundation Groups and their corresponding proportions (that is, the values of the variables on the right-hand side of formula (1)) can be obtained, the dataset for the future circumstance, called the Combination Dataset, can be constructed and the association rules for that circumstance can be mined.
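As an illustration of formula (1), the following sketch (our own, with made-up Foundation Groups and proportions) assembles a Combination Dataset by drawing from each Foundation Group in its assumed future proportion:

import random

random.seed(1)

def combination_dataset(groups, proportions, total):
    # groups: {name: list of tuples}, the Foundation Groups;
    # proportions: {name: P_i for the future circumstance}, summing to 1;
    # total: desired number of tuples, in the spirit of formula (1).
    dataset = []
    for name, tuples in groups.items():
        k = round(proportions[name] * total)  # this group's share of the tuples
        if tuples and k > 0:
            dataset += random.choices(tuples, k=k)
    return dataset

groups = {"male<25": [("soap", "shaver")], "female<25": [("soap", "lipstick")]}
future = combination_dataset(groups, {"male<25": 0.2, "female<25": 0.8}, total=10)
print(future)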
2 Acquisition of Background Coefficients
This section addresses how to find the Background Coefficients, which is our contribution. As defined in Section 1, a Background Coefficient is a background attribute that can influence the generation of the rules relating the foreground attributes (the items that appear in the rules); such a background attribute is therefore highly relevant to the foreground attributes that constitute the association rules. Intuitively, an attribute is highly relevant to a given class if the values of the attribute can be used to distinguish the class from others. For example, it is unlikely that the color of a computer can be used to distinguish expensive from cheap computers, but the brand, the speed of the hard disk, and the memory are likely to be more relevant attributes. Thus, the first step in finding the Background Coefficients is to construct the classes (to be distinguished) from the foreground attributes appearing in the association rules. To make the classes complete, they are all of the combinations of the items in the association rules. The attributes that are relevant enough to distinguish these classes are then the Background Coefficients. Take the rule for store 1 from Figure 1 as an example: from the rule soap => electric shaver, four classes can be obtained: (1) buy soap, buy electric shaver; (2) not buy soap, buy electric shaver; (3) buy soap, not buy electric shaver; (4) not buy soap, not buy electric shaver. After the classes are constructed, attribute relevance analysis can be carried out. The general idea is to compute the information gain, which quantifies the relevance of an attribute to a given class.
Let S be a dataset of s tuples. If there are m classes, S contains s_i tuples of class C_i, for i = 1, ..., m. An arbitrary tuple belongs to class C_i with probability s_i/s. The expected information needed to classify a given tuple is

I(s_1, s_2, ..., s_m) = − Σ_{i=1}^{m} (s_i/s) log_2(s_i/s)   (2)
An attribute A with values {a_1, ..., a_v} can be used to partition S into the subsets {S_1, S_2, ..., S_v}, where S_j contains those tuples that have value a_j of A. Let S_j contain s_{ij} tuples of class C_i. The expected information based on this partitioning by A is known as the entropy of A:

E(A) = Σ_{j=1}^{v} [ (s_{1j} + ... + s_{mj}) / s ] · I(s_{1j}, ..., s_{mj})   (3)
The information gain obtained by this partitioning on A is defined by

Gain(A) = I(s_1, s_2, ..., s_m) − E(A)   (4)

To use this approach, m classes are first obtained from the combinations of items in the association rules; then every attribute except the items in the association rules is tried as attribute A and its information gain is computed. By doing so, a ranking of attributes is obtained, and all of the attributes above the relevance threshold are taken as Background Coefficients.
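A compact sketch of this relevance ranking, implementing equations (2)-(4) on toy data (our own illustration):

import math
from collections import Counter

def info(labels):
    # I(s1, ..., sm) = -sum (s_i/s) log2(s_i/s), equation (2).
    s = len(labels)
    return -sum((c / s) * math.log2(c / s) for c in Counter(labels).values())

def gain(rows, labels, attr):
    # Gain(A) = I(...) - E(A), equations (3) and (4); attr indexes attribute A.
    s = len(rows)
    e_a = 0.0
    for value in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        e_a += (len(subset) / s) * info(subset)
    return info(labels) - e_a

# Toy transactions: attribute 0 = gender; the class is the item combination
# (buys soap?, buys electric shaver?).
rows = [("M",), ("M",), ("F",), ("F",)]
labels = [(1, 1), (1, 1), (1, 0), (0, 0)]
print("Gain(gender) =", gain(rows, labels, 0))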
Conclusions

It can be concluded that the approach introduced in this paper provides a simple but powerful means for acquiring Background Coefficients.

References
1. R. Agrawal and R. Srikant, "Fast algorithms for mining association rules," in Proceedings of the 20th International Conference on Very Large Databases, September 1994, Santiago, Chile, pp. 487-499.
2. R. Agrawal and J. Shafer, "Parallel mining of association rules: Design, implementation and experience," in IEEE Transactions on Knowledge and Data Engineering, 1996.
A FRAMEWORK FOR A REAL-TIME DISTRIBUTED RENDERING ENVIRONMENT

Zhu Huabing, Chan Kai Yun, Tony
Centre for Advanced Media Technology, SCE, Nanyang Technological University, Singapore 639798
E-mail: [email protected], [email protected]

This paper proposes a framework for a Distributed Rendering Environment (DRE). Our research focuses on the parallel rendering architecture and the scheme for transmitting images from a server to its clients. Compared with the object space sorting scheme, the image space sorting scheme requires lower communication bandwidth but is prone to load imbalance. To overcome this disadvantage, we propose a Dynamic Mesh-based Screen Partitioning (DMSP) algorithm that partitions a large task into several smaller, similar-sized tasks and assigns these tasks evenly across the rendering nodes. Another challenge of DREs is the transmission of image data between clients and the rendering engine. We propose a novel image transmission scheme using image-based rendering techniques to solve this problem. The goal of these efforts is to minimize the communication cost without compromising image quality or load balance.
1 Introduction
Computer Aided Design (CAD), scientific visualization and 3D games often need user-steered interactive displays of very complex environments which usually contain massive datasets. Ideally, interactive visualization needs to maintain a frame rate of at least 20 frames per second to avoid jerkiness [1]. However, in most cases, such massive datasets are too large to be rendered in real time even by high-end graphics systems such as the SGI Infinite Reality Engine. Most large-scale parallel computers do not have directly attached, high-resolution graphical displays with interactive input devices. It is more usual to access the parallel computers from a workstation connected via a LAN, such as Ethernet or Fiber Distributed Data Interface (FDDI), or even a WAN. This type of access supports shared use of an expensive central resource by multiple users. The situation whereby the user (and the user interface) is decoupled from the image synthesis engine is called the Distributed Rendering Environment (DRE). Recent research work in DRE has focused on real-time systems based on clusters of low-cost computers. An effective parallel rendering system should achieve the following important goals:
— It should balance the workload among the rendering nodes.
— It should minimize the overheads due to pixel and primitive redistribution.
— It should implement an effective control component to avoid limiting the rendering pipeline stage.
— It should minimize the communication between the rendering part and the GUI part of the system.

In order to meet these requirements, our approach focuses on the partition algorithm and the communication.
2 System Overview
Our system is divided into four subsystems: the Client Subsystem, the Data Management Subsystem, the Parallel Rendering Subsystem and the Image Composition Subsystem. The Client Subsystem is composed of a GUI and an image assembling component. The GUI provides the interaction with the user and displays the animation from the image assembling component, which composes the final image from the raw image generated by image-based rendering and the image difference received from the server side. The Data Management Subsystem partitions the geometry data into viewpoint cells, each associated with a cull box. According to the viewpoint, it updates the primary and secondary storage memory, and it manages the data structure of the scene data to optimize performance. There are two parts in the Parallel Rendering Subsystem: the controller and the rendering nodes. The controller partitions the screen space and assigns the tasks evenly across the rendering nodes; it also monitors the workload of every rendering node to avoid load imbalance. Each rendering node renders a region of the screen space and sends it to the Image Composition Subsystem. The Image Composition Subsystem receives the data from the rendering nodes and generates an exact image. It also generates a raw image by image-based rendering from the old image and the predictive image. Comparing the exact image with the raw image, the comparison manager obtains the difference and sends it to the client. The system works in the following steps. Each new viewpoint sent into the pipeline is passed to the Data Management Subsystem and the Parallel Rendering Subsystem. The viewpoint determines what geometry is retrieved from the database and loaded into primary and secondary storage memory. After cell-based culling, the rendering system runs view-frustum culling and screen space partitioning, and the tasks are assigned evenly to the rendering nodes. The Image Composition Subsystem receives the image fragments from the rendering nodes and generates an exact image. Comparing the exact image with a raw image generated by image-based rendering techniques, the system sends the resulting difference to the client. On the client, the system assembles an exact image with the help of the image difference from the server side and a raw image generated by the same method as in the Image Composition Subsystem.
Task Partition scheme
The taxonomy of parallel rendering was developed around 1994 by UNC (University of North Carolina) graphics researchers according to where the redistribution occurs in the rendering pipeline. It includes the classes "sort-first", "sort-middle" and "sort-last" 8 . It has been shown that communication overheads can be reduced by partitioning the workload in screen space so that pixels rendered by a rendering node can be sent to the display directly with little or no depth compositing and oversampling. That means sort-first scheme is a good choice for the real-time parallel-rendering system with PC clusters. However, sort-first is prone to load-imbalance because of the random distribution of the primitives in the screen space. The choice of strategy
Figure 1: Procedure of DMSP
for mapping screen regions to processors has a critical impact on the performance of a sort-first system. One of the main advantages of sort-first is its ability to take advantage of the coherence of on-screen primitive movement. In a real-time interactive system, the viewpoint usually changes very little from frame to frame, and thus the on-screen distribution of primitives does not change appreciably either. Using retained mode with sort-first means that the database distribution resulting from one frame can form the initial distribution for the next frame. Thus the primitives migrate between rendering nodes, and they only need to be communicated as they cross the boundaries between different rendering nodes' screen regions. Load balance, another important aspect of a parallel rendering system, centers around efficiency. The key idea of our approach is to cluster primitives into groups for rendering by each server dynamically, based on the overlaps of their projected bounding volumes in screen space: this is the Dynamic Mesh-based Screen Partition (DMSP). Each single partition step of DMSP proceeds as follows. 3D object bounding boxes are used instead of the objects in the scene. According to the number of polygons in an object, a weight value is set on its bounding box; e.g. an object with 4000 polygons has a bounding box weight of 4000. Then we split the current region along the longest axis; suppose we sweep vertical lines. We begin with a vertical line on the left, which moves to the right; the objects passed as the line moves belong to the left group. The line is moved until the total bounding box weight in the left group is almost half of the total weight in the screen space, and the other objects are assigned to the right group. If the partition line crosses a bounding box, that object is assigned to both groups. Then the right and left groups are partitioned following the same steps as above. This process recurses until exactly N tiles and groups are formed, one assigned to each of the N rendering nodes. Figure 1 shows the partition procedure. Because of frame-to-frame coherence, DMSP does not need to run every frame; it runs when load imbalance rises. In order to utilize frame-to-frame coherence, the last partition result can be the initial status of the next partition operation. At that time, the system calculates the number of primitives in each partition region. Following the last partition sequence, the system adjusts the partition lines according to the number of primitives in each partition region. Figure 2 shows the steps.
Figure 2: DMSP—Utilization of Frame-to-frame Coherence
When the workloads among the partition regions are distributed unevenly, the vertical partition line moves right or left from its initial position until the numbers of primitives on both sides are equal. Then the horizontal partition lines move up or down from their initial positions until the primitives are distributed evenly on both sides. This procedure repeats recursively until the workload is distributed evenly among the partition regions. In this way, the partition cost is reduced considerably because of the temporal and spatial coherence. Therefore, the average partition cost per frame will not be much higher in comparison with a static partition method.
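The single weighted sweep described above can be sketched as follows (our simplified one-axis Python illustration; the real algorithm alternates axes and recurses until N tiles are formed):

def split_by_weight(boxes):
    # One DMSP split along x. boxes: list of (x_min, x_max, weight), where
    # weight is the polygon count of the object. Sweep a vertical line
    # rightwards until roughly half the total weight lies to its left;
    # boxes straddling the line are assigned to both groups.
    total = sum(w for _, _, w in boxes)
    acc, line = 0.0, min(x0 for x0, _, _ in boxes)
    for x0, x1, w in sorted(boxes, key=lambda b: b[1]):
        if acc >= total / 2:
            break
        acc += w
        line = x1
    left = [b for b in boxes if b[0] < line]   # includes straddling boxes
    right = [b for b in boxes if b[1] > line]  # includes straddling boxes
    return line, left, right

# Example: three weighted bounding boxes; the middle one straddles the split.
line, left, right = split_by_weight([(0, 2, 4000), (1, 3, 1000), (4, 6, 3000)])
print("split at x =", line)
print("left:", left)
print("right:", right)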
4 Image Transmission Scheme
Reducing the communication overhead between the remote display nodes and the rendering nodes is one of the key problems in distributed rendering systems. Typically, a distributed rendering system deploys the display nodes remotely from the rendering nodes. The straightforward approach of transmitting and displaying rendered images results in a delay of one round trip between a viewpoint change and the corresponding change in the displayed image: the display nodes send the camera parameters to the rendering system, and the rendering system sends compressed image sequences back to the display nodes. This solution therefore requires high network bandwidth to display video at interactive frame rates. Temporal coherence, image space coherence and 3D geometry information are used to overcome the bandwidth limitation in our solution. This framework is designed for a cooperative client-server approach. Our approach combines 3D viewpoint prediction with more general image warping and image interpolation to increase the quality of the raw images and reduce the image data that needs to be sent from the rendering nodes. The scheme predicts the viewpoint that will be reached after several frames according to the camera's motion vector in the 3D scene. The predicted frame is rendered and sent to the clients. From the previous frame and the predicted frame, a raw current frame is generated using image-based rendering techniques by both the client and the server. At the same time, the rendering system renders the exact current frame, compares the raw frame and the exact frame, and sends the difference to the client side. Therefore, the server sends only incremental amounts of information for each frame, greatly reducing the bandwidth required for remote navigation.
Figure 3: Frame Generation in Client
As shown in Figure 3, in this way a whole image is transmitted, in most cases, once every 15 frames, and the other 14 frames are generated using image-based rendering and the difference information. Therefore, the bandwidth requirement can be reduced greatly. Because of the motion prediction, the network latency can be hidden, but the media generation latency is unavoidable. With image-based rendering techniques, one option for smoothly transforming one image into another is to warp the reference image to obtain the extrapolated images. However, this approach is resource-intensive, difficult to implement in a real-time system, and needs pre-processing computation. Therefore, linear interpolation based on the pixels' offset vectors is a better choice. Image interpolation is a technique that uses the pixel depth information of the images and the camera parameters to find the pixel-by-pixel correspondence between each pair of images. The procedure includes the following steps. First, with the image warping equation [7], we can get pairs of corresponding points in the reference images which are projections of the same world space point. Then, offset vectors can be defined for the pixels from the source image to the destination image in image space, and an 'offset map' can be created to specify the pixels' motion between the reference images. Each pixel in both the source and destination images is moved along its offset vector by the amount given by linearly interpolating the image coordinates. Figure 4 shows the procedure. When the camera moves linearly, the camera's local coordinate axes remain parallel, and we get the interpolation equations

u_3 = u_1 + β(u_2 − u_1) · Z(u_1,v_1) / [ Z(u_1,v_1) + β( Z(u_2,v_2) − Z(u_1,v_1) ) ]   (1)

v_3 = v_1 + β(v_2 − v_1) · Z(u_1,v_1) / [ Z(u_1,v_1) + β( Z(u_2,v_2) − Z(u_1,v_1) ) ]   (2)

where (u_1, v_1), (u_2, v_2) are the coordinates of the pixel in the reference images and (u_3, v_3) are the corresponding coordinates of that pixel in the interpolated image, β (0 ≤ β ≤ 1) is the linear coefficient and Z(u, v) is the depth of pixel (u, v).
Figure 4: Image Interpolation
Therefore, we can see that when the camera pans (the image planes remain parallel), Z(u_2,v_2) = Z(u_1,v_1), and the equations reduce to

u_3 = u_1 + β(u_2 − u_1)   (3)

v_3 = v_1 + β(v_2 − v_1)   (4)
Under this condition, the linear interpolation result precisely matches the images generated in the normal way; in other words, linear interpolation of the images produces valid interpolated views. In most cases Z(u_2,v_2) ≠ Z(u_1,v_1), but the difference between the two reference images is slight and imperceptible in our system because the camera does not move very fast. Therefore, the non-linear factor Z(u_1,v_1) / [ Z(u_1,v_1) + β( Z(u_2,v_2) − Z(u_1,v_1) ) ] is close to 1, and the result of linear interpolation is very close to the image generated in the normal way. The image interpolation method gives a pair of interpolated images; these can be composited, and using a pair of interpolated images in this way abates the holes problem. This scheme uses image-based rendering on two frames to generate the raw image of the intermediate frame. Therefore, the raw image is very close to the exact one, and the difference is so slight that the communication cost is low enough. Because of the 3D motion prediction, the system can render the frames before they are required for display; therefore, the network latency can be hidden.
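For concreteness, equations (1)-(4) can be coded directly (our own illustration):

def interpolate_pixel(p1, p2, z1, z2, beta):
    # Depth-weighted interpolation of one corresponding pixel pair, eqs. (1)-(2).
    # p1 = (u1, v1), p2 = (u2, v2): the pixel in the two reference images;
    # z1, z2: its depths there; 0 <= beta <= 1 is the linear coefficient.
    factor = z1 / (z1 + beta * (z2 - z1))  # the non-linear factor, ~1 when depths are close
    u3 = p1[0] + beta * (p2[0] - p1[0]) * factor
    v3 = p1[1] + beta * (p2[1] - p1[1]) * factor
    return u3, v3

# With z1 == z2 (camera pan) this reduces exactly to eqs. (3)-(4).
print(interpolate_pixel((10.0, 5.0), (14.0, 9.0), z1=2.0, z2=2.0, beta=0.5))
print(interpolate_pixel((10.0, 5.0), (14.0, 9.0), z1=2.0, z2=2.5, beta=0.5))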
5 Conclusion and Future Work
A load-balanced image partition scheme named Dynamic Mesh-based Screen Partition (DMSP) has been designed. This algorithm overcomes sort-first's susceptibility to load imbalance. Its basic idea is to partition the screen based on the distribution of the primitives in image space.
To simplify the computation when partitioning, bounding volumes are employed instead of primitive groups. To decrease the bandwidth requirement between the client and server, our research on image transmission results in a scheme based on image-based rendering technology. The main advantages of this system are: (1) low inter-communication cost within the cluster for primitive redistribution; (2) low communication cost for image transmission; (3) load balance across the rendering nodes; and (4) sorting performed before the whole rendering pipeline, enabling the system to be suitable for various shading approaches and models. In the next phase, a lot of research work still needs to be done before the proposed scheme becomes a robust system. We will investigate the geometry preprocessing step before sorting in detail and try to obtain an efficient algorithm. On the other hand, exploiting frame-to-frame coherence for sort-first is quite challenging and useful. We also wish to implement our system on a GRID platform.

References
1. Aliaga D., Cohen J., Wilson A., Baker E., Zhang H., Erikson C., Hoff K., Hudson T., Stuerzlinger W., Bastos R., Whitton M., Brooks F., Manocha D., MMR: An Integrated Massive Model Rendering System Using Geometric and Image-Based Acceleration, Proceedings of Symposium on Interactive 3D Graphics (I3D), pp. 199-206, April 1999.
2. Ellsworth D., "Polygon Rendering For Interactive Visualization On Multicomputers", Ph.D. Dissertation, Computer Science Department, University of North Carolina at Chapel Hill, 1996.
3. Eyles J., Molnar S., Poulton J., Greer T., Lastra A., England N., and Westover L., "PixelFlow: The Realization", Proceedings of the 1997 Siggraph/Eurographics Workshop on Graphics Hardware, pp. 57-68, Los Angeles, CA, Aug. 3-4, 1997.
4. Gautier L. and Diot C., Design and Evaluation of MiMaze, a Multi-player Game on the Internet, Proceedings of IEEE Multimedia Systems Conference, Austin, 1998.
5. Hancock D. J., Hubbold R. J., Distributed Parallel Volume Rendering on Shared Memory Systems, Proceedings of HPCN Europe, pp. 157-164, 1997.
6. Jarmasz J. and Georganas N. D., Designing a Distributed Multimedia Synchronization Scheduler, Proceedings of IEEE Multimedia Systems '97, Ottawa, June 1997.
7. McMillan L., An Image-based Approach To Three-Dimensional Computer Graphics, Ph.D. Dissertation, University of North Carolina, 1997.
8. Molnar S., Cox M., Ellsworth D., and Fuchs H., A Sorting Classification of Parallel Rendering, IEEE Computer Graphics and Applications: Special Issue on Rendering, Vol. 14 No. 4, pp. 23-32, 1994.
9. Molnar S., Eyles J. and Poulton J., PixelFlow: High-Speed Rendering Using Image Composition, Proceedings of Computer Graphics (SIGGRAPH '92), pp. 231-240, Chicago, 1992.
10. Mueller C. A., The Sort-First Architecture for Real-Time Image Generation, Ph.D. Dissertation, Computer Science Department, University of North Carolina at Chapel Hill, 2000.
11. Wilson A., Lin M., Manocha D., Yeo B. L., Yeung M., A Video-Based Rendering Acceleration Algorithm for Interactive Walkthroughs, Proceedings of ACM Multimedia 2000, Los Angeles, CA, 2000.
IMMERSIVE VISUALISATION OF NANO-INDENTATION SIMULATION OF CU

SHUHONG XU, JU LI†, CHONGHE LI, FRANK CHAN
Institute of High Performance Computing (IHPC), Singapore 117528
E-mail: [xush,lich,chancj]@ihpc.a-star.edu.sg
† MIT, Massachusetts Avenue, Cambridge, MA 02139, USA
E-mail: [email protected]
This paper introduces the immersive visualisation of a nano-indentation simulation of Cu. The molecular dynamics simulation is performed on a system consisting of 326,592 atoms of size 18.6 x 17.0 x 18.4 nm, under periodic boundary conditions. Based on the coordination number calculation, a practical method for the extraction of "extraordinary" atoms is developed. The simulation has been visualised in the CAVE™, an advanced fully immersive virtual environment.
1 Introduction
With the advent of high performance computing, materials research now embraces another approach: computer simulation. Increasingly, materials modelling has taken on the meaning of theory and simulation of materials properties and behaviours. Indeed, in the near future it is not far-fetched for new materials to be created with enhanced performance, extended service life, acceptable environmental impact and reduced cost, by exploiting advanced materials modelling and visualisation techniques. Scientists became aware of the importance of molecular graphics in the mid-1960s [1]. However, due to the limitations of computer hardware and numerical technology, molecular modelling and visualisation research did not prosper until the 1990s. In 1992, Richardson [2] described the kinemage and the supporting programs MAGE and PREKIN; this was the first program that brought molecular visualisation to a wide audience. In 1993, Roger Sayle [3] developed a much more complete molecular visualisation system, RasMol, and placed the C language source code in the public domain. This allowed others to adapt the program to additional types of computers, and to incorporate RasMol's excellent user interface and renderings into derivative programs, notably MDL Chime [4] and WebLab [5]. RasMol is widely used throughout the world; however, its file format, like the PDB format, is not very convenient for large-scale atomistic simulations. So Ju Li [6] developed his own visualisation program, atomEye, in 1999. More molecular visualisation tools can be found at [7]. One notable limitation is that all these systems confine the user to a 2D environment when visualising and interacting with 3D molecular structures. This can be very limiting in that the spatial relationships between atoms may be unclear. Virtual reality systems such as the CAVE™ make 3D spatial interaction possible. Being fully immersed in a CAVE™ environment, the user can interact visually, aurally, and tactilely with molecular models in the most natural way. This helps users efficiently extract intrinsic physical properties and gain mechanistic insights from atomistic modelling. Interactive and immersive visualisation of Molecular Dynamics (MD) simulations is still in its infancy, especially for large data. How to efficiently extract useful features from within a large bulk of uninteresting atoms and how to achieve real-time walkthrough speed remain two of the top challenges. This paper introduces the immersive visualisation of a nano-indentation simulation of Cu. A practical method for feature extraction is presented and an integrated approach is employed to achieve real-time walkthrough speed.
The system has been implemented in a CAVE™ virtual environment. The paper is structured as follows. Section 2 briefly introduces the MD simulation of nano-indentation of Cu. The method for "extraordinary" atom extraction is presented in Section 3. Section 4 introduces the techniques for real-time visualisation. Conclusions and future work are drawn in Section 5.

2 Nano-indentation Simulation
The MD simulation of nano-indentation of Cu is performed on a system consisting of 326,592 atoms of size 18.6 x 17.0 x 18.4 nm, under periodic boundary conditions. The (111) surface of the system faces the indentor, which is a blunt cylinder 2.5 nm in diameter composed of immobile Cu atoms, as shown in Fig. 1. The simulation is carried out at T = 0 K and T = 300 K. The EMT potential [8] is used. Fig. 2 shows one configuration [9].
Fig. 1 Nano-indentation simulation setup
Fig. 2 Nano-indentation of Cu
The MD simulation shows successive stress-relief events varying with indentor depth, thus providing an atomistic link between the heterogeneous deformation response and the theoretical strength of the bulk material. For details, see [9].
Extraction of "Extraordinary" Atoms
One of the problems of 3D materials simulations is that we may only be interested in a small subset of the data, such as dislocations or defects. Often, a feature of interest is embedded within a large bulk of uninteresting atoms. As a typical example, Fig. 2 shows the surface slip lines caused by nano-indentation, but it cannot reveal internal dislocation information, and a bulk view without treatment is not helpful due to the large quantity of atoms. To extract interesting features, a practical method is to make use of the coordination number. In a perfect crystal lattice, every atom has the same number of neighbours since every atom is equivalent. However, in a configuration other than a perfect crystal lattice, the above scheme yields some atoms with a different number of nearest neighbours from the others. For perfect crystal Cu at equilibrium, the crystal structure is FCC and the lattice constant is a = 0.36078 nm. Each inner atom initially has twelve nearest neighbours, with the atomic radius r satisfying 4r = √2·a, as illustrated in Fig. 3. After indentation, the nearest neighbours of each atom can be found using a cut-off radius R_cut: if the distance d_ij between two atoms P_i and P_j is less than R_cut, then P_j is one of P_i's nearest neighbours. Obviously 2r < R_cut < a, i.e. 0.255110 nm < R_cut < 0.36078 nm. An enclosing box {[x_i-Δx, x_i+Δx], [y_i-Δy, y_i+Δy], [z_i-Δz, z_i+Δz]} can be used to look for the potential nearest neighbours and speed up the distance calculations, where (x_i, y_i, z_i) are the Cartesian coordinates of P_i and Δx, Δy, Δz are pre-defined non-negative values.
After "ordinary" atom deletion, the "extraordinary" atoms are shown in Fig. 4.
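A minimal sketch of the coordination-number computation follows (our own Python illustration; a production implementation would use the enclosing-box pre-filter described above rather than brute force):

import math

A = 0.36078        # Cu lattice constant, nm
R_CUT = 0.30       # cut-off radius, chosen so that 2r < R_CUT < a

def coordination_numbers(atoms, r_cut=R_CUT):
    # atoms: list of (x, y, z) positions in nm. Returns, for each atom, the
    # number of neighbours within r_cut (brute-force O(n^2) for clarity).
    counts = [0] * len(atoms)
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if math.dist(atoms[i], atoms[j]) < r_cut:
                counts[i] += 1
                counts[j] += 1
    return counts

# In perfect FCC Cu every inner atom has coordination number 12, so atoms
# whose count differs from 12 are the "extraordinary" ones.
atoms = [(0.0, 0.0, 0.0), (0.255, 0.0, 0.0), (1.0, 1.0, 1.0)]
print(coordination_numbers(atoms))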
Fig. 3 Crystal structure of perfect Cu ((001) plane)
Fig. 4 "Extraordinary" atom extraction

4 Immersive Visualisation
Interactive visualisation in an immersive environment has a crucial requirement on frame rate (normally it should exceed 15 fps). To achieve real-time walkthrough, A. Nakano and R. K. Kalia [10] used a multiresolution MD algorithm and the associated data structures to visualise large quantities of atoms. In another paper, A. Nakano [11] employed an octree data structure for visibility culling and level-of-detail control for fast rendering. Theoretically, these techniques decrease the number of polygons to be displayed; however, accounting for the user perspective requires additional calculations. In our opinion, solving the large-scale MD visualisation problem requires an integrated systems approach. At a high level, the following four issues are worth mentioning:
1. Data pre-processing and feature extraction. Generally, a direct rendering of all atoms is not helpful. We are interested in extracting the physically based features that we specify. We can represent these features, if successfully extracted, in an economical way for interactive visualisation.
2. Scalable parallel visualisation. To visualise large-scale MD simulations at the highest possible resolution, we need the processing power and memory of a parallel computer. PC clusters, which have become increasingly popular and affordable, make parallel visualisation an even more attractive approach [12].
3. Optimisation of basic graphics functionality. For molecular visualisation, only very few types of objects in massive quantities, such as spheres (as atoms), cylinders (as bonds), and points (as charge density), need to be rendered. Therefore, very efficient graphics routines can be written and optimised.
4. Visualisation management. There exist some useful techniques for large-scale visualisation, such as visibility culling using an octree data structure [13], Level of Detail (LOD) and database paging [14], etc.
We adopt an integrated approach. Before visualisation, the MD simulation results are pre-processed and the coordination number of each atom is calculated. According to the purpose of the MD simulation, atoms can be divided into three groups according to their significance: important, normal and less important. For important atoms, we use high quality spheres (more polygons) to represent them. We also extend the LOD technique for navigation: during fast navigation, atoms are rendered at a lower resolution. The system has been implemented in the CAVE™ located at IHPC.
The host machine is an SGI Onyx2 with four InfiniteReality2 graphics pipelines, sixteen R10000 CPUs and eight Gbytes of system RAM. Fig. 5 shows one example.
5 Conclusion
The techniques for immersive visualisation of large-scale MD simulations have been introduced. Based on a brief introduction to the MD simulation of nano-indentation of Cu, a practical feature extraction method is presented. This helps materials researchers identify the atoms of interest and gain mechanistic insights from atomistic modelling, while at the same time reducing the rendering burden. To achieve real-time walkthrough in a fully immersive virtual environment, we adopt an integrated approach.

Fig. 5 Immersive visualisation in the CAVE

Visualisation of MD simulations in the CAVE™ environment provides users a very intuitive way to understand, explore and interact with the microworld. The feedback from our users is very positive. Future work will focus on exploring collaborative visualisation of large-scale MD simulations in tele-immersive environments over broadband networks.

References
1. Levinthal, Cyrus, "Molecular Model-Building by Computer", Scientific American 214(6):42-52 (1966).
2. D. C. Richardson, J. S. Richardson, "The kinemage: a tool for scientific communication", Protein Science 1, 3-9 (1992).
3. http://www.umass.edu/microbio/rasmol
4. http://www.umass.edu/microbio/chime/
5. http://www.accelrvs.com/about/msi.html
6. http://lonp-march.mit.edu/liju99/Graphics/A/
7. http://molvis.sdsc.edu/visres/index.html
8. K. W. Jacobsen, P. Stoltze, J. K. Norskov, "A semi-empirical effective medium theory for metals and alloys", Surface Science 366, 394 (1996).
9. Ju Li, "Modeling Microstructural Effects on Deformation Resistance and Thermal Conductivity", Ph.D. thesis, MIT, Dept. of Nuclear Engineering (2000).
10. A. Nakano, R. K. Kalia, and P. Vashishta, "Scalable Molecular-Dynamics, Visualisation, and Data-Management Algorithms for Materials Simulations", Computing in Science & Engineering 1(5), 39-47 (1999).
11. A. Nakano, M. E. Bachlechner, et al., "Multiscale Simulation of Nanosystems", Computing in Science & Engineering 3(4), 56-66 (2001).
12. B. Wylie, C. Pavlakos, et al., "Scalable Rendering on PC Clusters", IEEE Computer Graphics and Applications 21(4), 62-70 (2001).
13. http://www.flipcode.com/tutorials/tut octrees.htm
14. J. Hartman and P. Creek, IRIS Performer Programming Guide, (Silicon Graphics, Inc., 1998).
Distributed processing and visualization of MEG data

DATE SUSUMU, Graduate School of Information Science and Technology, Osaka University, Japan. Email: [email protected]
PROF. SHIMOJO SHINJI, CyberMedia Center, Osaka University, Japan. Email: [email protected]
MIZUNO-MATSUMOTO, YUKO, Graduate School of Engineering, Osaka University, Japan. Email: [email protected]
SONG JIE, A/PROF. LEE BU SUNG, A/PROF. CAI WENTONG AND WANG LIZHE, School of Computer Engineering, Nanyang Technological University, Singapore. E-mail: [email protected]
Magnetoencephalography (MEG) is a non-intrusive method slowly gaining acceptance for use in brain wave analysis. It is an important medical tool for the early detection of brain-related disease. A major problem is the amount of data generated by such imaging systems and its analysis. In this paper we present the use of Grid technology to analyse the data across Linux clusters located in Japan and Singapore. The data is then visualized, and tools for detecting abnormality in the brain signals are highlighted.
1. Introduction
Medical health care over the years has seen the use of new technology to help medical personnel perform their tasks more effectively. Highly sophisticated medical technologies such as magnetoencephalography (MEG), positron emission tomography (PET), and functional magnetic resonance imaging (fMRI) facilitate the early detection and diagnosis of medical problems. All these technologies produce large quantities of data that need to be stored and processed. Medical imaging has emerged as an important practical application that requires a huge amount of storage as well as computation, which fits nicely into the class of applications that can be supported by Grid technology. Understanding brain function has always been a major area of interest. With the advances in technology, especially sensor technology, researchers now have the opportunity to understand brain function much better than before. In a society with an aging population, it is important to detect brain disease early. MEG is an important tool that is gaining wider usage for such purposes.
Our research focuses on the deployment of a distributed collaborative environment for the analysis of MEG data. In this paper we describe the research and practical work carried out by the participants in this project. Section 2 introduces the MEG system [1]. Section 3 describes the infrastructure set up for the SC2002 demonstration. The paper concludes with Section 4.
2.
MEG system
Brain imaging technology has advanced tremendously over the past few years. The MEG system is a highly sophisticated medical instrumentation system that is used in the early detection of brain disorders, e.g. cerebrovascular disease and dementia. It can measure brain signals from multiple measurement points on the head mount. MEG measurement is characterized by a high degree of accuracy and non-invasiveness. The amount of data collected with MEG is very large: for example, in the case of a 1-hour measurement at a sampling frequency of 250 Hz using 64 sensors, the amount of data collected reaches 0.9 GB, as the arithmetic below illustrates. The signal is processed using a signal source localization method, i.e. the data from the various sensors are matched so that the origin of the signal can be found. To date, there are a number of source localization methods, such as SAM (synthetic aperture magnetometry) and wavelet cross-correlation analysis [3]. The wavelet cross-correlation analysis method was used in our study. The computational requirements for processing the data are very high. The computational model adopted for the analysis is Single-Program Multiple-Data (SPMD). The processed data are mapped onto a graphic model of a head with the locations of the sensors indicated, as shown in Figure 1. Different colors are used to indicate the status of each region; red is used to color a sensor that has detected an abnormal signal. The display of the data signal is time-synchronized.
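As a rough sanity check of the quoted figure, the raw data volume follows directly from the measurement parameters; the per-sample storage size below is our own assumption (roughly 16 bytes per sample, covering the raw value plus channel metadata), since the paper does not give the record format:

```c
#include <stdio.h>

int main(void)
{
    /* 1-hour measurement: 3600 s x 250 Hz x 64 sensors. */
    long long samples = 3600LL * 250 * 64;      /* 57,600,000 samples      */
    long long bytes   = samples * 16;           /* assumed 16 B per sample */
    printf("%lld samples, %.2f GB\n", samples, bytes / 1e9);  /* ~0.92 GB */
    return 0;
}
```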
Figure 1: Visualisation of MEG data
3.
Distributed MEG Infrastructure
The infrastructure to support the distributed computing environment is shown in Figure 2. This set-up is used for the SC'2002 demonstration. There is a set of PC clusters at the School of Computer Engineering (NTU), Singapore, and at the Cybermedia Center of Osaka University in Japan. The experimental environment at the Cybermedia Center consists of a PC cluster system composed of 12 compute nodes, each containing a single 1266 MHz Pentium III processor, and a PC cluster with 40 nodes. Access to the MEG machine is through a PC running a daemon process. The PC cluster in Singapore consists of 8 Linux machines, each of which has two 500 MHz Pentium III processors. User access to the system is through a portal running servlets to control the system functions. The capture of data from the MEG machine is done live through the activation of the daemon on the capturing machine. The user defines the duration of the capture period. The MEG data are stored in a directory according to the record of the patient. An LDAP database also stores the patient details and the date and time of capture.
Figure 2: Set-up for SC2002 (Osaka University, Japan: 40-node and 12-node clusters; Nanyang Technological University, Singapore: 8-node cluster)
Once the data capture is complete, the user can then initiate the analysis of the data. The distribution of the data and the analysis requests is handled automatically by the analysis servlet. It communicates with the proto daemons of the three clusters in Osaka and Singapore. The proto daemon pulls the raw data to be analyzed within its cluster from the data server, monitors the progress of the analysis, and finally sends the results back to the data server. The signal processing of the wavelet and cross-correlation analysis is implemented by an MPICH-enabled parallel program named "meg". Each "meg" process analyzes the data of one sensor, which is distributed by the proto daemon; a sketch of this per-sensor decomposition is given below.
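The internals of the "meg" program are not listed in the paper; the following is a minimal sketch of the per-sensor SPMD pattern it describes, with hypothetical helper functions (load_sensor_data, analyze_sensor) standing in for the data transfer and the wavelet cross-correlation kernel:

```c
#include <mpi.h>
#include <stdio.h>

#define NSENSORS 64
#define NSAMPLES 900000                /* 1 hour at 250 Hz */

/* Hypothetical helpers: fetch one sensor's trace and analyze it. */
extern void   load_sensor_data(int sensor, double *buf, int n);
extern double analyze_sensor(const double *buf, int n);

int main(int argc, char **argv)
{
    int rank, size;
    static double trace[NSAMPLES];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* SPMD: each process takes the sensors assigned to it. */
    for (int sensor = rank; sensor < NSENSORS; sensor += size) {
        load_sensor_data(sensor, trace, NSAMPLES);
        double score = analyze_sensor(trace, NSAMPLES);
        printf("rank %d: sensor %d, score %g\n", rank, sensor, score);
    }

    MPI_Finalize();
    return 0;
}
```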
When the user in Baltimore sends the visualization request, the visualization servlet transfers results from the data server to the client, and reorganizes the data structure as required by the visualization program. Then the user can observe the animation of brain signals of the patient. 4.
Conclusion
In this paper, a distributed infrastructure for real-time capture and analysis of the brain signals obtained from MEG has been explored. To this end, the MEG capturing system was made online at Osaka University, and MEG data are analyzed across the distributed environment at Osaka University (Japan) and the School of Computer Engineering at Nanyang Technological University (Singapore). The project will be demonstrated live at the coming SC'2002. Although we are presently working mainly with wavelet cross-correlation as the localization tool, other localization techniques will be explored. In addition, we will be working towards optimization of the processing to improve the performance. Certainly, Grid technology is well suited for this application.
Acknowledgement
We would like to express our gratitude to SingAREN and the Asia Pacific Advanced Network for the use of the network.
References
1. S. Sato, ed., Advances in Neurology Vol. 54: Magnetoencephalography. Raven Press, NY, 1990.
2. http://www.globus.org
3. H. Li and T. Nozaki, "Wavelet cross-correlation analysis applied to a plane turbulent jet", JSME International Journal, Vol. 40, No. 1, pp. 58-66, 1997.
4. I. Foster, C. Kesselman and S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations", International Journal of Supercomputer Applications, Vol. 15, No. 3, 2001.
5. Yuko Mizuno-Matsumoto et al., "Telemedicine for Evaluation of Brain Function by a Metacomputer", IEEE Transactions on Information Technology in Biomedicine, Vol. 4, No. 2, June 2000.
A FAST ALGORITHM OF LEVEL SET METHOD FOR 3D PROSTATE SURFACE DETECTION
SHAO FAN
School of EEE, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
E-mail: [email protected]
LING KECK VOON
School of EEE, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
E-mail: [email protected]
NG WAN SING
School of MPE, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
E-mail: [email protected]
Prostate surface detection from ultrasound images plays a key role in prostate disease diagnoses and treatments, and the level set method can be employed to fulfill this task. This paper presents a fast algorithm of the level set method to automatically detect the prostate surface from 3D transrectal ultrasound images. To reduce the computational load, a so-called "narrow band" solution is used to implement the level set method. However, the computation expense increases rapidly with the number of voxels residing in the narrow band, and the narrow band rebuilding at each iteration takes up most of the processing time. To speed up the algorithm further, a straightforward solution is to reduce the size of the narrow band, the number of iterations, or both. In this work, we first reduce the image size by sampling it every other slice along the x, y and z directions respectively; then we apply the narrow band solution to it to start the surface detection procedure. We applied this fast algorithm of the level set method to eight 3D transrectal ultrasound images, and the results have shown its effectiveness.
1
Introduction
Prostate boundary detection from ultrasound (US) images plays a key role in prostate disease diagnoses and treatments [2]. Currently, boundary detection and volume measurement are performed manually, which is arduous and heavily user-dependent. A possible solution is to improve efficiency by automating the boundary detection and volume estimation process with minimal manual involvement. In fact, there have been a number of works so far on automatic segmentation of the prostate from ultrasound images [1, 2, 4]. However, most of them are interactive (or semi-automatic) and applied to 2D images only. In our work, we developed a new approach based on the level set method [3] to automatically detect the prostate surface from 3D transrectal ultrasound (TRUS) images. However, the heavy computational load hinders its practical use. This paper presents a fast algorithm for our proposed method, obtained by first reducing the 3D TRUS image size and then applying the narrow band solution [3]. The algorithm is detailed in the following section.
2
Methods
In the level set formulation, the 3D surface (S) detection problem can be expressed as the computation of a 4D function ψ satisfying

∂ψ/∂t + F |∇ψ| = 0

where X ∈ R³, F is the evolving speed and the deformable surface S(t) (the zero level set) is embedded in ψ(X, t). The key task of this level set method is to design an appropriate speed function F which can drive the evolving surface to the desired boundary. In this work, F = (1 − 0.5P − 0.5R) − εK is designed, where K is the mean curvature of the surface, which keeps it smooth, and P is a designation function of intensity probability defined as

P(X) = 0 if the probability of I(X) > 0.01, and 1 otherwise.

The intensity probability at voxel X is calculated according to a Gaussian mixture model (GMM), which models the intensity distribution of the prostate. After that, we pass a combination of the standard deviations of the GMM as R_T to the region discrimination function R to roughly extract the prostate region:

R(X) = 0 if R_T/2 < [max(I_n) − min(I_n)] < R_T, and 1 otherwise
where max(I_n) and min(I_n) stand for the maximal intensity and the minimal intensity, respectively, in a cubic (n³) sliding window. In this work, we built a five-layer narrow band to implement the level set method. Its 2D projection is shown in Fig. 1a, where the active set contains the grid points which lie adjacent to the zero level set, the immediate neighbors constitute the actual boundaries of the narrow band, and the next neighbors are used to compute the second-order derivatives of ψ for calculating the mean curvature K. The idea behind the "narrow band" solution [3] is that it only affects points close to the region where the evolving surface is located, so that the computation time is reduced significantly. Fig. 1b shows an example of this narrow band solution applied to a typical 3D prostate scan. However, the computation time increases rapidly with the number of grid points which reside in the narrow band, and the narrow band rebuilding at each iteration takes up most of the processing time. As seen in Fig. 1, every run to completely detect the 3D prostate surface takes more than 60 minutes.
Figure 1. The narrow band construction and its application. (a) 2D projection of a five-layer narrow band construction; (b) narrow band application on a typical 3D prostate scan: image size 256x256x256, iteration number 2538, elapsed time 68 min.
One way to speed up the algorithm further is to reduce the number of voxels in the narrow band, the number of iterations, or both. In this work, we reduce the narrow band size simply by compressing the original image to 1/8 of its size, that is, rearranging the image data by sampling them every other slice along the x, y and z directions, respectively. Fig. 2b illustrates the data sampling process.
Figure 2. The illustration of image size reduction. (a) voxels concerned with the computation in the original image; (b) image data sampling by every other slice along the x, y and z directions, respectively.
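A minimal sketch of this 2x-per-axis subsampling is given below; the linear voxel indexing and the function signature are our own assumptions, as the paper provides no code:

```c
#include <stddef.h>

/* Keep every other voxel along x, y and z, shrinking the volume to
 * 1/8 of its original size (e.g. 256x256x256 -> 128x128x128). */
void subsample_volume(const unsigned char *src, unsigned char *dst,
                      size_t nx, size_t ny, size_t nz)
{
    size_t mx = nx / 2, my = ny / 2, mz = nz / 2;
    for (size_t z = 0; z < mz; z++)
        for (size_t y = 0; y < my; y++)
            for (size_t x = 0; x < mx; x++)
                dst[(z * my + y) * mx + x] =
                    src[((2 * z) * ny + 2 * y) * nx + 2 * x];
}
```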
After the image size reduction, we apply the narrow band solution to start the surface detection procedure. Obviously, in this way, the number of voxels concerned with the computation is reduced dramatically. At the same time, the number of iterations necessary for deforming the evolving surface to the desired prostate surface is reduced accordingly.
3
Results
Fig. 3 demonstrates the experimental results before and after the image size reduction, where the green curves are manually outlined contours (the reference). It is very clear that the two detected results are almost the same compared with the reference, but the run time is reduced from 68 minutes to 4.5 minutes. In this work, we applied this fast algorithm to seven other 3D TRUS images to test its effectiveness, and the detected results are similarly satisfactory. As we expected, this fast algorithm of the level set method reduces the computation time to at most 1/8 of that needed by the narrow band solution alone.
Figure 3. Comparison of the experimental results before and after image size reduction. (a) the detected result before image size reduction: iteration number 2538, elapsed time 68' 04"; (b) the detected result after image size reduction: iteration number 927, elapsed time 4' 34".
4
Acknowledgements
The authors are very grateful to Wu R.Y. for his help in data acquisition. They would like to thank Dr. Kwoh C.K. for his knowledgeable comments and suggestions. Thanks to the members of CIMIL for their support.
References
1. Liu Y.J., Ng W.S., Teo M.Y. and Lim H.C., Computerised prostate boundary estimation in ultrasound images using the radial bas-relief method. Medical and Biological Engineering and Computing 35 (1997) pp. 445-454
2. Pathak S.D., Chalana V., Haynor D.R. and Kim Y., Edge-guided boundary delineation in prostate ultrasound images. IEEE Transactions on Medical Imaging 19 (2000) pp. 1211-1219
3. Sethian J.A., Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision and Materials Science. (Cambridge University Press, Cambridge, UK, 1999)
4. Wu R.Y., Ling K.V. and Ng W.S., Automatic prostate boundary recognition in sonographic images using Feature Model and Genetic Algorithm. Journal of Ultrasound in Medicine 19 (2000) pp. 771-782
A PATHOLOGICAL DIAGNOSIS SYSTEM FOR BRAIN WHITE MATTER LESIONS
HAN SHUIHUA, LI FAN
Department of Computer Science, Huazhong University of Science & Technology, 430074, P.R. China
E-mail: [email protected]
An automatic quantitative analysis system for brain white matter lesions is discussed. In general, brain white matter lesions are caused by head trauma, cerebral infarcts and so on. It has been proved that properties of the lesions are related to cognitive impairment, but it is a nontrivial task to build reliable tools that relate MR images to pathological findings. Here we present a novel algorithm for the segmentation of brain tissue by mapping the red, green and blue intensity values of the imaged specimens into L*u*v* color space and utilizing a fast nonparametric clustering method. The x, y coordinates corresponding to the outer boundary of the delineated white matter lesion structures within the segmented images serve as input for shape analysis. We then propose an image-content-based algorithm to determine the amount of white matter lesions, to find the location of pathological findings, and to detect the speed of progression of the white matter lesions. Using Bayesian probabilistic learning, an efficient brain diagnosis expert system is created; it can be used for classification and retrieval of brain white matter lesions. Experimental results demonstrate that our approach can be very useful for pathologists.
1
Introduction
In general, brain white matter lesions are caused by head trauma, cerebral infarcts and so on. It has been proved that properties of the lesions are related to cognitive impairment. The physician uses images to aid in diagnoses, to make measurements of cerebral volume, to assess brain development, to detect the difference between normal brains and those in pathological states, to look for the severity of disease, and as a medical record. It is a natural human desire to find ways to avoid repetitive or routine work and be left with interesting and challenging work. Also, it is advantageous to make use of outside expertise at the moment it is needed. There is a need for an imaging system to provide physician assistance at any time and to relieve the physician of drudgery or repetitive work [4]. Here we introduce the PDSB computer vision system, which seeks to reproduce the capabilities of the human expert, who can extract useful information and make decisions about diagnosis or treatment from medical images, even if the images are degraded. The PDSB system extracts objects of interest (lesions and anatomical structures) from the rest of the image of the cerebral volume, identifies and localizes the objects, and infers the presence and location of abnormalities to make diagnoses or look for changes in sequential images. Using a neuroimage database composed of clinical volumetric CT image sets of hemorrhage (blood), bland infarct (stroke) and normal brains, a framework of our approach is shown in Figure 1. The three major components in this scheme are (1) feature extraction, which maps each volumetric image into a multi-dimensional image feature space; (2) feature selection, which determines the relative scale of the feature space and the best metric for image comparison; and (3) pathological diagnosis, which gives the user the most suitable classification result.
Figure 1. Architecture of the PDSB system
2
Image Preprocessing
The PDSB segmentation is based on nonparametric analysis of the L*u*v* color vectors obtained from the input image. The algorithm detects color clusters and delineates their borders based on the gradient-ascent mean shift procedure, as shown in Figure 2. It randomly tessellates the space with search windows and moves the windows until convergence at the nearest mode of the underlying probability distribution. The nonparametric, robust nature of the color histogram analysis allows accurate and stable recovery of the main homogeneous regions in the image.
[Flowchart: input image → map into L*u*v* color space → define sample set → apply mean shift procedure → derive cluster center candidates → prune cluster centers → delineate clusters]
Figure 2. The processing flowchart of the segmentation algorithm
First, the RGB input vectors are converted into L*u*v* vectors following a nonlinear transformation. A set of m points x_1, ..., x_m, called the sample set, is then randomly selected from the data. Distance and density constraints are imposed on the points retained in the sample set, automatically fixing its cardinality. The distance between any two neighbors should not be smaller than h, the radius of a searching sphere S_h(x), and the sample points should not lie in sparsely populated regions. A region is sparsely populated whenever the number of points inside the sphere is below a threshold T_1. Next, the mean shift procedure is applied to each point in the sample set. The mean shift vector at the point x is defined as

M_h(x) = (1/n_x) Σ_{x_i ∈ S_h(x)} (x_i − x)   (1)

where n_x is the number of data points contained in the searching sphere S_h(x). It can be shown that the vector (Eq. 1) has the direction of the gradient density estimate when this estimate is obtained with the Epanechnikov kernel. The m points of convergence resulting from applying the mean shift to each point in the sample set are called cluster center candidates. Since a local plateau in the color space can prematurely stop the mean shift iterations, each cluster center candidate is perturbed by a random vector of small norm and the mean shift procedure is let converge again. The computation of the mean shift vectors is based on the entire data set; therefore, the quality of the density gradient estimate is not diminished by the use of sampling. Finally, spatial constraints are enforced to validate homogeneous regions in the image. Small connected components containing fewer than T_2 pixels are removed, and region growing is performed to allocate the unclassified pixels.
Figure 3. Segmentation results for variations of lesions
3
Feature Extraction
The inherent categorization of pathologies in medicine provides a way to classify medical images (cases). This content-based organization of medical images naturally forms a hierarchical structure, with its bottom leaves corresponding to sets of specific images (cases) and higher nodes corresponding to subcategories of pathological cases. Each image can be viewed as a data point in some multidimensional feature space. The feature vector, the coordinates of a 3D image, functions in turn as the image index. With the help of neuroradiologists, the following salient visual features, which have significant semantic meanings and medical implications in interpreting brain images, have been identified [1]:
• mass effect: asymmetry with respect to the ideal center line, due to structural/density imbalance
• anatomical location: where the lesion resides in terms of the brain's 3D anatomical structure
• density: relative brightness and darkness of the lesion
• contrast enhancement: lesion sensitivity to contrast enhancement
• boundary: the region between the lesion and its surroundings
• shape: a characterization of the 3D volume the lesion occupies
• edema: lucent area around a lesion, usually caused by excessive liquid
• texture: the texture of the lesion
• size: the dimension of the lesion
• age: some visual features vary with the patient's age.
Lesion detector: Using the results from the symmetry detector, we have further developed a lesion detector which aims at automatically locating possible lesions (bleeds, strokes, tumors) by detecting asymmetrical regions with respect to the extracted central symmetry plane. The goal is to make this process adaptive and robust to different image densities; for example, acute blood appears white on a CT image while an acute infarct (stroke) appears dark.
Mass effect detector: The extracted symmetry axis is used as the initial position of an open snake. The final resting position of the snake indicates how much the brain has shifted from its ideal centered position due to a tumor. The difference between the deformed midline of a pathological brain and the ideal midline, i.e. the ratio of the maximum distance between the two curves over their vertical length, is used as a quantified measurement of mass effect.
Anatomical location of the lesion detector: In order to determine the anatomical location of a detected lesion in a 3D image, the atlas is deformably registered onto the pathological brain. Figure 10 shows the result of a deformable registration (affine warping) from a 3D digital atlas (MRI T1 image) to a brain with a lesion (MRI T2 image). Since the atlas is completely labeled, the general anatomical location of the lesion can be identified from the labeled voxels. While we are working on more sophisticated automatic deformable registration of a brain atlas to a pathological 3D brain image for exact matching, an interactive tool has been used that allows the user to identify corresponding anatomical points on the images, and a simple warping algorithm warps one image to the other.
4
Feature Selection
In this work, feature selection is designed as a mapping from a potential feature set F1 = {f1, f2, ..., fn} to another feature set F2 = {w1 f1, w2 f2, ..., wn fn}, where 0 ≤ wi ≤ 1. Since some of the wi may take the value 0, |F2| ≤ |F1|. Two types of features are expected to be removed from F1: irrelevant features and redundant features. As a result, for any image semantic class ci the posterior probabilities P(ci|F1) and P(ci|F2) are equivalent. The values of the wi are the result of our learning algorithm. When many wi are zeros (as is the case in this work), F2 becomes a much lower-dimensional space than F1, and the computation cost is reduced greatly.
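The effect of the weights can be illustrated with a simple weighted distance; the Euclidean form below is our own assumption, since the paper does not fix the comparison metric:

```c
#include <math.h>

/* Distance between two cases in the learned feature space F2: features
 * with weight w[i] = 0 (irrelevant or redundant) drop out entirely. */
double weighted_distance(const double *a, const double *b,
                         const double *w, int n)
{
    double d2 = 0.0;
    for (int i = 0; i < n; i++) {
        double d = w[i] * (a[i] - b[i]);
        d2 += d * d;
    }
    return sqrt(d2);
}
```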
[Screenshot: case retrieval interface showing a target case and ranked retrieved cases, with feature fields (modality, voxel size, number of lesions, lesion size, density, lesion side, lesion location, lesion shape, lesion boundary, mass effect, symptom, diagnosis), feature weights and similarity scores]
Figure 4. Feature selection by the learning algorithm
5
Pathological Diagnosis
We consider pathological diagnosis as a process of image classification; here we present the format of the inferencing system as it is designed to be completed. Knowledge engineering: The evolution of the expert system involves selecting a set of diagnoses and features, defining a causal probabilistic structure over the set of diagnoses and features, quantizing the features into discrete values, assembling a representative set of images, annotating the images, teaching the network, filling out the feature set with beliefs and frequency values, improving performance, and validating the results of the process. The value of each feature (image manifestation) is correlated to each disease with conditional probabilities p[Mi | Dj]. Such features include mass effect, anatomical location, density, contrast enhancement, boundary, shape, edema, texture, size, age, etc. Figure 5 is an example of how relevant cases are grouped in the database. The key question is how we can reach the right branch at the pathological level by starting from the visual feature level.
Knowledge acquisition: We insert the relative incidence of each disease in four age groups: 0 to 6 months, 6 months to 2 years, 2 years to 60 years, and greater than 60 years. We use annotated examples of each disease to propagate probabilities in the Bayesian network. The computational cost of obtaining each manifestation from an image is entered to help in deriving the utility of the next best feature. If necessary, probabilities are adjusted to optimize classification accuracy. Embedded expert system: Hypothetico-deductive reasoning is applied. For any disease group, the values of specified features are always presented to the expert system. Based on the ranking of the diagnoses with the current set of features entered, the feature with the maximum utility from the remaining set is used to update the disease ranking. When the probability of a diagnosis reaches a threshold probability, the diagnosis is accepted.
[Diagram: a set of medical images is divided into intra-axial (inside brain) and extra-axial (outside brain) pathological subcategories at the diagnosis level, with visual features (mass effect, density, contrast enhancement, shape, boundary) at the imagery level]
Figure 5: Pathological cases are classified by causes, anatomical locations and then visual features.
6
Discussion
In this work, we have demonstrated quantitatively the discriminating power of statistical measurements of human brain asymmetry. One novelty of our approach, in comparison to others in the medical image retrieval domain, is to let the computer learn an image similarity metric suited to the given image semantics, instead of having such a metric imposed subjectively by a human system designer. The main computational tools used in our study include memory-based learning and Bayesian classification.
References
1. Liu Y., Rothfus W.E., Kanade T., Content-based 3D neuroradiologic image retrieval: preliminary results. International Conference on Computer Vision (ICCV98), Bombay, India, January 1998.
2. David J. Foran, Dorin Comaniciu, Peter Meer et al., Computer-assisted discrimination among lymphomas and leukemia using immunophenotyping, intelligent image repositories. IEEE Transactions on Information Technology in Biomedicine, 4(4), 265-271, 2000.
3. Dorin Comaniciu, Peter Meer, Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 2002.
4. Goldbaum M., Moezzi S., Taylor A. et al., Automated diagnosis and image understanding with object extraction, object classification, and inferencing in retinal images. IEEE International Conference on Image Processing, Proceedings, 695-698, 1996.
5. H. G. Schnack, H. E. Hulshoff Pol, W. F. C. Baare, W. G. Staal, M. E. Viergever and R. S. Kahn, "Automated separation of gray and white matter from MR images of the human brain", NeuroImage, vol. 13, 230-237, 2001.
6. D. H. Laidlaw, K. W. Fleischer and A. H. Barr, "Partial-volume Bayesian classification of material mixtures in MR volume data using voxel histograms", IEEE Trans. Med. Imag., vol. 17, 74-86, 1998.
USING STREAMING SIMD EXTENSION ON HIGH LEVEL IMAGE PROCESSING
M. FIKRET ERCAN
School of Electrical Engineering, Singapore Polytechnic, 500 Dover Rd, Singapore
E-mail: [email protected]
YU-FAI FUNG
Department of Electrical Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong S.A.R.
E-mail: [email protected]
Streaming SIMD Extensions (SSE) is a unique feature embedded in the Pentium III and Pentium IV classes of microprocessors. By fully exploiting SSE, parallel algorithms can be implemented on a standard personal computer and a significant speedup can be achieved. In this paper, we study the performance of SSE in higher-level image processing algorithms. The Hough transform and geometric hashing are commonly used algorithms of this class, and their implementations using SSE are presented.
1
Introduction
Recent microprocessors include special architectural features in order to boost their computing performance. Many of them employ a set of SIMD registers in which data-parallel operations can be performed simultaneously within the processor [1, 2], so that the overall performance of an algorithm can be improved significantly. The performance of Intel's SSE in common image and signal processing algorithms has been studied extensively in the literature. Nevertheless, most of these studies are concerned with low-level image processing algorithms, which involve pixel-in, pixel-out operations. In this paper, we exploit SSE technology in higher-level algorithms, where recognized features are the output of the operation. The Hough transform and geometric hashing are the most commonly used algorithms of this type. The SSE registers are 128 bits wide and can store packed values of various types (characters, integers, etc.). There are eight SSE registers and they can be directly addressed using their register names [2]. Utilization of the registers is straightforward with a suitable programming tool.
2
Hough transform
The Hough transform is commonly used for detecting lines or circular objects in an image. In general, the Hough transform has two phases: a voting process, where the result is accumulated in a parameter space, and candidate selection.
Line detection: The voting phase involves calculating the candidate lines, which are represented in terms of parameters by the following equation:

r = x cos θ + y sin θ

One method to utilize the SSE registers for this computation is to pack four consecutive cos θ and sin θ values and compute four possible r values for a given (x, y) coordinate. Row and column values are copied to all four words of a pair of data packs, and the computation is done for four values of the angle θ simultaneously. This method is named angle grouping (AG). Another method is to pack the x, y coordinates of four image pixels into the SSE registers; we call this method pixel grouping (PG). Although the number of packing and unpacking operations is similar, the performance achieved with this method was slightly better. The PG method also enabled further optimisation by the loop unrolling technique; this variant is named SSE-optimised in the results section. A sketch of the PG voting loop is given after this section's description of circle detection.
Circle detection: The Hough transform technique can be extended to circles and any other curves that can be represented with parameters. If a point (x, y) is positioned on a circle, then the gradient at (x, y) points to the centre of that circle. For a given radius d, the direction of the vector from the point (x, y) can be computed, and the coordinates of the centre can then be found. Thus, circles are represented by the following equations, where a and b indicate the coordinates of the centre point:

a = x − d cos θ and b = y − d sin θ

The first method that we employed packs edge pixels and gradient angles, and the calculation is performed for four of them simultaneously. This method is called gradient grouping (GG). The second method deals with the computation of the values of a and b; this time the coordinates x, y and the gradient angle are copied into all four words of the data packs. This second method is named centre point grouping (CG). The timings given in our results are measured only for the time-consuming accumulator-filling step.
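The sketch below illustrates the PG voting loop with SSE intrinsics; the function signature, the accumulator layout and the r-to-bin quantization are our own assumptions (the paper describes the packing scheme but gives no code):

```c
#include <xmmintrin.h>   /* SSE intrinsics (Pentium III and later) */

/* Pixel grouping (PG): for a fixed angle theta, compute
 * r = x*cos(theta) + y*sin(theta) for four edge pixels at once. */
void vote_theta(const float *xs, const float *ys, int npix,
                float c, float s, int *acc, int nbins, float rscale)
{
    __m128 vc = _mm_set1_ps(c);
    __m128 vs = _mm_set1_ps(s);
    for (int i = 0; i + 4 <= npix; i += 4) {
        __m128 vx = _mm_loadu_ps(xs + i);
        __m128 vy = _mm_loadu_ps(ys + i);
        __m128 vr = _mm_add_ps(_mm_mul_ps(vx, vc), _mm_mul_ps(vy, vs));
        float r[4];
        _mm_storeu_ps(r, vr);
        for (int k = 0; k < 4; k++) {        /* unpack and vote */
            int bin = (int)(r[k] * rscale);
            if (bin >= 0 && bin < nbins)
                acc[bin]++;
        }
    }
}
```

Note that the accumulator update itself stays scalar, which is exactly the packing/unpacking overhead discussed in the results section.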
3
Geometric hashing
In geometric hashing, a set of models is specified by their feature points and a hash table data structure is established. All the possible feature pairs in a given model are designated as the basis set. The feature point coordinates of each model are computed relative to each of its bases. These coordinates are then used to hash into a hash table. During the recognition phase, an arbitrary pair of feature points is chosen from the scene as a basis, and the coordinates of the other feature points in the scene are computed relative to the coordinate frame associated with this basis. The new coordinates are used to hash into the hash table. In our application, the tactic used in the parallel implementation was to explore the data-parallel segments of the algorithm and utilize the appropriate SSE facilities. The time-consuming operation is the computation of the coordinates of all other feature points with respect to the selected basis and the voting for the entries in the hash table; this is also where the data parallelism can be exploited using the SSE registers, as the sketch below illustrates.
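The basis-relative coordinate computation at the core of both phases can be sketched as follows; the similarity-frame convention (basis point 1 at the origin, basis point 2 at (1, 0)) is a standard choice rather than something the paper spells out:

```c
/* Express point p in the frame defined by the basis pair (b1, b2):
 * b1 maps to (0,0), b2 maps to (1,0).  The (u,v) pair is then
 * quantized to index the hash table. */
void basis_coords(const float p[2], const float b1[2], const float b2[2],
                  float *u, float *v)
{
    float ex = b2[0] - b1[0], ey = b2[1] - b1[1];   /* basis x-axis        */
    float len2 = ex * ex + ey * ey;                 /* its squared length  */
    float dx = p[0] - b1[0], dy = p[1] - b1[1];
    *u = (dx * ex + dy * ey) / len2;                /* parallel component  */
    *v = (dy * ex - dx * ey) / len2;                /* perpendicular part  */
}
```

Four such points can be transformed at once by packing their coordinates into SSE registers, in the same spirit as the PG scheme above.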
4
Experimental results
For line detection, we used three different image sizes. For each image size, we used three images with different percentages of edge-pixel content. Table 1 shows the execution times for all the test images. From the experimental study we can observe that each method provided a steady performance improvement regardless of image size. On average, PG provided better performance than the AG method; loop unrolling, however, provided a significant further improvement. With an increasing percentage of edge pixels in the image, we observed a slight decline in performance. This deterioration, which is due to the increasing number of data packing/unpacking operations resulting from the larger number of edge pixels, did not affect the overall performance dramatically. For circle detection we used two different image sizes. For each image size, we used three images with different percentages of edge-pixel content, where the smallest percentage corresponds to 4 and the highest to 10 circular objects. The results obtained for the two techniques are combined in Table 2. We observed a tendency towards better speedups with increasing edge pixels in the circular Hough transform. CG resulted in slightly better performance than the GG technique. Scanning the image and packing data into the SSE registers generate a considerable overhead in the above implementations and make a major speedup difficult.
Table 1. Performance of different approaches for the Hough transform (in msec).

            | Image size 256x256 | Image size 640x480 | Image size 1024x1024
            |  5%   10%   15%    |  5%   10%    15%   |   5%    10%    15%
 Non-SSE    |  80   160   240    | 420   721   1092   | 1392   2744   3996
 AG         |  60   130   191    | 340   591    892   | 1101   2274   3315
 PG         |  60   120   180    | 320   550    841   | 1042   2133   3054
 SSE Opt.   |  30    70   110    | 191   320    490   |  611   1212   1792
Table 2. Performance of different approaches for the circular Hough transform (in msec).

            | Image size 256x256 | Image size 640x480
            |  10%   15%   20%   |  10%   15%    20%
 Non-SSE    |  280   360   440   |  720   912   1592
 GG         |  175   220   259   |  480   507    885
 CG         |  185   212   254   |  450   536    838

For the geometric hashing algorithm, we used 1024 randomly generated models, each consisting of 8 points. The scene used in the experiments was also created synthetically, and the total number of points in the picture was 128. The probe time, that is, the calculation of coordinates for the remaining 126 points for a selected basis, was 14.8 seconds for the sequential case. SSE utilization reduced this timing to 7.9 seconds. By utilizing SSE we could save time in the computation of coordinates, though the voting process has to be sequential, which is a considerable bottleneck on the speed-up.
5
Conclusions
In this paper, we have examined the application of SSE to object recognition algorithms. According to our results, a minimum speedup ratio of 1.6 can be obtained without difficulty for all the algorithms we have experimented with. In order to utilize the SSE features, only minor modifications to the original program are needed. Most importantly, additional hardware is not required.
References
1. AltiVec Programming Environments Manual (Motorola, 2001).
2. The Complete Guide to MMX Technology (Intel Corporation, McGraw-Hill, 1997).
AN APPROACH FOR OPTIMIZATION OF IMAGING PARAMETERS FOR GROUND SURFACE INSPECTION USING MACHINE VISION
V. SIVASANKARAN, A. JOTHILINGAM, B. RAJMOHAN AND G.S. KANDASAMI
Department of Production Technology, M.I.T., Anna University, Chennai-600044
E-mail: [email protected], [email protected]
The paper presents a method for the adaptive control of imaging parameters in automated machine vision systems. The imaging parameters are focus, zoom, stop and illumination. With this approach, for ground surfaces that show specular reflections, or polished surfaces, the illumination parameter is adjusted optimally without any prior knowledge about the surface characteristics. As a result, the generated image can be free of irrelevant information. An image series is generated, and between each image capture only the angle of incidence of the illuminating light is slightly changed. The image series comprises about 10 to 20 images, so that between each image the position of the illumination is stepped forward by about 20° to 35°. The criterion applied is that edges or contours from shadows and reflections change their location within the image when the illumination parameters are changed, whereas true edges belonging to features really existing on the object's surface do not change their location; the only change that can be observed with true edges is a change of the contrast with which these contours appear in the image. An algorithm is developed which measures a contour's tendency to change its location when the position of the illuminating light source is changed. By comparing the images in the series, an optimized image is obtained which is free of contours originating from reflections and shadows.
1
Introduction
Many measurement and inspection tasks which are not automated so far can be automated by means of machine vision systems. Solving these problems contributes to enhancing the stability of many manufacturing processes and the quality of the related products. In cases where these tasks are already executed by human inspectors using their own vision, a considerable number of non-detected errors can be assumed. This is mainly due to the fast decline of concentration that goes along with such inspection tasks, high inspection rates, subjective decisions of the inspectors and a lack of training. Nevertheless, human inspectors are distinguished by their ability to adapt very quickly to new situations and inspection tasks; they are equipped with an extraordinarily powerful image processor, the brain, and have access to a huge knowledge base trained over decades.
2
Requirements for modern vision systems
At present, vision systems are rather inflexible in use, since they are very much restricted to the inspection tasks for which they have specially been designed. This applies to the type and arrangement of illumination, the choice of cameras and optical components, as well as to the software for image processing. Since vision systems lack flexibility, from an economic point of view their investment can only be justified if inspection tasks for large batches are automated.
In many areas of manufacturing a huge number of components and parts can be found which have metal or polished surfaces showing specular reflections. The realization of an automated inspection for this class of objects by machine vision demands much effort and skill for the optimized adjustment of imaging parameters like focus, zoom and especially the illumination, which is a crucial parameter for the image's quality. Even if the illumination is optimized, reflections still have much influence on the image, so that in most cases the vision systems are not robust. Besides the aspect of information loss, illumination is also an important means to enhance image features like tiny scratches or engravings on polished surfaces, which can only be observed when they appear with sufficient contrast in the image. This requires illumination from special directions.
3
The "optimized image"
The methods presented here focus on 2D inspection tasks using machine vision with top-down illumination directly onto metal or polished reflective surfaces. The optimized control of the illumination is one of the key points in this discussion, since this parameter is decisive for the image's quality. The developed techniques for image optimization provide as a result an image which contains only the edges and contours actually found on the object's surface. This image is almost free of edges originating from undesired artifacts like reflections and shadows. Furthermore, the contours in the optimized images are entire and fully embrace the related feature on the surface. The optimized image is achieved by automatically identifying and enhancing true edges and by damping false edges in intensity. It must be emphasized here that the resulting optimized image is not a grey-scale image, but an image containing only contours, as in a Sobel or any first-order high-pass filtered image.
4
Methods for controlling imaging parameters
Imaging parameters like focus, zoom, stop and illumination can be split into two groups. Parameters like stop or focus can be optimally adjusted by well-known standard feedback control loops. This is due to the fact that the influence of any change in one of these parameters can be predicted, at least qualitatively. It is different with respect to variations of the illumination: here the exact changes in the image cannot be predicted anymore. This especially applies to metal, polished or reflective surfaces with complex shapes. The reason for this is that much of their behaviour is determined by the micro-topography of their surface, which in principle is unknown in detail. For example, ground or turned surfaces locally behave like gratings, which show strong reflections when illuminated from very special directions.
5
Active exploration of the scene by analyzing image series
In many cases the scenery first has to be explored actively by the vision system before an optimized adjustment of the imaging parameters can be achieved. This exploration has to take place without any knowledge about the scenery. In order to study the scenery's behaviour under variations of the illumination, in the proposed vision system a special procedure initiates the automated capture of an image series, where between each image capture only the angle of incidence of the illuminating light is slightly changed. For this, light sources located in a circular arrangement around the object are successively switched, and an image is captured for each light position. The object and the camera stay in the same position all the time. This makes a comparative evaluation of the images much easier, since no coordinate transformations have to be applied to the images before combining or comparing them. The generation of image series is of special interest, since in many situations it cannot be assumed that an interesting feature of an object's surface can be viewed entirely and in full contrast in just one image. Also, the analysis of an image series implies an aspect of active learning: only by investigating the different behaviour of image contours under variations of the illumination can so-called true edges be distinguished from false edges.
6
Generation of optimized images by means of image series
An image series comprises about 10 to 20 images, so that between each image the position of the illumination is stepped forward by about 20° to 35°. Due to the characteristic behaviour of true edges and false edges under variations of the illumination, the contours in an image can be classified. A general criterion has been formulated: edges or contours from shadows and reflections change their location within the image when the illumination parameters are changed. True edges, belonging to features really existing on the object's surface, do not change their location; the only change that can be observed with true edges is a change of the contrast with which these contours appear in the image. Applying these general rules does not require any prior knowledge about details of the underlying scenery such as surface characteristics, features etc.
7
Algorithmic implementation of an image optimization by analyzing image series
A method is developed which measures a contour's tendency to change its location when the position of the illuminating light source is changed. For this, the contour images G_{n,c} and G_{n-1,c} of the captured images, say G_n(x,y) and G_{n-1}(x,y), are generated by high-pass filtering. In the contour images, due to the variation of the illumination, the shadow and reflex contours change their location. The binarized contour images are then multiplied pixel-wise, so that only those areas appear in the resulting multiplication image where a contour pixel C_{k,G(n)} in G_{n,c} has overlapped with a contour pixel C_{l,G(n-1)} in G_{n-1,c}; there a large overlap area OV appears for the contours. Each contour C_{k,G(n)} is then assigned a degree of overlap (DOV). The DOV value of a contour C_{k,G(n)} resulting from the comparison with image G_m is given by

DOV(C_{k,G(n)}) = Σ_l Σ_{(x,y)} OV_{k,l}(G_n(x,y), G_m(x,y))

After all contours have been ascribed a DOV, the contours are plotted into a resulting image, where each contour pixel is given DOV_{k,l,G(n),G(n-1)} as an intensity value. Repeated application of the procedure to all possible combinations of two images in the series will lead to the completion of the contours of true edges, while contours of shadows are dominated. The intensity of a pixel in the final optimized resulting image G_res(x,y) is computed by

G_res(x,y) = Σ_l Σ_k DOV(C_{k,G(l)}) δ(C_{k,G(l)})

where δ(C_{k,G(l)}) is 1 at the pixels of contour C_{k,G(l)} and 0 elsewhere. The resulting optimized image is free of contours originating from reflections and shadows.
References
1. Pfeifer T. and Wiegers L., Adaptive control for the optimized adjustment of imaging parameters for surface inspection using machine vision. Annals of the CIRP, Vol. 47/1/1998.
2. de Figueiredo R. I. P., Illumination control as a means of enhancing image features. IEEE Trans. on Image Processing, Vol. 4, No. 11.
3. Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, Pearson Education Asia Pte Ltd. (2000).
MODEL DEVELOPMENT AND BEHAVIOR SIMULATION OF pH-STIMULUS-RESPONSIVE HYDROGELS
HUA LI, T. Y. NG AND Y. K. YEW
Computational MEMS Division, Institute of High Performance Computing, 1 Science Park Road, #01-01, The Capricorn, Singapore Science Park II, Singapore 117528
One of the most intriguing features of pH-stimulus-responsive hydrogels is their ability to perform as actuators/sensors in BioMEMS devices. A classical example is a closed-loop insulin device constructed from glucose-sensitive swellable hydrogels. This paper presents the development of mathematical models and the behavior simulation of pH-sensitive hydrogels when immersed in a bathing solution with varying pH. As preliminary work, one-dimensional models are developed. The diffusion mechanisms of the different ionic species into the fluid-filled region of the polymer-based porous hydrogels from the external bathing solution are described by the Nernst-Planck equations. The Poisson equation describes the variation of the electric potential distribution with the diffusing species. The mechanical equilibrium equation is used to describe the deformation of the skeletal solid-phase long-polymer-molecule network of the hydrogel, in which the osmotic pressure generated by the ionic concentration differences between the interior and the exterior of the hydrogel acts as the driving force for deformation. The relation between the concentrations of the fixed charge and hydrogen ions is obtained based on the Langmuir isothermal adsorption. The response of a pH-sensitive poly-HEMA (2-hydroxyethyl methacrylate) hydrogel immersed in a NaCl solution with a simple buffer generating various pH environments is simulated. Numerical results are presented for validation against experimental data. They prove the presently developed models to be satisfactory in predicting the swelling/shrinking trend of the hydrogel behavior in various pH environments.
1
Introduction
Based on the Poisson-Nernst-Planck (PNP) equations, mathematical models including the chemo-electro-mechanical multi-field effects are developed for the first time, known as the Multi-Effect-Coupling (MECpH) model for pH-responsive hydrogels. The presently developed models are able to simulate the behaviors of responsive hydrogels stimulated by pH. Usually the Nernst-Planck flux system is used to describe the transport mechanisms of ionic species in solution. However, the Nernst-Planck system alone is insufficient, as it includes only the gradient effects of the ionic concentrations and the electrical potential. Therefore, a more rigorous model is required to include the variation of the electric potential with the spatial distribution of the electric charges; this requires coupling in the Poisson equation to form the PNP system. Further, this model couples the mechanical equilibrium equations with the PNP equations for the deformation simulation of hydrogels. One of the important contributions of the presently developed MECpH model is a relation, based on the Langmuir absorption isotherm, between the fixed charge bound to the long-molecule chain network and the diffusive hydrogen ions for hydrogels stimulated by varying pH of the surrounding solution. The MECpH model for pH-stimuli-responsive hydrogels is able to simulate the concentration distributions of the diffusive ionic species, the electric potential distribution and the mechanical deformation of the hydrogels when immersed in a bathing solution with varying pH. In order to validate the MECpH model, a one-dimensional steady-state simulation is conducted numerically by a newly developed meshless technique, the Hermite-Cloud method [1, 2]. After comparison with experimental results [3-5], it is observed that the present model is accurate and stable.
2
Presently Developed MECpH Model
If the convective transport of the ionic species is neglected, the Nernst-Planck equation describing the flux of ionic species k in solution is derived based on mass conservation:

J_k = -[D_k] (grad(c_k) + (z_k F c_k / RT) grad(ψ) + c_k grad(ln γ_k))   (k = 1, 2, 3, ..., N)   (1)

where J_k is the flux of the kth species, D_k the diffusivity tensor, c_k the kth ionic concentration, z_k the kth ionic valence number, ψ the electrostatic potential and γ_k the chemical activity coefficient. F, R and T are the Faraday constant, universal gas constant and absolute temperature, respectively. The Poisson equation is used to describe the spatial distribution of the electric potential in the domain:

∇²ψ = -(F/εε₀) (Σ_k z_k c_k + z_f c_f)   (2)
where c_f is the density of the fixed charge groups in the hydrogel, ε the relative dielectric constant of the surrounding medium and ε₀ the vacuum permittivity or dielectric constant. According to the Langmuir absorption isotherm, a relation between the fixed charge and the diffusive hydrogen ion is developed as

z_f c_f = -(c^s_m0 K) / (H (K + c_H))   (3)

where c^s_m0 is the xerostate concentration of the fixed charge, c_H the concentration of hydrogen ions H+ within the hydrogel, K the dissociation constant of the carboxylic acid groups, and H the local hydration of the hydrogel. The mechanical equilibrium equation that describes the hydrogel deformation is written as

σ_,x = ((2μ + λ) u_,x - RT Σ_k (c_k - c̄_k))_,x = 0   (4)

where c̄_k is the concentration of the kth ion species in the stress-free state and c_k the concentration of the kth ion species within the hydrogel. λ and μ are the Lamé coefficients of the solid matrix.
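Equation (3) is straightforward to evaluate once the local pH is known; the helper below assumes c_H = 10^(-pH) (in the same concentration units as c^s_m0 and K), which is our own simplification of the unit bookkeeping:

```c
#include <math.h>

/* Fixed charge density of Eq. (3): z_f c_f = -c_sm0 * K / (H * (K + c_H)). */
double fixed_charge(double c_sm0, double K, double H, double pH)
{
    double c_H = pow(10.0, -pH);   /* hydrogen ion concentration from pH */
    return -(c_sm0 * K) / (H * (K + c_H));
}
```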
3
Meshless Hermite Cloud Method
Based on the classical RKPM [6], the approximation f^(x, y) of a function f(x, y) is given as

f^(x, y) = ∫ C(x, y, p, q) K(x_k - p, y_k - q) f(p, q) dp dq   (5)

where C(x, y, p, q) is the correction function and K(x - p, y - q) the kernel function, which is centered at the points (x_k, y_k) and constructed from different weighted window functions depending on the PDBV problem. A cubic spline function is considered here:

K(x_k - p, y_k - q) = 1/(ΔxΔy) W*[(x_k - p)/Δx] W*[(y_k - q)/Δy]   (6)

where W*(z) is a cubic-spline window function. The correction function C(x, y, p, q) is expressed as a linear combination of independent basis functions, i.e. as a product of a β_p-order row basis function vector B(p, q) and a β_p-order column coefficient vector C^T(x, y):

C(x, y, p, q) = B(p, q) C^T(x, y),   B(p) = {b_1(p), b_2(p), ..., b_βp(p)} = {1, p, p²}   (β_p = 3)   (7)

The correction function coefficient is provided by

C^T(x, y) = A^(-1)(x_k, y_k) B^T(x, y)   (8)

where A(x_k, y_k) is a symmetric matrix. By employing the point collocation technique for discretization in combination with the Hermite interpolation theorem, a true meshless approximation f^(x, y) of the unknown real function f(x, y) is constructed as follows:

f^(x, y) = Σ_n N_n f_n + Σ_m (x - Σ_n N_n x_n) M_m g_xm + Σ_m (y - Σ_n N_n y_n) M_m g_ym   (9)

in which N_n = N_n(x, y) = B(p_n, q_n) A^(-1)(x_k, y_k) B^T(x, y) K(x_k - p_n, y_k - q_n) ΔS_n are defined as the shape functions corresponding to f^(x, y), and in a similar manner M_m = M_m(x, y) are the shape functions corresponding to the first-order derivatives f_,x(x, y) and f_,y(x, y).
The Hermite-Cloud method is able to directly compute the approximate solutions of both the unknown function and its first-order derivatives. Further, the results at discrete points in the domain are much more refined when compared with the classical RKPM. 4
Numerical Results and Discussions
0.01
0.C05 x[m]
(a)
(b)
Figure 1. Comparison of electrical potential in the gel and bathing solution due to applied external electric field betwen (a) stabilized space-time FEM [5] and (b) Hermite Cloud meshless method [1,2].
bathing sojytion
^ hydrogel
(a)
(b)
Figure 2 (a) Geometrical narration of the hydrogel and its bathing solution domain, (b) Comparison between experiment and numerical results for equilibrium swelling of hydrogels as a function of bath pH at 25°C.
4.1
Effect of Externally Applied Electric field
Let us consider a stimuli-responsive hydrogel immersed in a NaCl bath solution with simple buffer (15x15mm2). The concentration of fix-charge groups within the hydrogel is Cf - lOmM. The boundary conditions of Na+ and CV ions in the solution are set to c + = c =lmM. An external electric field is applied with a time-constant electric potential 0.1V next to the anode and -0.1V to the cathode. Figure 1 shows the comparison
878
of numerical simulations of the MECpH model with those of FEM, where the maximum relative error is less than 10%. Curve 1 depicts the linear variation of electrical potential in the solution before the hydrogel is immersed. Curve 2 shows the superposition of both the electrical potentials after immersing the hydrogel without applied electric field and the linear curve 1. Curve 3 represents the simulated electric potential in both the hydrogel and exterior solution when external electric field is applied. 4.2
Equilibrium swelling patterns of pH-stimuli-responsive hydrogel
As shown in Figure 2(a), a cylindrical hydrogel with 400um in diameter at dry-state is immersed in a bathing solution with ionic strength of 300mM, where, as a pH stimulus, the pH of surrounding solution is subjected to step change. The 1-D steady-state stimulation is conducted, in which the computational domain covers only half of the problem domain due to symmetry. Figure 2(b) presents the comparison between the present simulation results shown by solid line and equilibrium experimental data [3-4] of the diameter of cylindrical hydrogel by circular markers. As visualized from the figure, the simulation results were comparable well with the experimental works. 5
Conclusion
This paper develops a chemo-electro-mechanical multi-field coupling (MECpH) model which is able to make qualitative comparison with experimental swelling measurements of poly-HEMA hydrogel. Despite the complexity of the swelling behavior with steps changes in bath solution pH, plenty of physical insight can be obtained by systematically varying the hydrogels and outer solution composition, based on computer-based numerical simulation. References 1. Li Hua, Ng T. Y., Cheng J. Q. and Lam K. Y., Hermite-cloud: a novel true meshless method. Comoutational Mechanics (Submitted). 2. Ng T.Y., Li H., Cheng J.Q., Lam K.Y., A new hybrid meshless-differential order reduction (hM-DOR) method with applications to shape control of smart structures via distributed sensors/actuators, Engineering Structures, (in press). 3. David J. Beebe, Jeffrey S. Moore, Joseph M. Bauer, Qing Yu, Robin H.Liu, Chelladurai Devadoss, Byung-Ho Jo, Functional hydrogel structures for autonomous flow control inside microfluidic channels, Nature 404 (2000) pp.588-590. 4. B. Johnson, D. J. Niedermaier, W. C. Crone, J. Moorthy, D. J. Beebe, Mechanical Properties of a pH Sensitive Hydrogel, Society for Experimental Mechanics, 2002 SEM Annual Conference Proceedings, Milwaukee. (2002) 5. Gulch R. W., J. Holdenried, A. Weible, T. Wallmersperger, B. Kroplin, Polyelectrolyte gels in electric fields. A theoretical and experimental approach, Smart Structures and Materials 2000, Electroactive Polymer Actuators and Devices, Proc. SPIE 3987 (2000) pp. 192-202. 6. Liu W.K., Jun S., Zhang Y.F., Reproducing kernel particle methods. Int. J. Numer. Meth. Engng 20 (1995) 1081-1106.
FRINGE-FIELD AND GROUND PLATE EFFECTS FOR ELECTROSTATIC MEMS SIMULATIONS ANDOJO ONGKODJOJO1 AND FRANCIS E.H. T A Y 1 2 'Micro- & Nano- Systems Cluster, Institute of Materials Research and Engineering, Link, Singapore 117602 E-mail: [email protected] 2
Department
3 Research
of Mechanical Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260 E-mail: [email protected]
This paper considers inherent non-idealities such as fringe field and ground plate effects that are currently limiting wide commercialization of certain MEMS devices. In this case, electromechanical derivations including fringing fields tend to significantly change the electrostatic force and the capacitance detected. The analytical derivations are compared with an electromechanical analysis of IntelliSuite™ for verifications, and the generated FEA results are in good agreement with the analytical computation results. In addition, it is clear that there is a fringe field correction factor of 0.046 and a ground plate factor of 0.5 affecting the resulting displacements. Thus, this research work certainly has an important impact on the design of MEMS by solving the difficulties associated with the realistic problem formulations.
1
Introduction
Research and development efforts in MEMS have been directed at design, fabrication and testing of MEMS devices. Nevertheless, analysis and computer simulation of MEMS devices are also active fields of research and development [1]. There are many commercially available MEMS simulation tools. CAD tools based on the finite element method and the boundary element method such as Oyster™, MEMCAD™, CAEMEMS™, SESES™ and IntelliSuite™ have been developing rapidly since the late 1990's [2]. These CAD systems applicable to MEMS are primarily aimed at simulating fabrication processes and the electromechanical behavior of a given design. The determination of dynamic behavior via full 3-D simulation is computationally very expensive and time consuming. To overcome these shortcomings, we have reported the novel backpropagation approximation approach based macromodeling technique [3]. In this paper, we propose analytical modelings of MEMS devices by deriving their mathematical equations for design problems of actual MEMS devices. We focus on more realistic cases such as fringe field and ground plate effects for electrostatic simulations, which are required for the analysis of micro electromechanical systems. The mathematical model and the FEA results are compared for yielding suitable theoretical design considerations for practical measuring structures. This process is applied to our single mass micro-gyroscope as shown in Fig. 1.
879
880
2
Mathematical Modeling of Electrostatic Force and Capacitance
In order to construct a model in the electrostatic energy domain, an analytical model of the capacitance of the system must be developed. The electrostatic forces are found by computing the spatial derivative of the electrostatic co-energy T51: Vibrating Direction Beam
V^P
/
Proof Mass Resonator (Movable Part) z
,r y
Stationary Comb-Drive Figure 1. Schematic diagram of the comb-drive single mass microgyroscope [4]
F, = -
2
V2
(1) where Fe, V and C are the electrostatic force, the applied voltage and the capacitance between the conductors respectively. As the applied voltage is independent of motion, the gradient is applicable to the capacitance only. The electrostatic forces in the 'push' direction (per unit length) are expressed by:
F
V( 2
--i£ '>
(2) where V(t) is the applied voltage for step actuation input voltage (V). The capacitance between the comb fingers in the 'push' direction is given by equation (3), with the assumption that the capacitance due to fringing fields between the interdigitated fingers and a presence of a ground plate can be neglected [6]: c = ( N + 2)eoh(x 0 + x) g (3) where N, Co, Xo, x, h and g are the number of comb-drive fingers (one side), the permittivity of air (F/m), the overlap displacement (initial displacement) in the x-direction (m), the deformed displacement (m), the height of the comb finger (m) and the finger gap (m) respectively. The derivative of the above equation with respect to x is given by: 3 C _ ( N + 2)e0h 9x
g
(4) If we consider the more realistic case where there are fringing field effects and a ground plate beneath the fingers of our device, equations (10), (11), and (12) of [7] have been reported. However, we need to modify and apply the equations for our MEMS device application as equation (12) of [7] is per unit length and per movable finger. In this paper, parameters such as c, d, I, g, Xo, and h are replaced by Wfmgen h, lfmger, g, XQ, and d respectively. The equation has to be multiplied by the number of comb-drive fingers (one
881 side), N and the overlap finger length, x0 as shown in equation (5). The global problem including fixed and movable fingers and their interactions, which is also influenced by the presence of the ground plate, has been considered to obtain the true results for our problem. The global problem has been modeled by placing the magnetic line currents, and more descriptions have been reported [7]. Thus, the corrected electrostatic forces with the ground plate effect are given by: c
_knger+gL
•!Nt0-
2TC
4>, (<|>2 V
^ ^ 2 v(t) 2V
(5) where CQrmger is the width of the comb-drive finger (m), if\ is the potential above the engaged fixed-movable comb finger regions (Volt), expressed by equation (14) of [4], and §2 is the potential above the unengaged fixed comb finger regions (Volt), expressed by equation (15) of [4], The equation is similar to equation (12) of [7] with some modifications as presented before, and is suitable for our MEMS device application, which considers the fringe field and ground plate effects. In addition, the applied input voltage for step input actuation is included explicitly in equation (5). If we adopt equation (2), the corrected electrostatic forces including the fringe field effect (per unit length) will be expressed as: Fe_c= e c
I^LV(t)2 w 2 3x
(6) where dCJdx is the corrected capacitance gradient due to the fringing fields between the interdigitated fingers as shown in equation (7). dx
e
g
(7) where o^, is the fringe field correction factor for our MEMS device application as given by: N g(w finger +g) '
Numerical Results and Discussion
In this section, we discuss our analytical simulation results based on the equations as expressed in the previous section. These results are compared with those for the full 3dimensional coupled simulation using IntelliSuite™ as shown in Table 1. The analytical results are obtained using the Simulated Annealing algorithm, which has been reported in [4]. This table clearly shows good agreement between the analytical simulation results and the FEA results. Fig. 2(a) and (b) show the deformed MEMS structure in the lateral direction generated by IntelliSuite™ without and with the ground plate respectively.
882 Table 1. Comparison of the Maximum Lateral Displacements between the SA-based Analytical Simulation and the FEA Simulation Generated by IntelliSuite™
Structure
No Ground With Ground
SA without Fringe Field Effect (urn) 5.98 x 10'2
SA with Fringe Field Effect (urn) 5.54 x 10 3 2.77 x 10"3
FEA (um) 5.39 x 10"3 2.83 x 10'3
Error (%) 2.78 2.12
Error*' (%) 9.1x10'
This error is a discrepancy between the SA and the FEA without fringe field effect.
It is also clear that a ground plate effect, having a factor of 0.5 has been inserted into equation (5). Thus, the presence of the ground plate weakens the resulting displacement by exactly 50%. For the fringe field effect, the displacement is clearly weakened by roughly 90%. By using equation (8), a fringe field correction factor (a,.) of 0.046 has been easily obtained. The presence of the ground plate without the fringe field effect has been reported [6]. However, the results show the ground plate effect, which weakens the capacitance gradient by roughly 30%, is only for a gap distance of 2 u.m and a structure height of 2 urn These are really different from our design values as reported in [4].
(a)
(b)
Figure 2. (a) The Deformed Microgyroscope without the Ground Plate; and (b) The Deformed Microgyroscope with the Ground Plate generated by IntelliSuite™ [4]
4
Conclusion
This research work certainly has an important impact on design of MEMS by solving the difficulties associated with the practical and realistic problem formulations. We can directly use the mathematical modelings, which consider the non-idealities, for determining dynamic behaviors of actual MEMS devices without using the full 3-D simulation. The results, which have been shown and demonstrated, validate our mathematical modelings; and our mathematical models in turn, are valid for electromechanical and electrostatic simulations. 5
Acknowledgement
This work was supported by the DSO Gyroscope Project (DSO/C/98063/L), Singapore Defence Science Organisation, Singapore.
883
References 1. Ye W. and Mukherjee S., Optimal shape design of three-dimensional MEMS with applications to electrostatic comb drives. Int. J. Numer. Methods Eng. 45 (1999) pp. 175-194. 2. Madou M. Fundamentals of microfabrication (Boca Raton, FL: CRC Press, 1997), pp 375-380. 3. Tay F. E. H., Ongkodjojo A. and Liang Y. C , Backpropagation approximation approach based generation of macromodels for static and dynamic simulations. Microsyst. Technol. 1 (2001) pp. 274-282. 4. Andojo Ongkodjojo and Francis E. H. Tay, Global optimization and design for microelectromechanical systems devices based on simulated annealing. J. Micromech. Microeng. 12 (2002) pp. 878 - 897. 5. Gabbay L. D., Computer aided macromodeling for MEMS (PhD Thesis, Massachusetts Institute of Technology, Cambridge, MA, 1998). 6. Tang W. C. Electrostatic comb-drive for resonant sensor and actuator applications (PhD Thesis, University of California at Berkeley, 1990). 7. Johnson W. A. and Warne L. K., Electrophysics of micromechanical comb actuators. J. Microlectromech. Syst. 4 (1995) pp. 49 - 59.
A COUPLED MULTI-FIELD FORMULATION FOR STIMULI-RESPONSIVE HYDROGEL SUBJECT TO ELECTRIC FIELD
ZHEN YUAN, HUA LI, T. Y. NG AND JUN CHEN Computational MEMS Division, Institute of High Performance Computing, 117528, Singapore E-mail: lihua @ ihpc.a-star. edu. sg Based on a multi-phasic mixture theory and convection-diffusion equations for ion concentrations, a multi-field formulation including the effect of chemo-electro-mechanical coupling is presented to simulate the response behaviors of stimuli-sensitive hydrogel immersed into a bath solution applied to an external electric field. The presently developed mathematical models consist of the continuity equations for the solid, interstitial water and ion phases, the convection-diffusion equations for the distributions of diffusive ion concentrations, Poisson equation for the electric field and the momentum equation for the mixture phase. To solve the multi-field coupling nonlinear governing equations, a hierarchical iteration procedure is conducted and the steady-state response of a hydrogel strip subject to the external electricfieldis numerically simulated by a developed meshless HermiteCloud method. The ionic concentrations, electric potential in both interior and exterior hydrogel and the strip deformation are studied. The simulating results validate the presently developed model.
1
Introduction
Over the past decades, the actuators/sensors based on stimuli-responsive polymer hydrogels have attracted much attention for wide-range biological applications such as artificial muscles and Bio-MEMS. Their reversible volume changes can be induced by external bio-stimuli including pH, temperature and electric field. Usually hydrogels are composed of the solid, interstitial water and ion phases. If an external electric field is applied, the ions flow, osmose and redistribute. This results in the hydrogels swelling, shrinking and bending with the multi-field coupling response. The triphasic mixture theory was early developed by Lai et al. [1] for the swelling and deformation behaviors of articular cartilage. Based on this theory, a new mathematical model is developed with consideration of chemo-electro-mechanical coupling effect, in which the modified continuity equations include the influence of electric potential, and the solid-phase displacement is explicitly computed easily. Further, the present computational domain covers both the hydrogels and surrounding solution. In the developed mathematical model, the continuity equations describe the solid, water and ion phases, the convection-diffusion equations are for the distributions of diffusive ion concentrations, Poisson equation is for the electric field and the momentum equation for the mixture phase. By a novel meshless numerical technique called Hermite-Cloud method [2], the coupled nonlinear governing equations are solved for simulation of responses of stimuliresponsive hydrogels applied to an external electric field, including the distributions of ionic concentrations and electric potential as well as swelling deformation of hydrogels.
2
Developed mathematical models
If the body and the inertial forces are neglected, the governing equations of the multiphasic model are briefly summarized as follows, Momentum equations: V • a = 0 and paV{ia - II" = 0 (or = w,+,-)
884
885
Continuity equations -2— + V(>ava) = 0(a = + -) and V - ( 0 V +
Constitutive equations ff = - P / + 4 s fr(£)/ + 2//,£ w
M =Mo-Rm(c+ + c-)/pT + ^ r Ma=Mg + RT\n[ya(ca)]/Ma + zaFyf/Ma (a = + -) Poisson equation for electric field p
n +n
f i>
For infinitesimal deformation, if assuming 0s +
cF=cg(l + e/tf),
^ = l - ( ^ / ( l + e))
According to the constitutive and momentum equations, the continuity equation of the mixture phase is derived as V ( « s , ) = V - { ( ( ^ ) 2 / / w s ) [ V p - / ? r ( a ) - l ) V ( c + + c _ ) + F c (z + c + + z_c_)V^]} A diffusion analysis is coupled with the Poisson equation to describe the chemical and electric response of hydrogel immersed into a bathing solution. The present diffusion equations for ion concentrations are given as, Ca,,=(.Dacaj + (F/RT)zacaDa^ti)J-(cavi)j + ra(ca) ( a = +,-) In the governing equations above, a is the elastic stress of solid matrix, us displacement, c ion concentrations, y/ electrical potential and p fluid pressure, they are solved by the meshless Hermite-Cloud method. The fixed charge concentration cF and fluid-phase volume fraction (j>w are also computed iteratively. For steady-state analysis, the continuity equations can be rewritten, V • { ( ( f ) 2 / fws )[Vp - RT(.0 - l)V(c+ + c_) + Fc (z+c+ + z_c_ )V ¥]} =0 V • {DaVca)+ F(zaDaCa¥,), /RT=0 where,
886
3
Numerical results and conclusions 0.005
I < j(mm) -7.5 solution
_ _ _ •»~ — '"' ' — .. ; — \ ,•" — '•^
hydrogel
+ + + + + + + +
0.006
0.007
0.008
0.008
-•"
0.00065-
15 x<mm
0.0
,.'
0.00080-
«_
•
•
0.00045-
_••'
0.00040-
•"
0.006
•
0.006
0.007
0.O0B
0.000
0.010
X(rr»
Figure 1. A hydrogel strip in a bathing solution applied to an external electric field.
Figure 4 (c). Strain distribution.
Figure 2. Distributions of ionic concentrations (right) and electric potential (left) without external electric field.
ooDO
ocffi
C U M awe
QOIO
ooiz
oow
Figure 3. Distributions of ion concentrations (left) and electric potential (right) with external electric field. O.0O5
0.006
0.005
0.008
0.007
0.006
0 009
0.010
0006
0 006
0 007
OOP.
0O1O
Figure 4. Mechanical behaviors with external electric field: (a) Displacement, (b) Pressure and (c) Strain.
887
For a numerical validation, let us immerse a hydrogel strip into a bathing NaCl solution with 2 electrodes for applying an external electric field, as shown in Figure 1. Three simulating cases are considered here, where the 1-D computational domain is always set at y=Q, also the boundary conditions of the ionic concentrations are ImM at x=0 and 15. In Case 1, c / = 1 0 m M and the external electric field is not considered. The corresponding simulation distributions of diffusive ionic concentrations and electrical potential are shown in Figure 2, where the electroneutrality is observed directly in the bathing solution and also examined in the hydrogel domain after considering c{ = 10 mM. Further, Case 2 takes the external electric field (+0.1V, see Figure 1) and c/ = 10 mM without consideration of mechanical deformation. Corresponding computed distributions of diffusive ionic concentrations and electrical potential are shown in Figure 3. It is seen from the left figure that, in the bathing solution, the ionic concentration increases near cathode side and decreases near anode side. It is also clear according to the right figure that the variation of electric potential of interior hydrogel is smaller when compared with that in the exterior solution, due to the higher conductivity of the hydrogel strip for mobile ions. The present simulating electric-potential distribution agrees well with Wallmersperger's FEM results [3] and is satisfactory if compared with his experimental results [3]. Finally, Case 3 applies the external electric field (±0.1V) and considers the mechanical deformation of the hydrogel strip by taking 2"=293K, /?=8.314J/mol.K,
NUMERICAL SIMULATION OF ELECTROMECHANICAL BEHAVIOR FOR MEMS OPTICAL SWITCH F. WANG, C. LU, Z. S. LIU Computational Mechanics Division, Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528 E-mail: wangf@ ihpc. a-star. edu. sg This paper presents finite element (FE) dynamic and theoretical analysis of a new MEMS optical switch. The switch is composed of a skew plate, one drawing beam, a mirror and a substrate. The plate is restrained translationally at one end. The drawing beam provides additional translational restrain to the plate. The mirror is mounted on the plate. Two identical bending beams are inserted inside the plate to adjust the bending rigidity of the switch. The switch is actuated by electrostatic attraction applied on the skew plate. Finite element dynamic simulation is performed to predict the mechanical behaviours of the switch. The minimum natural frequencies and the corresponding mode shapes, maximum stress distributions, dynamic responses under different levels of electrostatic attraction loads are derived. A theoretical dynamic model is further applied to validate the finite element simulation results. Good agreement is found between the finite element simulation and the analytical results with regard to the pull-in voltages, and the time to pull in at different voltages. The current study is part of the research work for development of a novel MEMS optical switch with optimal electromechanical characteristics. The new MEMS optical switch presented in this paper is the second version of the novel switch. Design and experiments are carried out with the help of numerical simulation tools.
1. Introduction In a joint research program by School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore and Division of Computational Mechanics, Institute of High Performance Computing, Singapore, novel MEMS fiber-optical switches are developed. Design and experiments are carried out with the help of finite element analysis and theoretical study. Scenario for the numerical study of the primitive designs of the MEMS fiber-optical switches is seen in Reference [1,2]. According to the numerical analysis for the primitive design described in [1,2], the minimum driving voltages are higher than 250 V and the switching times are longer than 170 [xs. Experiments are then carried out for the primitive designs. It fails to derive any useful data because of the high driving voltages. On the other hand, FE parametric study of the primitive designs reveals that the dimensions of the bending beams play a critical role in determining the driving voltages and the switching times. Slender bending beams yield lower driving voltages and shorter switching times. Based on these numerical results, a final design for the switch is derived with the footprint of 1 x 0,6 mm and skew angel of 3.7°. The number of the bending beams is reduced from 3 to 2 and the dimensions are also marginally reduced. Efforts of the current study focus on FE simulation and theoretical analysis of this revised or final design.
2.
Finite element analysis
Figure 1 gives the schematic diagram for the revised or the final design of the switch. It is composed of a skew plate (1 x 0.6 x 0.0021 mm), two bending beams (0.04 x 0.01 x 0.002 mm), a mirror and a substrate. The plate is restrained translationally at one
888
889
end. The drawing beams provide additional restrain to the plate. The mirror is mounted on the plate.
Figure 1. Schematic Diagram of the Final Design of the Fibre-Optical Switch
In the FE Dynamic analysis, the switch structure is modelled as that shown in Figure 2, in which the boundary conditions are also illustrated. For the restraining condition of the drawing beams to the plate, it is assumed that the translations of the edge of the plate connected to the drawing beam are restrained, while their rotations are independent. When the plate moves to the substrate, impact/contact between the plate and the substrate takes place. To simulate this phenomenon of impact/contact, the contacts between the plate and substrate are introduced. The surface to surface contact algorithm between each part is established.
Figure 2. Finite Element Mesh for the Switch
The material for the whole structure is polysilicon except that a 0.5 (im thick gold layer is directly deposited on the mirror frame, which is composed of 1.5 u.m thick polysilicon. Properties for these two materials are shown in Table 1. The plate and the substrate are charged with opposite static electricity, so that the plate is driven by the distributed electrostatic attraction. The magnitude of the attraction is the inverse to the square vertical distance of specific loading point to the substrate, which has the following expression [3], v &.&5xW-nxVolt2
, (,
'~
(1)
2y(X,,f
where x represents the longitudinal variable, t is time variable. y(x,t) is the vertical distance of the loading point to the substrate, which varies with the position of the loading point and time. In the present dynamic analysis, user-subroutine is used to apply the
890 variable distribution load. The switching on and off times refer to the time duration that the plate moves from the original position (plate has a skew angle of 3.7° with the substrate) to touch the substrate and vice versa. The derived pull-in voltage is 8.27 V and the corresponding pull-in time is 38 ms. Table 1. Material Properties. Material
Polysilicon
Gold
Young's modulus (MPa)
150,000
77,000
Poisson's ratio
0.3
0.3
Density (Kg/ m3)
2,340
19,300
It is seen from Table 2 and Figure 3 that the switching on times vary with the switching voltages, where a larger switching voltage yields shorter switching on time. On the other hand, marginally different voltages have insignificant effects on the switching off times (see Table 2). The switch off voltages from the FE analysis is about 3.7 V regardless of the different switching voltage. The rise-up times of the switch under different switching voltages fall in a narrow range of 39-41.2 ms. The reason of the almost constant rise-up times is that switch-off speed depends on the restoring force only. It may vary due to the capacitance resistance (CR) time constant of the electrostatic actuator. In the present study, the CR time constant is not taken into account. FE mode analysis is also performed to derive the natural frequencies and the corresponding mode shapes. The derived first ten natural frequencies are tabulated in Table 3, and the first three mode shapes are shown in Figure 4. Voltage=8.387V
Voltage=8.8V
Tlme(ms)
Time(ms)
0 •5
•10
I -16
-Theoretical study | "FE analysis I
1
-20
j
S 2
FE analysis
- Theoretical study
S -25 -30
a
L _
_J
-35 -40 -45
Voltage=7.6V
Voltage=8.359V
T)me(ms) 150 /
/
E 3. -is
1 -*H J -25
-Theoretteal study ~- FE analyst
Theoretical study ™***- FE simulation
Figure 3. Displacement at Mid-point of Mirror Versus Time Calculated from FE Simulation and Theoretical Study.
Tlme(ms)
891 Table 2. Switching On and Off Times.
Switching Voltage (V) Switching Times (ms) FE Simulation Results
On 60.03
Off 41.00
8.387 On Off 43.68 39.79
On 9.51
Off 39.79
Theoretical Results
57.00
NA*
45.18
NA*
11.58
NA*
Percentage Errors
5.32%
NA*
-3.32%
NA*
-17.88%
NA*
8.359
8.8
Note: NA* —The data is not available. Table 3. The First Ten Natural Frequencies for the Switch Structure from FE Simulation.
Natural frequencies (Hz) 1st 1,425
nd
2
1,898
3rd
4th
5th
6*
7th
gu,
9*
10th
2,581
6,310
8,100
9,168
9,485
13,752
13,982
16,201
1st mode shape
Figure 4. The First Three Mode Shapes Derived from FE Simulation.
3.
Theoretical Analysis
To verify the FE dynamic simulation results, an analytical beam model is derived. The formulation of the analytical beam model is based on the observations that the structure, the boundary and loading conditions of the switch are symmetric about the central line of the plate and only vertical or transverse loads are involved. Therefore, the switch is firstly simplified as a Bernoulli-Euler beam assembly. Furthermore, the beam model is simplified as a cantilever beam according to FE simulation results. Observation on the dynamic response reveals that the restraining of the drawing beam on the plate is rather rigid so that the displacement of the near end of the bending beam is almost zero even under a high level of attraction load. Therefore, the beam model can be further simplified as cantilever beam clamped at the near end of the bending beam. In solving the analytical model, Rayleigh-Ritz method is used to calculate the natural frequencies and mode shapes. Dynamic response is derived using the mode-superposition method. Computer codes in FORTRAN is developed to solve the equations involved in the derivation of the coefficients. The simplified Bernoulli-Euler beam model consists of
892
a number of beam segments where the properties of each segment match the transverse properties of the original structure. In particular, the transverse properties of the mirror are combined into a specific segment of the beam. The dynamic response of the beam under electrostatic attraction load is then derived with the mode-superposition method. In the present study, only the first three normal modes are used for calculation of the displacement. FORTRAN computer codes are developed to implement the calculation. The derived displacement versus time curves under different levels of attraction for the far mid-point of the mirror are shown in Figure 3. The switching on times at different electrostatic voltages are tabulated in Table 2. Percentage errors of switching on times between the FE model and analytical model are also listed in Table 2. The analytically derived pull-in voltage is 8.18 V. 4.
Discussions and Conclusions
Comparison of the results from the analytical and FE dynamic model shows that the switching on times at different voltages (see Figure 3 and Table 3) are close to each other. The percentage errors of switching on times between the FE simulation and theoretical study at different switching voltages of 8.359 V, 8.387 V and 8.8 V are 5.32% and 3.32% and -17.88% of the analytical derivations. It is seen that the percentage error of the switching on time between the FE simulation and the analytical results is a little bit high for the switching voltage of 8.8 V. It is because the absolute value of the switching on time at switching voltage of 8.8 V is very small compared with the values at voltages of 8.359 V and 8.387 V (see Table 2). Still it can be stated that the FE simulation and the analytical results agree with each other. Especially, very good agreement is found between the pull-in voltages. The FE simulation result is 8.27 V, while the analytical calculation is 8.18 V, which yields a percentage error of 1.1% of the analytical result. The switching off voltages from the FE analysis is about 3.7 V regardless of the different switching on voltages. Therefore, conclusions can be drawn that a dynamic analytical and FE models for novel MEMS fibre-optical switch have been successfully established for the MEMS fibreoptical switch. Very useful results are obtained from FE simulation and theoretical study with regard to the mechanical properties of the switch, such as the natural frequencies, the corresponding mode shapes, the pull-in, lift-up voltages, the switching on and off times, and the maximum stress distributions. References 1. F. Wang, C. Lu, Z. S. Liu, A. Q. Liu, X. M. Zhang, "Modeling of Optical Mirror and Electromechanical Behavior," Proceeding of SPIE, Vol 4582, APOC 2001, Beijing, China, pp. 95-105, 2001. 2. F. Wang, C. Lu, Z. S. Liu, J. Li, A. Q. Liu and X. M. Zhang, "Finite element simulation and theoretical analysis of fiber-optical switches," Sensors and Actuators A Physical, Vol. 96, pp. 167-178, 2002. 3. G. C. Wetsel, Jr. and K. J. Strozewski, "Dynamical model of microscale electromechanical spatial light modulator," Journal of Applied Physics, Vol. 73, No. 11, pp. 7120-7124, 1993.
DESIGN AND MODELLING HIGH-EFFICIENCY ACCELEROMETERS A. T. NG, W. H. LI, H. DU AND N. Q. GUO School of Mechanical
& Production Engineering, Nanyang Technological 50 Nanyang Avenue, Singapore 639798 E-mail:mwhli@ntu. edu. sg
University
The paper presents design and modelling of a new uniaxial silicon-based micro-accelerometer. This design utilizes the piezoresistive sensing concept to detect the mechanical strain induced by the acceleration of the proof mass. Unlike the conventional common cantilevered proof mass type of design that relies solely on bending strains, this design adopts the combination of both bending and axial deformation. Basically, the deformation of the bending structure rotates and displaces the proof mass, which consequently induces axial strains that will be sensed by the piezoresistors. The level of sensitivity and natural frequency can be controlled and adjusted to suit different requirements depending on the application intended. During the design and analysis stage, theoretical models were constructed to predict the behaviour and performance of the new design. Subsequently, finite element simulations were carried out to verify the predictions from theoretical models, including sensitivity and natural frequency. It was observed that the sensitivity of the proposed new design was significantly higher when compared with other commercial accelerometers.
1
Introduction
Micro-machined accelerometers, based on a variety of working principles had been developed over the years, striving for improvement in performance. Regardless of the operating principle, all micro-machined accelerometers (and in fact the conventional accelerometers) require a transduction mechanism to transform a mechanical input (such as displacement, stress and strains) induced by the applied acceleration into a measurable electrical signal [1]. The common micro-machined devices include those utilizing piezoresistive sensing, piezoelectric sensing, capacitive sensing, thermal sensing and etc. [2]. Each design has its set of unique characteristics that depicts their advantages and disadvantages. Comparing among the different modes of sensing, the piezoresistive type can be constructed easily using the wafer fabrication techniques. Piezoresistance of a material is the fractional change in bulk resistivity induced by small mechanical stresses applied to the material. The piezoresistive effect can be normally measured with a simple Wheatstone bridge with very minimum signal conditioning [3]. In this paper, a new piezoresistive micro-accelerometer design is proposed and developed. The behavior and performance of the new design are theoretically analyzed and evaluated with FEM modeling.
893
894
2
Conceptual Design
2.1 Working Principle of conventional accelerometers Accelerometers are typically specified by their sensitivity, maximum operation range, frequency response, resolution, full-scale nonlinearity, and cross-axis [2]. The most common piezoresistive micro-accelerometer design utilizes the cantilever concept. In this design, a beam with a seismic mass (proof mass) attached to one end is cantilevered onto the supporting frame of the housing for the sensor. When an external acceleration is applied to the supporting frame, it moves in relative to the proof mass due to the latter's inertial effect. This causes the cantilevered beam to deflect under the inertial force, which induces bending stresses / strains on the beam. In micro-machined accelerometers, the inertial forces that are detected as the measurement of acceleration are usually very small because of their tiny masses. Consequently, the value of induced strain values will also be limited, affecting ultimately the sensitivity and resolution of the sensor. For the conventional cantilever beam design, increasing the proof mass or reducing the stiffness of the beam allow a larger displacement under the same applied acceleration. Although such remedies increase the inertial force required for improving sensitivity, they inevitably result in lower natural frequency of the sensor. Moreover, it might cause the overall structure to be less robust and reliable. The useful bandwidth of the microaccelerometer will be narrowed, hence yielding a poorer frequency response. 2.2 New Design The new design takes advantage of the fact that structure experiencing bending deformation yields a larger displacement with low strain/displacement ratio, while structure experiencing axial deformation produces high strain/displacement ratio but limited range of displacement values. Ideally, these two modes of deformation can be integrated into a combined system to achieve high strain values, thereby increasing the sensitivity. The main idea in the integration is to obtain an initial large displacement from the bending structure, and translating this displacement into an amplified, focused axial strain through the axial structure. Figure 1 (a) and (c) show the proposed new design before and after deformation, respectively. The new design encompasses basically four main
895
components, namely the bending element, proof masses, sensing elements, and the supporting frame.
lement
(b) Section A -A
(c)
Acceleration
Figure 1. Schematic of the proposed microaccelerometer before and after deformation.
3
Mathematical Modeling
With reference to Figure 1, the inertial force exerted by the proof mass can be represented as F. the bending element itself is also experiencing a distributed inertial force, Q. Fs represents the axial spring forces from the sensing elements. The moment acting at the upper end of the element is M. The rotation angle of the bending element is 0. The governing equation for the free body at y = 0 and the approximated axial strain are given by: FL; EIb6 = MLb + ». + M (1)
9 Jk e=- 21 4
(2)
Analytical Results based on FEM
As listed in Table 1, four models of the new design, model 1 to model 4, are devised for the analytical purpose. Each model differs from one another in terms of the width of the bending elements, the size of the proof masses, and the gap between each pair of sensing elements, i.e. Hg. By manipulating these parameters, different sensitivity values and their corresponding frequencies are achieved.
896 Table 1. Dimensions the proposed models Models Dimension (mm) H„
L, Hb U Hs L, H„ T„
Tb Ts
1
2
1.5 2.0 0.12 0.08 0.003 0.01 0.16 0.35 0.03 0.03
1.5 2.0 0.2 0.08 0.003 0.01 0.25 0.35 0.03 0.03
3 1.5 2.0 0.34 0.08 0.008 0.01 0.46 0.35 0.03 0.03
4 3.0 4.0 0.4 0.16 0.006 0.02 0.5 0.35 0.03 0.03
The performance comparison in terms of sensitivity and natural frequency is summarized in Table 2. It can be seen that the new designs surpass the commercial products significantly with respect to their sensitivity levels at the corresponding nature frequency levels. Table 2. Dimensions the proposed models
5
Models
Sensitivity (mV/g)
Natural frequency (kHz)
New design 1
6.49
16.1
New design 2
3.37
26.9
New design 3
0.69
56
New design 4
7.58
12.8
Entran EGAX-2500
0.1
6
E ndevco 7264B-2000
(1.25
28
Conclusions
In this paper, a new uniaxial micro-accelerometer was proposed and designed. The analytical results demonstrate that the new design is capable of achieving both high sensitivity and natural frequency, as compared with the contemporary commercialized products' performances. References 1. Maluf N., An introduction to micro-electro-mechanical systems engineering (Artech House, Boston, 2000). 2. Navid Y., Farrokh A. and Khalil, N., Micromachined inertial sensor. Proceeding of the IEEE, 86 (1998), pp. 1640-1659. 3. Chen H., Shen S. Q. and Bao, M. H., Over-range capacity of a piezoresistive microaccelerometer. Sensor Actuators A: Phys 58 (1997) pp. 197-201.
A FINITE ELEMENT ANALYSIS FOR PIEZOELECTRIC SMART PLATES INCLUDING PEEL STRESSES QUANTIAN LUO AND LIYONG TONG School of Aerospace, Mechanical and Metrachonic Engineering, University of Sydney NSW 2006 Australia E-mail: [email protected] This paper presents a novel finite element analysis (FEA) formulation for piezoelectric (PZT) smart plates taking into consideration peel stresses. To model shear and peel stresses at the interface between the PZT patches and the parent plate structure, a finite thickness adhesive layer with a lower elastic and shear moduli, as compared with those of the PZT and host structure, is considered. This layer is modeled as a continuous spring system with a constant shear and peel stiffness. It is then sandwiched between two collocated 4-node Reissner-Mindlin plate elements to form laminatedelements for composite plates. This FEA framework can consider independent rotational angles, and is applicable to thin or moderately thick plates with debonded PZT actuators and sensors. Numerical results are presented to validate the present formulation.
1
Introduction
Crawley and de Luis ' developed an analytical model for smart structures, in which, a finite thickness adhesive layer was assumed to experience pure shear strain. This classic shear lag model has been widely used for piezoelectric smart structures and bimorph applications. In our recent studies |21[31, it has been shown that peel stresses in adhesive layer can significantly affect the mechanical behavior and dynamic response. This is especially true for flexible smart structures and in the presence of debondings. To date, analytical solutions for smart structures are very limited or too complicated for practical applications. Finite element analysis has been widely used in the area of smart structures as it effectively deals with complicated geometric shapes, loadings and boundary conditions. Allik and Hughes [4] presented a finite element method for piezoelectric or electroelastic structures. They derived the FEA formulation by applying a variational principle to the virtual work density of mechanical strains and electric fluxes, obtaining the following dynamic equations for a piezoelectric element: [m]{ Ui}+ VUM+UcM^m+ifsHVp) [k
1 J
(1)
In equation (1), [m], [kuu], [ku(^ and [k^ are defined as the kinematically consistent mass matrix, structural, coupling piezoelectric and dielectric stiffness matrices respectively. \fB], [fs] and {//»} are the body force, surface force and concentrated force vectors respectively. {qB}, {qs) and {/>} are the body charge, surface charge and point charge vectors respectively. {«,} and {$} are the displacement and electric potential vectors respectively. Once the nodal values of the displacement and electric potential for a PZT element have been found, the stresses and electric flux density at any point in the element are given by: {T} = [c][Bu]{u,} + [e][B^]{(/,i}
{D} = [e]T[Bu]{u,}-{JtHBJ{6}
"I
\
897
(2)
898
where, {T} and {D} are stress and electric displacement vectors; [Bu] and [B^ are strain and electric field matrices; [c], [e] and [$ are elastic, coupling piezoelectric and dielectric matrices respectively. In smart structures, PZT patches are normally bonded to or embedded in the host structure to implement self-monitoring and controlling functions. Therefore, the global mass and stiffness matrices of smart structures are comprised of PZT patch structural element matrices, coupling piezoelectric and dielectric stiffness matrices, and the normal host structural element matrices. PZT patches in smart plates can be used as actuators and sensors. This paper presents actuated performance of PZT patches in smart plates only. 2
A FEA framework for smart plates including peel stress
To implement FEA formulation for a host plate with the bonded PZT patches, we use 4-node Reissner-Mindlin plate elements for the host plate only and the pseudo-elements derived by Tong and Sun [5] in the area with the bonded PZT patches. The 4-node element is based on the first order Reinssner-Mindlin theory. Displacement fields of the local element are: U{x, y, z) = u0(x, y) + zOy (x, y) -s V(x,y,z) = v0(x,y)-ZeAx,y) L W(x, y, z) = w(x, y) J
(3)
where, u0, v0 and w are the translational displacements in the mid-plane; 6y and 0X are the rotations about coordinate axes y and x respectively. The stiffness matrix of the pseudoelement is: -\ kp + k„u ka]2 [*»] = kan kh + ka22
r [ka] =
kail
K12
ka21
ka22
(4)
where, [ka] is a stiffness matrix of the adhesive element derived on the basis of the adhesive model developed by Goland and Reinssne^ and shape functions defined in the 4-node plate element. By assembling conventional plate elements and the laminated-elements, the structural dynamic equations can be obtained. When the thin PZT actuators are used in plate or shell structures, the electric field is only poled in the direction perpendicular to the structural plane. Actuated equations can then be expressed in the form of: [M]{d}+[C]{d}+[K]{d}={Fv}+{Fp} (5) where, [Af], [C] and [K] are structural mass, damping and stiffness matrices; {d} is global nodal displacements; {Fv} and {FP} are electric force and loading matrices respectively. 3
Numerical results
By implementing the present FEA framework for smart plates, two examples of a static analysis are presented here.
899
Example J Verification of the present FEA for PZT smart beams For a smart beam shown in Figure 1, a PZT actuator is bonded a distance of 20 mm from the clamped end of the 0.24 m long host beam whose thickness is 10 mm. The PZT actuator has a thickness of 1 mm and a length of 10 mm. The adhesive layer is 0.1 mm thick. The elastic moduli of the PZT actuator, host beam and adhesive layer are: Ep = Eh = 70 GPa, Ea = 3 GPa respectively, with the adhesive shear module, Ga = 1.07 GPa. The coupling PZT constant is e31 = -5.2 Nl(m v) and the applied voltage is assumed to be -100 (v).
Figure 1 A cantilever beam with the bonded PZT patch To verify the present FEA for smart plates, we set e32 be zero and the plate width as 2 cm, and thus its deformation is equivalent to that of a smart beam whose exact solution was obtained [2] I31. Four equal elements are used along the width direction. Small elements of 1 mm long are used near the PZT edges along x direction, whereas large element of 1 and 2 cm long are utilised in the rest. Implementing the FEA program for this PZT smart plate, we find that the non-dimensional tip displacements are: «„ = u/h = 7.92xl0"3, w„ = win = 1.09, and 8y = -5.20xl0"6, whose errors are all less than 3%; therefore, the accuracy of the present FEA for smart plates is validated. Example 2 comparison with higher order theory For a smart pate shown in Figure 2, two PZT patches with the thickness of 0.5 mm are bonded to a 0.2x0.2 m2 and 0.5 mm thick host plate. The other material are: Ep - 76 Gpa, vp = 0.36, Eh = 72.4 Gpa, vh = 0.33 respectively. The coupling PZT constants for actuator 1 and actuator 2 are: e3I' = e32' = -15.56 N/(m v), e3I2 = e322 = -17.58 N/(m v) respectively. The same adhesive as that used in example 1 is used here. 4
I Y (cm)
Actuator 1 is 2 cm to the clamped end and 3.5 cm to the top edge. Actuator 2 is 2 and 3.5 cm to the left and bottom edges.
/ / / /
Al
P. (15,113.5)
A2
P2 (15,13.5)
A size of both actuators is 3x4 cm2. >
x (cm)
Figure 2 A thin plate with the distributed PZTs When a voltage of-100 (v) is applied to actuators 1 and 2, the deflections at points P! and P2 are: wPi = 0.125 mm and wP2 = 0.123 mm respectively. The deflections computed
900
by the FEA based on a higher order theory (HOT) |71 [81 are 0.156 and 0.152 mm respectively, which are 24.8% and 23.8% higher than the present results. In the exact static solutions[31 to smart beams, we showed that the difference between the shear stress model and the present model might be up to 20% for the thin host beam. It can be seen that the deflection difference between the present FEA and the FEA based on HOT is also in this range, as the adhesive layer and the peel stresses are not modelled in the HOT. 4
Discussion and conclusion
This paper presents a new finite element formulation for piezoelectric plates, in which, an adhesive layer sandwiched between the host structure and the piezoelectric patch is assumed to transfer both constant shear and peel strain along the thickness direction. The numerical results show that the present FEA is effective for analyzing smart plates, also, that peel effects on PZT smart plates may be significant for flexible plates and debonding analysis. Acknowledgements The authors are grateful to the support of the Australian Research Council through a Large Grant Scheme (Grant No. A10009074). References 1. Crawley E. F. and de Luis J., "Use of Piezoelectric Actuators as Elements of Intelligent Structures", AIAA Journal, Vol. 25, No. 10, 1987, pp. 1373-1385. 2. Luo Q. and Tong L., "Exact Static Solutions to Piezoelectric Smart Beams Including Peel Stresses, Part I: Theoretical Formulation", International Journal of Solids and Structures, Vol. 39, No.18, 2002, pp.4677-4695. 3. Luo Q. and Tong L., "Exact Static Solutions to Piezoelectric Smart Beams Including Peel Stresses, Part II: Numerical Results, Comparison and Discussion", International Journal of Solids and Structures, Vol. 39, No.18, 2002, pp.4697-4722. 4. Allik H. and Hughes T. J. R., "Finite Element Method for Piezoelectric Vibration", International Journal For Numerical Methods in Engineering, Vol.2, 1970, pp.151157. 5. Tong L. and Sun X., "Stresses in Bonded Repair to Cylindrical Curved Shell Structures", Research Report, Department of Aeronautical Engineering, The University of Sydney, 2000. 6. Goland M. and Reissner E., "The Stresses in Cemented Joints", Journal of Applied Mechanics, March 1944, A-17 - A-27. 7. Chee C. Y. K., Tong L. and Steven G. P., "A mixed model for beams with piezoelectric actuators and sensors", Smart materials and Structures, Vol. 8, 1999, pp.417-432. 8. Nguyen Q. and Tong L., "Shape Control of Smart Composite Plate Structures with Non-rectangular Shaped PZT Actuators", Proceedings of the Third Australian Congress on Applied Mechanics, 2002, pp.421-426.
A STUDY OF THREE-DIMENSIONAL MESH GENERATION FOR COAL MINING MODELLING S.G. CHEN, S. CRAIG, D.P. ADHIKARY AND H. GUO CSIRO, Po Box 883, Kenmore QLD 4069, Email: [email protected]
Australia
An automatic mesh generator is developed for coal mining modelling. The mesh generator is incorporated into the preprocessor that is an accompanying part of a finite element package COSFLOW. An example is given showing that, in most cases, the mesh generator meets the requirements for coal mining modelling.
1
Introduction
In finite element modelling, the problem domain is discretised into a mesh, possibly consisting of more than one type of element, e.g. segments, triangles, quadrilaterals, tetrahedral, pentahedra and hexahedra [1, 2]. These elements must be connected, cover the domain and not overlap. This paper reports on a user-friendly automatic mesh generator used for underground coal mining applications. The mesh generator is incorporated into the preprocessor developed for the finite element program COSFLOW [3]. COSFLOW is used to simulate deformation and two phase fluid flow in rock. An example is presented to show the application of the mesh generator. 2
Mesh generation
2.1
Mesh requirement for COSFLOW
Coal forms in seams in sedimentary, layered, relatively soft rock. When surface mining is uneconomic, coal is mined by longwall methods where coal is extracted and roof rocks are allowed to cave behind the supported mining face. The selected regions for extraction are called panels (rectangular in plan) separated by pillars that are not mined. To gain access for machinery and transport of mined coal, roadways are first driven in the coal near the outer perimeter of the panels. The finite element model COSFLOW is used to simulate the rockmass deformations, stress changes and the flow of water and gas. The mesh used for COSFLOW modeling needs to be aligned with the boundaries between rock layers so that appropriate material properties can be given to the elements and it also needs to be aligned with the coal seam to be
901
902
extracted in each notional step so that the appropriate elements can be removed to simulate mining. Thus the domain of interest is divided into a number of subzones, each of which must be meshed, with the meshes of neighbouring subzones matching on their boundaries. The design of the numerical mesh is a compromise between accuracy and solution time. The accuracy of an analysis is usually greater with smaller elements, but, especially in three dimensions, the number of elements must be limited to enable feasible computer run times that may still be in the order of hours or days. Finer meshes are required near the excavation where the gradients of displacement and the pore pressure etc. are greatest, while the mesh at some distance from the excavation may be coarser. For this reason, a graded mesh may be required within each subzone. COSFLOW uses quadrilaterals elements for two-dimensional analyses and hexahedra elements for three-dimensional problems. 2.2
Meshing in a subzone
For many applications in underground coal mining, the subzones are hexahedral. In this case, the meshing in a subzone consists of three steps. Firstly, the subzone with six faces is converted to a cube with side length of 2 in the manner similar to the concept of transforming a hexahedral element to a cube element used in the finite element method. The local coordinate origin of the cube is located at the center of the cube and thus x, y and z-coordinate in the cube ranges from - 1 to +1. Secondly, by giving seed numbers and ratios, in x-, y- and z-axes, the mesh is generated in the cube and the node coordinates are calculated. The nodes and elements are numbered sequentially along the x-axis first, then the y-axis and finally the z-axis. Thirdly, the node coordinate in the cube is transformed to the subzone according to the function: x = A, + a2x0 + a3y0 + a4z0 + a5x0y0 + a6y0z0 + anzax0 + agx0y0z0 y = bx +b2x0+biy0
+b4z0+b5x0y0+b6y0z0
+b7z0x0+b&x0y0z0
Z = cx + c2x0 + C^Q + c4z0 + c5xQy0 + c6y0z0 + CTZQXQ + csx0y0z0 where the subscript 0 refers to the coordinates in the cube. Substituting the coordinates of the eight vertexes in the subzone (x, y, z) and the cube (x0, yo, Zo) in the above equation and solving them, the parameters (a„ £,, c„ / = 1 ~ 8) could be determined.
903
23
Mesh connection
The meshes generated in the subzones are independent from each other and need to be connected together to form a complete mesh. To do this, the nodes are merged at the interfaces. This is done by including the nodes of the previous subzones on common boundaries in the list of nodes for the current subzone. The node transferring from the previous subzone to the current subzone is closely related to the sequence of numbering of the nodes and the elements in the subzones. 3
A roadway example
A typical example using this mesh generator is illustrated in Figure 1, for simulating the gas emission during the roadway excavation. The model involves seven strata including a coal seam of 3 m thickness (the black layer between the red and green layers in the model). The domain of interest has a size of 250, 100 and 90m in x, y and z directions, respectively. The roadway with the cross section of 5.2 m in- width and 3.0 m in height is constructed in the coal seam advancing along the x-axis. The generated mesh consisting of 5200 elements and 6237 nodes is also shown in the figure. Typical longwall meshes are much larger, but constructed in the same way.
Figure 1, A three-dimensional mesh of a coal mine roadway generated by the mesh generator.
904
The model is divided into 1, 3, 7 segments in x, y and z directions, respectively, and thus a total of 21 subzones. Only one segment is given in x-direction because no grading is required as the roadway advances at a constant rate of about 40m per day. From the figure, it can be seen that the interfaces between different strata are not horizontal and slightly oblique to the horizontal plane. As the roadway is almost located in the middle of the model in the y-z plane, the mesh is finer at the area around the roadway and coarser further from the roadway in both the y and z directions. 4
Discussion and conclusions
The mesh generator is specially developed for coal mining modelling and is incorporated into the COSFLOW preprocessor. An example using the mesh generator indicates that models with oblique strata interfaces or excavations could be generated. The restriction of the mesh generation is that the subzones must have six faces and the nodes at the interfaces of adjacent subzones must be matched. This may produce a mesh with poor quality when the area of one face is significantly different from that of its opposite face in the subzone, as the two faces must have the same number of elements with this mapped meshing approach. More general meshing schemes [2] could be incorporated, but are rarely required for underground coal mining applications. 5
Acknowledgements
The authors would like to thank NEDO, JCOAL and CSIRO for providing the funds for conducting this research work. The authors also wish to express their thanks to Dr Baotang Shen and Mr Brett Poulsen for their comments on the paper. Reference 1. Pande, G.N., Beer, G. & Williams, J.R., Numerical methods in rock mechanics. Wiley, New York, 1990, pp223. 2. George P.L., Automatic mesh generation, application to finite element methods. John Willey & sons, 1991,pp333. 3. Chen S.G., Craig S., Guo H. and Adhikary D.P., A pre/post processor for finite element modeling for coal mining applications. Proceedings of IC-SEC 2002, Singapore, December 2002.
LINEAR AND TORSION SPRING ANALOGIES FOR DYNAMIC UNSTRUCTURED MESHES IN FLUID STRUCTURE INTERACTION PROBLEMS - A COMPARATIVE STUDY R. AJAYKUMAR t , N.M.SUDHARSAN*, K.MURALf, K. KUMAR + AND B.C.KHOO* t
Institute of High Performance Computing, Singapore - 117528, *Singapore-MIT E-mail: ajay@ ihpc.a-star. edu.sg
Alliance
Dynamic mesh adaptation techniques are an important aspect to be considered in fluid structure interaction (FSI) problems. In these problems, the computational mesh has to be adapted at every time step to the new boundary position dictated by the structural response. In most cases, this adaptation can be achieved by moving the mesh points rigidly in response to the motion of the structure. However, this approach is no longer applicable for large structural displacements. The same holds if the outer boundaries of the mesh are fixed multiblock boundaries or if more complex deformations are considered. To tackle such problems, efficient grid regeneration and/or grid deformation techniques are required. One such approach is the linear spring analogy. In this procedure, the dynamic unstructured grids are usually represented by a network of fictitious linear springs. However this procedure fails for very large displacements. Two other methods to overcome these difficulties are implemented and compared in this paper. They are, namely: the "Modified Linear spring analogy" and "Torsional spring analogy". Their advantages, disadvantages and improvements for FSI problem's are discussed. Comparison is also made with other grid deformation schemes.
1
Introduction
Fluid-Structure Interaction (FSI) problems are described by fluid and structural field equations. Solutions to these problems require the coupling of fluid and structural solvers. This coupled problem can be viewed as a three-field problem by treating the moving mesh as a system with its own dynamics. One way of coupling the Computational Fluid Dynamics (CFD) and the Computational Structural Dynamics (CSD) codes is to use a partitioned analysis (also called a staggered procedure). This procedure is utilized here to study the phenomenon of a nearly incompressible fluid interacting with a solid plate. Herein, we address the mesh movement part of the FSI simulation. Linear and torsion spring analogies are used for updating the fluid grid point positions. They are compared on the basis of the quality of the mesh obtained after the dynamic meshing. Normalized equiangular skewness is used to assess the quality of the resulting mesh. Comparison is also made with other mesh movement algorithms such as Laplacian smoothing.

2 Governing Equations and Finite Element Formulation

2.1 Fluid Equations
An inviscid-compressible fluid model is often sufficient for analysis of hydrodynamic structures [5]. Considering an irrotational, isentropic inviscid fluid in small displacements and assuming constant density, the governing equations of the fluid can be written as:
\nabla^2 \phi = \frac{\rho}{\beta}\,\ddot{\phi}, \quad \text{in } S^0 \qquad (1)

where β is the bulk modulus, ρ is the density of the fluid, φ is the velocity potential, S⁰ is the fluid domain and the wave speed is given by c = √(β/ρ).
The boundary conditions for this fluid flow can be written as:

\frac{\partial \phi}{\partial n} = 0, \quad \text{on the fixed boundary} \qquad (2)

\frac{\partial \phi}{\partial n} = u_n, \quad \text{on } \partial S_b \text{ and } \partial S_f \qquad (3)

P_0 - \rho\dot{\phi} = p_b, \quad \text{on } \partial S_b \qquad (4)

The corresponding Galerkin weak form of equation (1) is

\int_{S^0} \frac{\rho}{\beta}\,\delta\phi\,\ddot{\phi}\,dS^0 + \int_{S^0} (\nabla\delta\phi)\cdot(\nabla\phi)\,dS^0 = 0 \qquad (6)
Linear triangular elements are used to discretize the fluid domain, and the resulting dynamic equation is marched in time using the Newmark time integration scheme.

2.2 Structural Equation - Two-Dimensional Plane Strain
By the principle of virtual displacements we have:
\int_{S^1} \delta\varepsilon^T C_s\,\varepsilon\,dS^1 + \int_{S^1} \rho_s\,\delta u^T \ddot{u}\,dS^1 = \int_{\partial S_b} \delta u^T f\, d(\partial S_b) \qquad (7)
where S¹ stands for the structural domain, C_s is the material stress-strain matrix, ρ_s is the density of the solid, u is the displacement vector, ε is the strain tensor and f is the interface force vector, which is calculated as

f = -n\,p \qquad (8)

where n is the outward normal from the solid. Again, linear triangular elements are used to discretize the structural domain, and the resulting dynamic equation is marched in time using the Newmark integration scheme.

2.3 Modified Linear Spring Analogy to Describe the Mesh Movement
In this approach, a fictitious spring is attached to each edge connecting two adjacent vertices i and j of a triangle, and the stiffness is chosen as the inverse of the edge length l_ij [1]. Herein, the stiffness of the spring is expressed as a power r of the squared edge length multiplied by a scaling factor q:

k_{ij} = q\left[(x_i - x_j)^2 + (y_i - y_j)^2\right]^{r} \qquad (9)
The resulting quasi-static equation is:

K q = 0 \qquad (10)

q = \bar{q} \quad \text{on } \partial S_b, \partial S_f, \partial S_0

where q is the vector of nodal displacements and \bar{q} contains the prescribed displacements on the boundaries. Equation (10) is solved using the Jacobi iterative scheme.
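As a concrete illustration, the following is a minimal sketch (not the authors' code) of one Jacobi sweep for the spring system K q = 0 on a triangular mesh; the edge-list data layout and the values of the scaling factor q and exponent r are assumptions made for the example.

#include <stdlib.h>
#include <math.h>

typedef struct { int a, b; } Edge;

/* Edge stiffness k_ij = q * (squared edge length)^r, as in equation (9). */
static double stiffness(double xi, double yi, double xj, double yj,
                        double q, double r)
{
    double L2 = (xi - xj) * (xi - xj) + (yi - yj) * (yi - yj);
    return q * pow(L2, r);
}

/* One Jacobi sweep for K q = 0: each free node moves to the
   stiffness-weighted average of its neighbours' displacements, while
   boundary nodes keep their prescribed values. Returns the largest
   change so the caller can iterate to convergence. */
double jacobi_sweep(int nn, int ne, const Edge *e,
                    const double *x, const double *y,
                    const int *fixed,       /* 1 if displacement prescribed */
                    double *dx, double *dy, /* displacement guess, updated */
                    double q, double r)
{
    double *sx = calloc(nn, sizeof *sx), *sy = calloc(nn, sizeof *sy);
    double *sk = calloc(nn, sizeof *sk), change = 0.0;
    for (int m = 0; m < ne; ++m) {          /* accumulate k_ij * d_j per node */
        int i = e[m].a, j = e[m].b;
        double k = stiffness(x[i], y[i], x[j], y[j], q, r);
        sx[i] += k * dx[j]; sy[i] += k * dy[j]; sk[i] += k;
        sx[j] += k * dx[i]; sy[j] += k * dy[i]; sk[j] += k;
    }
    for (int i = 0; i < nn; ++i) {
        if (fixed[i] || sk[i] == 0.0) continue;
        double nx = sx[i] / sk[i], ny = sy[i] / sk[i];
        double d = fabs(nx - dx[i]) + fabs(ny - dy[i]);
        if (d > change) change = d;
        dx[i] = nx; dy[i] = ny;
    }
    free(sx); free(sy); free(sk);
    return change;
}

Each time step, the boundary displacements dictated by the structure are written into dx/dy for the fixed nodes, and the sweep is repeated until the returned change falls below a tolerance.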
2.4 Torsional Spring Analogy to Dynamically Move the Fluid Mesh
Originally proposed by Farhat et al. [3], this method consists of introducing torsional springs at the mesh vertices to prevent neighboring triangles from interpenetrating each other. The stiffness of the torsion spring at vertex i of a triangle ijk is given by:

C_i^{ijk} = \frac{l_{ij}^2\, l_{ik}^2}{4 A_{ijk}^2} \qquad (11)

where the stiffness C is inversely related to the triangle area A, thus allowing the edges to sense a dynamic triangle approaching a negative area or a bad aspect ratio. The resulting equilibrium equation is:

F_{torsion}^{ijk} = \left[R^{ijk}\right]^T C^{ijk} R^{ijk}\, q^{ijk} = K_{torsion}^{ijk}\, q^{ijk} \qquad (12)

where R^{ijk} relates the nodal displacements to the corresponding rotational increments.
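As an illustration of equation (11), the following sketch (an illustrative reading of the formula, not the authors' code) evaluates the torsional stiffness at the three vertices of a triangle; note that the stiffness blows up as the area tends to zero, which is precisely what prevents element inversion.

#include <math.h>

/* Torsional stiffness C_i = l_ij^2 * l_ik^2 / (4 A^2) at each vertex of a
   triangle with vertices (x[0],y[0])..(x[2],y[2]), following equation (11). */
void torsion_stiffness(const double x[3], const double y[3], double C[3])
{
    /* twice the signed area, from the cross product of two edge vectors */
    double twoA = (x[1] - x[0]) * (y[2] - y[0]) - (x[2] - x[0]) * (y[1] - y[0]);
    double fourA2 = twoA * twoA;   /* (2A)^2 = 4 A^2 */
    for (int i = 0; i < 3; ++i) {
        int j = (i + 1) % 3, k = (i + 2) % 3;
        double lij2 = (x[j] - x[i]) * (x[j] - x[i]) + (y[j] - y[i]) * (y[j] - y[i]);
        double lik2 = (x[k] - x[i]) * (x[k] - x[i]) + (y[k] - y[i]) * (y[k] - y[i]);
        C[i] = lij2 * lik2 / fourA2;
    }
}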
3 Performance Metrics
The normalized equiangular skewness used to evaluate mesh quality is described below.

3.1 Normalized Equiangular Skewness
In the normalized angle deviation method, skewness is defined as [4]:

\text{skewness} = \max\left( \frac{\theta_{max} - \theta_e}{180 - \theta_e},\; \frac{\theta_e - \theta_{min}}{\theta_e} \right) \qquad (13)

where θ_max is the largest angle in the face or cell, θ_min is the smallest angle in the face or cell and θ_e is the angle for an equiangular face or cell (60° for a triangle). According to this definition, a value of 0 indicates an equiangular cell (best) and a value of 1 indicates a completely degenerate cell (worst).
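For a single triangle this metric can be sketched as follows (illustrative helper, not from the paper): compute the three interior angles from the vertex coordinates and apply equation (13).

#include <math.h>

/* Normalized equiangular skewness of a triangle, equation (13):
   0 = equiangular (best), 1 = completely degenerate (worst). */
double tri_skewness(const double x[3], const double y[3])
{
    const double PI = acos(-1.0);
    const double THETA_E = 60.0;   /* equiangular angle for a triangle */
    double tmax = 0.0, tmin = 180.0;
    for (int i = 0; i < 3; ++i) {
        int j = (i + 1) % 3, k = (i + 2) % 3;
        /* interior angle at vertex i from the dot product of edge vectors */
        double ax = x[j] - x[i], ay = y[j] - y[i];
        double bx = x[k] - x[i], by = y[k] - y[i];
        double t = acos((ax * bx + ay * by) /
                        (sqrt(ax * ax + ay * ay) * sqrt(bx * bx + by * by)));
        t *= 180.0 / PI;           /* radians -> degrees */
        if (t > tmax) tmax = t;
        if (t < tmin) tmin = t;
    }
    double s1 = (tmax - THETA_E) / (180.0 - THETA_E);
    double s2 = (THETA_E - tmin) / THETA_E;
    return s1 > s2 ? s1 : s2;
}

Averaging this value over all triangles gives the kind of mesh-quality history plotted in Fig 3.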
4 Results and Discussion
Two-dimensional simulations are carried out for channel flow (Fig 1) and for free surface flow with a cantilever plate in the center; the domain for the free surface flow is larger than that of Fig 1. It is seen from Fig 2 that for the coarser mesh (486 elements) the convergence of the displacement is poor compared to the fine mesh (840 elements). Hence, although the torsion spring generates triangles with better skewness (Fig 3), the modified linear spring analogy succeeds in generating finer triangles near the structure, which enables a better capture of the fluid flow there. It is thus expected that a combination of linear and torsion smoothing would perform better, and this was tried with the free-surface problem. In this (free surface) case, the Laplacian smoothing turns out to be better (Fig 4). It is also seen from Table 1 that the modified linear spring analogy is quite fast and effective when compared to the other methods.
Fig 1: Channel flow with plate in center
Fig 2: Displacement of mid-point on top of the plate
Fig 3: Average skewness plot of fluid mesh
Fig 4: Smoothing for free-surface FSI flow
Table 1: CPU real time for total simulation with different smoothing algorithms

Mesh Size                Linear spring analogy    Laplacian smoothing
Coarse (1058 elements)   69.86 s                  40.90 s
Fine (3696 elements)     258.75 s                 285.02 s

References
1. Batina J.T., Unsteady Euler Airfoil Solutions Using Unstructured Dynamic Meshes, AIAA Journal, Vol. 28, No. 8 (1989) pp. 1381-1388.
2. Blom F.J. and Leyland P., Analysis of Fluid-Structure Interaction by Means of Dynamic Unstructured Meshes, 4th International Symposium on Fluid-Structure Interaction, Vol. 1, ASME (1997).
3. Farhat C., Degand C., Koobus B. and Lesoinne M., Torsional Springs for Two-Dimensional Dynamic Unstructured Fluid Meshes, Computer Methods in Applied Mechanics and Engineering, 163 (1998) pp. 231-245.
4. Fluent Technologies Ltd., Handbook, 8.1.1 (2002).
5. Nitikitpaiboon C. and Bathe K.J., An Arbitrary Lagrangian-Eulerian Velocity Potential Formulation for Fluid-Structure Interaction, Computers and Structures, 47 (1993) pp. 871-891.
6. Wu G.X. and Taylor R.E., Time Stepping Solutions for Two-dimensional Nonlinear Wave Radiation Problem, Ocean Engineering, 22, 8 (1995) pp. 785-798.
SOLVING BIOT'S CONSOLIDATION MODEL FOR BRAIN TISSUE USING SPARSE MATRIX TECHNOLOGY

Y. L. LI
Institute of High Performance Computing, #01-01 The Capricorn, 1 Science Park Road, Singapore Science Park II, Singapore 117528
E-mail: [email protected]

K. H. LEE
Mechanical Department, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260

In this paper, Biot's consolidation model is applied to simulate the deformation behavior of human brain tissue in neurosurgery. Brain tissue is a porous material consisting of both solid and fluid phases, and its deformation behavior is governed by coupled bi-phasic partial differential equations. FEM is applied to solve the governing equations and simulate the deformation of brain tissue in neurosurgery. However, FEM results in a large-scale sparse equilibrium system when fine elements are used to mesh the geometrically complex human brain. A row-indexed sparse matrix scheme is therefore adopted to avoid the out-of-core problem and to accelerate convergence with the Biconjugate Gradient Method. This scheme, however, is not convenient for assembling the finite element global system. To overcome this handicap, an array of linked lists is used to assemble the global system. After the global system is assembled, boundary and initial conditions are applied for the iterative solver.
1 Introduction
Biot's infinitesimal consolidation model [1] for soft tissue is governed by the following equations in Einstein notation:

\sigma_{ij,j} + p_{,i} = 0 \qquad (1a)

\frac{\partial (u_{i,i})}{\partial t} - k\, p_{,ii} = 0 \qquad (1b)
where σ_ij are the components of the stress tensor; p is the pore fluid pressure (Pa); u_i are the components of displacement (m); k is the permeability (m³s/kg) and t is time (s). These equations assume that the solid tissue behaves in a linearly elastic fashion and that the pore fluid is incompressible. Equation (1a) relates mechanical equilibrium to the fluid pressure gradient across the medium, while equation (1b) provides the constitutive relationship between volumetric strain and fluid pressure. The brain is considered a saturated medium.

2 Solving the coupled equations using FEM

Due to the complex shape of the human brain, FEM is used for solving the coupled system.

2.1 Finite element equations formulation
To implement FEM, Galerkin's method [2] is applied to obtain the weak form of the governing differential equations. For equation (1a), the weighted residual method and the divergence theorem give
\int_V \sigma_{ij}\, w_{i,j}\, dV - \int_V p_{,i}\, w_i\, dV = \int_S t_i\, w_i\, dS \qquad (2)

Similarly, for equation (1b) of the coupled equations, using finite differencing in time, the weighted residual method and the divergence theorem give

\int_V u_{i,i}^{n+1}\, w\, dV - \int_V u_{i,i}^{n}\, w\, dV + \Delta t\, k \int_V p_{,i}\, w_{,i}\, dV = \Delta t \int_S k\, p_{,i}\, n_i\, w\, dS = \Delta t \int_S h\, w\, dS \qquad (3)

Now, the use of the θ method [2] implies

u_i = (1-\theta)\, u_i^{n} + \theta\, u_i^{n+1} \quad \text{and} \quad p = (1-\theta)\, p^{n} + \theta\, p^{n+1} \qquad (4)

Combining equations (2)-(4) and writing them in matrix form leads to

\begin{bmatrix} \theta K & \theta Q \\ Q^T & \theta\, \Delta t\, H \end{bmatrix} \begin{Bmatrix} u^{n+1} \\ p^{n+1} \end{Bmatrix} = \begin{bmatrix} -(1-\theta)\, K & -(1-\theta)\, Q \\ Q^T & -(1-\theta)\, \Delta t\, H \end{bmatrix} \begin{Bmatrix} u^{n} \\ p^{n} \end{Bmatrix} + \begin{Bmatrix} \int_S N^T t\, dS \\ \Delta t \int_S L^T h\, dS \end{Bmatrix} \qquad (5)

with K = \int_V B^T E B\, dV, \; Q = \int_V N^T [\nabla L]\, dV (and correspondingly \int_V L^T [\nabla N]\, dV), \; H = k \int_V [\nabla L]^T [\nabla L]\, dV.
Calculation of the matrices can be found in Bathe's textbook [2].

2.2 Initial conditions
When time-marching is applied to this time-dependent problem, the initial displacements and pore pressures must be known. Interestingly, with θ = 1.0, the elements which multiply the pore pressure p^n on the right hand side of equation (5) are zero. In this case, only the initial displacements are needed. At t = 0, the external loads are supported by the pore fluid and the effective stress in the solid skeleton is zero, because the pore fluid seepage velocity is limited as t → 0. The total fluid flow out of the solid skeleton is therefore zero and there is no geometric deformation of the solid skeleton, so the displacements are zero at t = 0. Once the initial pore pressure and displacements have been set up, θ can be changed from 1.0 to 0.5, as θ = 0.5 gives the best computational results in dynamic problems.

2.3 Assembly of the global system
This section discusses the assembly of the global system using linked lists. In this work, the Biconjugate Gradient Method [3] is applied to solve the system of equations Kx = b. This method references K only through products of K (and its transpose) with a vector. These operations can be very efficient for a properly stored sparse matrix, and here the row-indexed storage scheme [4] is applied. A sparse matrix in this storage scheme is very convenient to multiply with a vector on its right. The scheme sets up two one-dimensional arrays: the first stores the nonzero K_ij values, while the second stores integer values indicating the locations of these nonzero values in K. It saves memory in sparse matrix storage but costs time when locating the value of an element K_ij.
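As an illustration of why this storage suits iterative solvers, here is a minimal sketch of the row-indexed sparse matrix-vector product in the style of Numerical Recipes [3]; the array layout (diagonal in sa[1..n], off-diagonal entries with 1-based column indices addressed through ija[]) follows that reference.

/* y = K x for a row-indexed sparse matrix: sa[] holds the diagonal in
   sa[1..n] followed by the off-diagonal nonzeros; ija[i]..ija[i+1]-1
   gives the range of off-diagonal entries of row i, and ija[k] is the
   column index of sa[k]. */
void sprs_ax(const double sa[], const long ija[], const double x[],
             double y[], long n)
{
    for (long i = 1; i <= n; i++) {
        y[i] = sa[i] * x[i];                 /* diagonal term */
        for (long k = ija[i]; k <= ija[i + 1] - 1; k++)
            y[i] += sa[k] * x[ija[k]];       /* off-diagonal terms */
    }
}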
2.3.1 List storage of stiffness matrix
The local stiffness matrix of an element can be calculated using equation (5). After the stiffness matrix of an element has been calculated, it is assembled into the global system: its entries are added to the global matrix according to the global node numbers of the nodes of the element. This assembly procedure is simple to perform if the element K_ij can be easily located in the global system. Unfortunately, this is difficult in the row-indexed storage scheme for the sparse matrix [4]. Since it is very hard to locate an element K_ij in the row-indexed storage scheme, linked lists [4] are used to store the sparse global stiffness matrix, because this storage allows the element K_ij to be located conveniently, quickly and economically. The stiffness matrix K has nonzero elements in every row, but only a few in each row. Thus, it can be represented by an array of lists, the array size being n, the dimension of K. In each list, the head of list[i] stores the number of nonzero elements in that row, while the other nodes store the nonzero values and their column numbers in the i-th row of K. The data structure of the linked list is defined as

typedef struct LIST {
    int j;              // The column number of element K[i][j]
    double val;         // The value of element K[i][j]
    struct LIST *next;  // Pointer to the next element in the list
} LIST;

Corresponding to this data structure, other operators are defined for its manipulation: Lookup(j, L) searches for the node with column number j in linked list L; Add(j, val, L) adds val to the value stored at the node with column number j in L; New(x, val, j) generates a new node x holding val and j; Insert(x, L) appends node x to L; and Access(L, j) returns the stored value at the node with column number j. With these operators, an algorithm for the global system assembly can be given as follows:

procedure AssembleGlobal(list k[1..n], real val, integer i, integer j)
    result <- Lookup(j, k[i])
    if (result) then
        Add(j, val, k[i])
    else
        New(x, val, j)
        Insert(x, k[i])
end AssembleGlobal

After the global stiffness matrix has been stored in the lists, the algorithm for row-indexed storage can be modified for the linked-list stored sparse matrix. In the original algorithm, K_ij is obtained by array indexing; here it is found by calling the operator Access(k[i], j). In this way, the global stiffness matrix can finally be converted to row-indexed storage, which is extremely convenient for iterative solution methods.
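A self-contained C version of the assembly step might look as follows; this is a sketch built from the struct and pseudocode above, not the authors' actual code.

#include <stdlib.h>

typedef struct LIST {
    int j;              /* column number of element K[i][j] */
    double val;         /* value of element K[i][j] */
    struct LIST *next;  /* next node in this row's list */
} LIST;

/* Add 'val' to K[i][j]: if a node for column j already exists in row i's
   list, accumulate into it (Lookup + Add); otherwise create a node and
   link it at the head of the row list (New + Insert). k[] is the array
   of row lists, one per row of the global stiffness matrix. */
void assemble_global(LIST **k, int i, int j, double val)
{
    for (LIST *p = k[i]; p != NULL; p = p->next) {
        if (p->j == j) {
            p->val += val;
            return;
        }
    }
    LIST *x = malloc(sizeof *x);
    x->j = j;
    x->val = val;
    x->next = k[i];
    k[i] = x;
}

Each entry of every element stiffness matrix is routed through assemble_global with the element's global node numbers; once assembly is complete, the lists are traversed row by row to build the row-indexed arrays used by the iterative solver.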
2.4 Application of boundary conditions
There are two kinds of boundary conditions: degree-of-freedom constraints and applied loads. At the exposed nodes, the pore pressure is zero; at the nodes close to the skull, the displacements are fixed in the normal direction but free in the tangential directions.
3 Results and conclusions
The simulation model has 6067 hexahedral elements with 6389 nodes and 25556 DOFs. Its Young's modulus is E = 5×10⁴ Pa; Poisson's ratio is ν = 0.49; permeability is k = 1.0×10⁻⁷ m³s/kg. Free drainage is allowed across the open hole, while the other surfaces are impermeable and smooth. The deformation due to loading pressure is simulated using the model in Fig. 1 and the deformation due to pore pressure change is simulated using the model in Fig. 2.
Figure 1. Simulation model for loading pressure
Figure 2. Simulation model for pore pressure
Figure 3. Cut plane
With a pressure loading of P = 1×10² Pa and simulation times of Δt₀ = 50 s and Δtₙ = Δtₙ₋₁ + 50 s, selected deformation steps at the cut plane in Fig. 3 for this case are given in Fig. 4.
Figure 4. Deformation steps for loading pressure case
For the second case, the pore pressure is assumed to be 5% of E, with simulation times Δt₀ = 50 s and Δtₙ = Δtₙ₋₁ + 50 s. The zones within the bold curves in Fig. 2 are the holes where fluid drains out. Selected deformation steps at the cut plane in Fig. 3 for this case are given in Fig. 5.
Figure 5. Deformation steps for pore pressure case (panels show pore pressure before and after CSF drainage)
The results show that pore fluid pressure change is the main cause of brain shift.

References
1. Biot M.A., General theory of three-dimensional consolidation. Journal of Applied Physics, 12 (1941) pp. 155-164.
2. Bathe K.J., Finite element procedures. Englewood Cliffs, N.J.: Prentice-Hall, 1996.
3. Press W.H., Teukolsky S.A., Vetterling W.T. and Flannery B.P., Numerical recipes in C: The art of scientific computing, 2nd ed. Cambridge, New York: Cambridge University Press, 1994.
4. Lewis H.R. and Denenberg L., Data structures and their algorithms. New York: HarperCollins Publishers, 1991.
3-D MULTI-BLOCK ORTHOGONAL GRID GENERATED BY LAPLACE EQUATION WITH SLIDING BOUNDARY CONDITION

Z. K. ZHANG
Temasek Laboratories, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260
E-mail: tslzzk@nus.edu.sg
A three-dimensional multi-level multi-block Laplace equation based grid generation method with a sliding boundary condition is presented to generate a smooth, orthogonal-to-boundary grid. The Laplace equation based method achieves good smoothness across block interfaces when ghost points are used along both sides of the interfaces. In the iterative solution of the Laplace equation, the sliding boundary condition is made possible by using NURBS to represent the body surface. With NURBS, the body surface grid points move freely along the surface and are updated at each iteration step of the Laplace equation solution by projecting the second-layer grid points onto the body surface in its normal direction; consequently, the grid lines starting from the body are always orthogonal to it. Thus the body surface grid is created simultaneously with the interior volume grid.
1 Introduction
Grid orthogonality plays an important role in implementing boundary conditions in the numerical simulation of flow fields. An orthogonal grid simplifies the formulation of boundary conditions and makes them more accurate. The conformal mapping method can indeed create smooth orthogonal grids, but it is difficult to specify or control the grid spacing and the method is only applicable to the two-dimensional case. The algebraic method is fast and can control both orthogonality and grid spacing, but it can neither prevent the grid from crossing over nor guarantee overall smoothness. The elliptic equation based method naturally smooths discontinuities on boundaries and can control orthogonality and spacing on boundaries by iteratively correcting source terms, as in the Sorenson method [1] or the Hilgenstock method [2,3,4]. But both the Sorenson method and the Hilgenstock method are time-consuming, since each involves two levels of loops. The Laplace equation based method (i.e., the elliptic equation without source terms) maintains the smoothing property of the elliptic equation based method and has the tendency to create uniform grids if the boundary points are free to float (Neumann-Dirichlet or sliding condition) during the grid generation process. The method also ensures that the grid does not cross over. The motivation of the present paper is, in the Laplace equation solution process, to add a sliding boundary condition by using a NURBS (Non-Uniform Rational B-Spline) representation of the body surface. At each step of the Laplace equation iteration, the body surface grid points are updated by projecting the second-layer grid points onto the surface in its normal direction. Thus the grid lines starting from the body surface are always orthogonal to it, and the surface grid is generated simultaneously with the interior volume grid.
2 Description of the Method
The 3-D Laplace equations used for grid generation are

\xi_{xx} + \xi_{yy} + \xi_{zz} = 0, \quad \eta_{xx} + \eta_{yy} + \eta_{zz} = 0, \quad \zeta_{xx} + \zeta_{yy} + \zeta_{zz} = 0 \qquad (1)

The equivalent equation in computational space can be written in vector form [2,3,4] as

\sum_{m=1}^{3} \sum_{n=1}^{3} g^{mn}\, \frac{\partial^2 \vec{r}}{\partial \xi_m\, \partial \xi_n} = 0 \qquad (2)
where \vec{r} = x\,\vec{i} + y\,\vec{j} + z\,\vec{k}. Eq. (2) can be solved either with the SOR (Successive Over-Relaxation) method or the SLOR (Successive Line Over-Relaxation) method. To provide flexibility for grid generation, a multi-block, patched grid method is used. Using ghost or halo points along both sides of the interfaces ensures continuity across block interfaces. In solving Eq. (2), all the blocks are computed simultaneously, so boundary points are updated at each iteration step. This ensures that the grid lines are smooth across the interfaces. In order to obtain boundary orthogonality, the sliding condition is incorporated by using NURBS to represent the body surface. This condition allows boundary points to move freely on the boundary. At each iteration step of the Laplace equation solution, the body surface grid points are updated by projecting the second-layer grid points onto the body in its normal direction; therefore the grid lines originating from the body are always orthogonal to it. Finally, when the iteration converges, both the body surface grid and the interior volume grid are created, and orthogonality on the body surface is obtained as well. To generate the grid faster and more efficiently, the grid generation process is performed in a multi-level way. That is to say, a coarsest grid (first level) is generated first by a few iteration steps. This grid is then interpolated and iterated with the Laplace equation solver to obtain a finer (second-level) grid. Each higher-level grid is generated from its previous level grid in this way. The highest-level grid is the final grid.
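To illustrate the iteration structure (not the author's full scheme, which retains the metric terms of Eq. (2)), the sketch below performs one SOR sweep over a single block under the simplifying assumption of an orthonormal computational metric, in which case each interior point relaxes toward the average of its six neighbours.

/* One SOR sweep over the interior of an (ni x nj x nk) block. r[id][c]
   holds coordinate c of grid point (i,j,k); omega in (1,2) is the
   over-relaxation factor. Cross-derivative/metric terms of Eq. (2) are
   omitted in this simplified sketch. */
#define ID(i, j, k) (((i) * nj + (j)) * nk + (k))

void sor_sweep(double (*r)[3], int ni, int nj, int nk, double omega)
{
    for (int i = 1; i < ni - 1; ++i)
        for (int j = 1; j < nj - 1; ++j)
            for (int k = 1; k < nk - 1; ++k)
                for (int c = 0; c < 3; ++c) {
                    double avg = (r[ID(i-1,j,k)][c] + r[ID(i+1,j,k)][c] +
                                  r[ID(i,j-1,k)][c] + r[ID(i,j+1,k)][c] +
                                  r[ID(i,j,k-1)][c] + r[ID(i,j,k+1)][c]) / 6.0;
                    r[ID(i,j,k)][c] += omega * (avg - r[ID(i,j,k)][c]);
                }
}

In the multi-level procedure, a few such sweeps are run on the coarsest block, the result is interpolated to the next level and the sweeps are repeated; after every sweep the body-surface points would additionally be re-projected onto the NURBS surface to enforce the sliding condition.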
3 Applications
An ellipsoid is selected as a test case. The three semi-axes of the ellipsoid are 1.0, 2.0 and 0.4, respectively. The ellipsoid surface (inner boundary) is divided into 6 blocks corresponding, respectively, to the six faces of a cube surface, which is taken as the far-field (outer) boundary. The volume grid is therefore divided into six blocks too, each resembling the frustum of a pyramid. Taking a cube surface as the far-field boundary deliberately tests the ability of the method to minimize the effect of boundary discontinuities on overall grid smoothness. The coarsest (first-level) 6-block grid is generated by 20 iterations of the Laplace equation solver with 5×5×5 grid points in each block (Fig. 1), and the second-level grid (9×9×9 grid points in each block) is generated by 40 iterations (Fig. 2).
Fig. 1 Coarsest grid (5×5×5)
Fig. 2 Second-level grid (9×9×9)
The third-level grid (17×17×17 in each block) and the fourth-level grid (33×33×33 in each block) are generated in the same way with 100 iterations each. Fig. 3 and Fig. 4 present the fourth-level volume grid and body surface grid respectively. Good grid smoothness can be observed in the figures. The grid lines across the block interfaces are very smooth, although there are slope discontinuities along the outer boundary. The good smoothness is attributed to the Laplace equation itself and to the use of the ghost points.
Fig. 3 The fourth-level grid (33×33×33)
Fig. 4 Body surface grid of the fourth-level grid

4 Conclusions
Representing the body surface with NURBS is helpful in making the grid lines starting from the body surface orthogonal to it. At the same time, it makes it possible to generate the volume grid and the body surface grid simultaneously by projecting the second-layer grid points onto the body in its normal direction. How to search for body normal lines faster and how to ensure the fidelity of the NURBS representation are two important factors for the method.

References
1. Sorenson R.L. and McCann K., "A method for iterative specifications of multiple-block topologies", AIAA paper 91-0147, 1991.
2. White J.A., "Elliptic grid generation with orthogonality and spacing control on an arbitrary number of boundaries", AIAA paper 90-1568, 1990.
3. Sonar T., "Grid generation using elliptic partial differential equations", Braunschweig: DFVLR-FB 89-15, 1989.
4. Zhang Z.K., Zhu Z.Q. and Zhuang F.G., "A multi-block grid generation system for multi-component aircraft", Proc. of 7th International Symp. on Comput. Fluid Dynamics, International Academic Publishers, Sept. 1997, pp. 390-395.
MESHING HUMAN BRAIN WITH HEXAHEDRAL MESHES FROM IMAGE SLICES USING MESH MAPPING METHOD

Y. L. LI
Institute of High Performance Computing, #01-01 The Capricorn, 1 Science Park Road, Singapore Science Park II, Singapore 117528
E-mail: liyl@ihpc.a-star.edu.sg

K. H. LEE
Mechanical Department, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260

The finite element method (FEM) is applied to analyze the dynamic behavior of the human brain in biomechanics. But the brain cannot easily be meshed with hexahedral elements for FEM simulation because of its complex shape. Conventionally, it is meshed with hexahedrons taken directly from its computerized tomography (CT) or magnetic resonance imaging (MRI) image voxels. Unfortunately, this method tends to generate poor quality elements at the boundary of the volume. Voxel-based meshing also produces extremely fine meshes, and too fine a mesh results in a large matrix system, which is a serious disadvantage in real-time dynamic simulation. Because of these shortcomings of the voxel-based meshing method, another method for creating a three-dimensional hexahedral mesh of the human brain is proposed herein. In this method, the image slices are used for extracting contours instead of being meshed directly into hexahedral elements from the image voxels. After the contours have been extracted, they are used for mapping the recognition model to the physical model using the method put forward in this paper. Hexahedral meshes generated from image slices of the human brain at different resolutions are given to demonstrate the efficiency of the method. Moreover, this mesh mapping method can also be used for generating hexahedral meshes from CT or MRI image slices of other human organs.
1 Introduction
The mesh mapping method [1] is based on the theory of Basis mesh (Recognition model) + F => Physical model. The method is theoretically very simple; its capacity to handle realistic geometrical situations adequately is strongly linked to the mapping function F. This paper implements the mesh mapping theory to map a regular recognition model of the human brain to its true physical model, which is constructed from the MRI image slices of the human brain.
2 Recognition model

Figure 1. Recognition model
Figure 2. Free surfaces of the recognition model
The regular recognition model of the human brain can be constructed using a commercial FEM software package such as PATRAN®. It is shown in Fig. 1 and is to be mapped to the true physical model by the chosen mapping function. The boundary surfaces of the recognition model are classified as shown in Fig. 2. For simplicity and clarity, only part of the surfaces is presented.
3 Mapping function
The mapping function chosen here is the Laplacian function:

\nabla^2 T = 0 \quad \text{in } \Omega, \qquad T(\xi,\eta,\zeta) = \bar{T} \quad \text{on } \Gamma

where \nabla^2 = \Delta is the Laplacian operator. T can be the coordinate component x, y or z respectively.
4 Physical model
So far, the recognition model is ready and the mapping function has been chosen. The last problem in solving this mapping function is to find the coordinates of the bounding surface nodes on the physical model. These coordinates are needed as boundary conditions when solving the mapping function numerically. The steps involved in obtaining these coordinates are described next.
4.1 Contours of image slices
The contours of the image slices of the human brain are extracted using image processing technologies [2]. They are shown in Fig. 3. For simplicity and clarity, only the contours of one half of the human brain are given; the other half is nearly symmetric.
Figure 3. Slices of contours extracted from image slices

The extracted contours are then broken and classified into groups for constructing the top, left and right side parts of the free surfaces. The contours in Fig. 3 are broken and classified into groups as in Fig. 4. They are used for constructing the surface patches of the physical model.
Figure 4. Broken contours for surface patches of brain
4.2 Surface patches of physical model
After all the contours extracted from the image slices have been divided into groups, they can be used to construct the boundary surface patches of the human brain. Here the patches are represented by many small triangles, as shown in Fig. 5.
Figure 5. Triangulated side surfaces of brain

4.3 Mapping of surface of recognition model to physical model
Once the boundary surfaces of the recognition model and the physical model have been identified and constructed, the surface patches of the former can be mapped to the latter by solving the two-dimensional Laplacian equation. Before the two-dimensional Laplacian equation is used for mapping the recognition surface to the physical surface, the surfaces need to be projected onto a suitable two-dimensional plane in order for the mapping to be successful. The projection direction is chosen by evaluating the average normal of all the triangles, as shown in the sketch below. After the surfaces are projected onto a plane in this direction, the 2D Laplacian equations are solved for the mapping. Here only the left side part is demonstrated; the other parts are treated in the same way. Fig. 6 shows the 2D mapping result of the left side part after the boundary surfaces of the recognition and true models are projected onto the plane transverse to the average normal.
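A minimal sketch of this projection-direction computation (hypothetical helper, not the authors' code) sums the cross products of each triangle's edge vectors, which weights every triangle's normal by its area:

#include <math.h>

/* Area-weighted average normal of a triangulated surface. v[][3] are
   vertex coordinates, t[][3] are the vertex indices of each triangle.
   The edge cross product has magnitude 2*area, so summing it over all
   triangles weights each normal by the triangle's area. */
void average_normal(const double (*v)[3], const int (*t)[3], int ntri,
                    double n[3])
{
    n[0] = n[1] = n[2] = 0.0;
    for (int m = 0; m < ntri; ++m) {
        const double *a = v[t[m][0]], *b = v[t[m][1]], *c = v[t[m][2]];
        double e1[3] = { b[0] - a[0], b[1] - a[1], b[2] - a[2] };
        double e2[3] = { c[0] - a[0], c[1] - a[1], c[2] - a[2] };
        n[0] += e1[1] * e2[2] - e1[2] * e2[1];
        n[1] += e1[2] * e2[0] - e1[0] * e2[2];
        n[2] += e1[0] * e2[1] - e1[1] * e2[0];
    }
    double len = sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
    if (len > 0.0) { n[0] /= len; n[1] /= len; n[2] /= len; }
}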
Figure 6. Result of two-dimensional mapping in a plane
Figure 7. Mapped quadrilateral surface and physical surface (isometric, front and side views)
Yet as shown in Fig. 7, the shapes of the mapped recognition surface and the physical surface are similar only when they are projected onto the plane transverse to the average normal direction; they differ from each other along the average normal direction. The mapped nodes on this plane therefore need a further projection onto the physical surface along the average normal direction. To do this, a line in this direction is drawn from each node so that it intersects one triangle of the triangulated surface [3]. This intersection point is where the mapped node is placed.
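This projection step is a ray-triangle intersection test of the kind found in computer graphics texts [3]. A compact sketch follows (the Moller-Trumbore formulation is used here for brevity; the paper does not state which algorithm was implemented):

#include <math.h>

/* Ray-triangle intersection (Moller-Trumbore). Returns 1 and the ray
   parameter *t if the ray o + t*d intersects triangle (a,b,c), else 0. */
int ray_triangle(const double o[3], const double d[3],
                 const double a[3], const double b[3], const double c[3],
                 double *t)
{
    const double EPS = 1e-12;
    double e1[3] = { b[0] - a[0], b[1] - a[1], b[2] - a[2] };
    double e2[3] = { c[0] - a[0], c[1] - a[1], c[2] - a[2] };
    double p[3] = { d[1] * e2[2] - d[2] * e2[1],    /* p = d x e2 */
                    d[2] * e2[0] - d[0] * e2[2],
                    d[0] * e2[1] - d[1] * e2[0] };
    double det = e1[0] * p[0] + e1[1] * p[1] + e1[2] * p[2];
    if (fabs(det) < EPS) return 0;                  /* ray parallel to plane */
    double inv = 1.0 / det;
    double s[3] = { o[0] - a[0], o[1] - a[1], o[2] - a[2] };
    double u = (s[0] * p[0] + s[1] * p[1] + s[2] * p[2]) * inv;
    if (u < 0.0 || u > 1.0) return 0;
    double q[3] = { s[1] * e1[2] - s[2] * e1[1],    /* q = s x e1 */
                    s[2] * e1[0] - s[0] * e1[2],
                    s[0] * e1[1] - s[1] * e1[0] };
    double v = (d[0] * q[0] + d[1] * q[1] + d[2] * q[2]) * inv;
    if (v < 0.0 || u + v > 1.0) return 0;
    *t = (e2[0] * q[0] + e2[1] * q[1] + e2[2] * q[2]) * inv;
    return 1;                                       /* hit: point is o + t*d */
}

The mapped node is moved to o + t*d for the triangle that yields a valid intersection.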
5 Results and Conclusion
After the surface of the recognition model has been mapped to the physical model, the coordinates of the boundary nodes are used as boundary conditions to solve the three-dimensional Laplacian equation and obtain the final result of the mapping method. Both coarse and fine meshes are generated, as shown in Fig. 8.
"^11 ft
irse mesh A fine mesh A coarse Figure 8. Results of mapping method
The mapping method can be used for meshing the human brain with hexahedral elements. However, the difficulty of extracting contours from the image slices remains. Even after the contours have been extracted, classifying them into different parts requires significant expert experience; the codes developed for this mesh mapping method therefore need to be integrated with some expert system. In order to approximate the surface of the brain from image slices with triangles, the marching cubes method for surface rendering can be used. The difficulty in using this method is that there is no clear demarcation between brain tissue and bone, which makes it hard to distinguish the brain from the skull unless some new technology can be applied to overcome the problem. It may also be worthwhile to investigate the use of image processing technology to extract the contours from the image slices.

References
1. George P.L., Automatic mesh generation: Application to finite element method. Chichester: John Wiley and Sons, 1991.
2. Mignotte M. and Meunier J., A multiscale optimization approach for the dynamic contour-based boundary detection issue. Computerized Medical Imaging and Graphics, 25(3) (2001) pp. 265-275.
3. Foley J.D., van Dam A., Feiner S.K. and Hughes J.F., Computer Graphics: Principles and Practice. Addison-Wesley, Reading, MA, 2nd edition, 1995.
ISBN 1-86094-345-4 (pbk)
Imperial College Press, icpress.co.uk