This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
) includes sedimentation, i.e. incorporation of
169
a settling flux within the vertical convection term. Due to high concentrations of sludge the so-called hindered particle settling velocity can be described by the Vesilind function [3]: us =uoe where us is the settling velocity, u0 is the maximum settling velocity, rh is the hindered settling zone parameter and Cs is the local sludge concentration. The necessary parameters were experimentally determined (data not shown). The species transport model describes the movement of the soluble carbon substrate, and nitrate through the system. These species are the most relevant under anoxic conditions. Again the general conservation equation is used where the transport quantity
= pk, for any m = 1, 2, 3, .... In other words, the process [Xk] and the averaged processes {Xkm>J, m >1, have an identical correlation structure. The process {Xk} is asymptotically secondorder self-similar with H = 1 - (fi/2), if pk<m> —> pk, as m —• oo. The main properties of self-similar processes include slowly decaying variance, long-range dependence and 1/f-noise [2], [3], [11]. 3 FIVE METHODS The FFT- and RMD-based methods were suggested as being sufficiently fast for practical applications in generation of simulation input data, for example in [10] and [12]. In this paper, we will report properties of these two methods and the F-ARIMA-based method; and compare them with SRA and FGN-DW, two recently proposed alternative methods for generation of pseudo-random self-similar sequences [7], [8], [9]. These methods can be characterised as follows: • FFT Method: This method generates approximate self-similar sequences based on the Fast Fourier Transform (FFT) and a process known as the Fractional Gaussian Noise (FGN). Its main difficulty is connected with calculating the power spectrum, which involves an infinite summation. Paxson has solved this problem by applying a special approximation. Briefly, the FFT method is based on (i) calculation of the power spectrum from the periodogram (the power spectrum at a given frequency represents an independent exponential random variable), (ii) construction of complex numbers which are governed by the normal distribution and (iii) execution of the inverse FFT; see [12] for more details. • RMD Method: The basic concept of the random midpoint displacement (RMD) algorithm, which generates approximate FBM sequences, is to extend the generated sequence recursively, by adding new values at the midpoints derived from the values at the endpoints. The reason for subdividing the interval between 0 and 1 is to construct the Gaussian increments. Adding offsets to midpoints makes the marginal distribution of the final result normal. For more detailed discussion of the RMD method, see [4], [10]. • SRA Method: A method for the direct generation of an FBM process is based on the successive random addition (SRA) algorithm [4], [9]. The SRA method uses the midpoints in the same way as RMD, but adds a displacement of a suitable variance to all of the points to increase stability of the generated sequence [7]. The reason for interpolating midpoints is to construct Gaussian increments, which are correlated. Adding offsets to all points should make the resulted sequence self-similar and of normal distribution [4]. • F-ARIMA Method: Hosking [6] provides an algorithm for generating an LRD process
338 called fractional ARIMA(0,d,0), the simplest and most fundamental of the fractionally differenced ARIMA processes. The process has Gaussian marginals with zero mean and variance, and fractional differencing parameter d = H - 1/2. The process {Xk} is chosen from the Gaussian distribution N( UK Vk) where Uk is the k-th mean and Vk is the k-th variance. This algorithm requires 0(n2) computation time, because each number of the sequence depends on every previous number. See [6] for further discussion. • FGN-DW Method The method based on FGN and Daubechies wavelets (FGN-DW), and proposed in [8], [9], is based on the strategy proposed in [12]. The algorithm consists of the following steps: (i) calculation of the power spectrum an FGN process, (ii) construction of complex numbers which are governed by the normal distribution and (iii) calculation of two coefficients of Daubechies wavelets which are needed in the inverse Daubechies wavelets transform. 4 ANALYSIS OF SELF-SIMILAR SEQUENCES Five generators are comparable because they have the same statistical properties such as Gaussian distributions, means, and variances. The five generators of self-similar sequences of pseudorandom numbers described in Section 3 have been implemented in C on a Pentium II (233 MHz, 128 MB) computer. We have analysed the accuracy of the five methods. For each of H = 0.6, 0.7, 0.8 and 0.9, each method was used to generate 30 sample sequences of 32,768 (215) numbers starting from different random seeds. We have summarized the results of our analysis in the following Tables 1 and 2: The estimates of the Hurst parameter obtained from the least biased of the H estimation techniques, i.e., the wavelet-based H estimator and Whittle's MLE (see [9]), have been used to analyse the accuracy of the five generators. The presented numerical results are all averaged over 30 sequences. The results for the wavelet-based H estimator and Whittle's MLE with the corresponding 95% confidence intervals (CIs), (see Tables 1 and 2), show that for all input values, the F-ARIMA, the FFT and the FGN-DW methods produced sequences with less biased H values than other methods. Our results show that all five generators produce approximately self-similar sequences, with the relative inaccuracy increasing with H, but always staying below 9%. Table 1: Mean values of estimated H using the wavelet-based H estimator for five generators for H = 0.6, 0.7, 0.8 and 0.9. We give 95% CIs for the means in parentheses. Method F-ARIMA FFT FGN-DW RMD SRA
0.6 .5974 (.593, .601) .6005 (.596, .604) .6013 (.574, .629) .5963 (.591, .601) .5848 (.579, .589)
Mean Values of Estimated H 0.7 0.8 .6990 .7947 (.693, .704) (.787, .801) .6967 .7862 (.692, .700) (.782, .790) .7962 .6987 (.671, .726) (.769, .824) .6907 .7805 (.684, .696) (.773, .787) .6797 .7700 (.674, .685) (.763, .776)
0.9 .8900 (.880, .899) .8639 (.859, .867) .8938 (.866, .921) .8592 (.852, .866) .8499 (.842, .856)
5 CONCLUSIONS In this paper we have presented the results of a comparative analysis of five generators of (long) pseudo-random self-similar sequences. It appears that all five generators, based on the FFT, RMD, SRA, F-ARLMA, and FGN-DW methods, generate approximately self-similar sequences, with the relative inaccuracy of the results below 9%. The results of this research can be extended by designing more computationally efficient self-similar generator able to construct arbitrary long sequences. Such sequences are necessary in parallel computing simulation studies of the telecommunication networks with self-similar teletraffic.
339 Table 2: Mean values of estimated H using Whittle's MLE for five generators for H = 0.6, 0.7, 0.8 and 0.9. We give 95% CIs for the means in parentheses. Method F-APJMA FFT FGN-DW RMD SRA
0.6 .5803 (.571, .590) .6002 (.591, .610) .5849 (.575, .594) .5765 (.567, .586) .5762 (.567, .586)
Mean Values of Estimated H 0.8 0.7 .6628 .7469 (.738, .756) (.654, .672) .7002 .8003 (.691, .710) (.791, .809) .6725 .7620 (.663, .682) (.753, .771) .6567 .7401 (.647, .666) (.731, .749) .6563 .7395 (.647, .666) (.730, .749)
0.9 .8324 (.823, .842) .9002 (.891, .909) .8530 (.844, .862) .8261 (.817, .835) .8252 (.816, .834)
REFERENCES [I] J. Beran. Statistical Methods for Data with Long Range Dependence. Statistical Science, 7(4):404-427, 1992. [2] J. Beran. Statistics for Long-Memory Processes. Chapman and Hall, New York, 1994. [3] D. Cox. Long-Range Dependence: a Review. In H. David and H. David, editors, Statistics: An Appraisal, pages 55-74. Iowa State Statistical Library, The Iowa State University Press, 1984. [4] A. Crilly, R. Earnshaw, and H. Jones. Fractals and Chaos. Springer-Verlag, New York, 1991. [5] M. Garrett and W. Willinger. Analysis, Modeling and Generation of Self-Similar VBR Video Traffic. In Computer Communication Review, Proceedings of ACM SIGCOMM'94, volume 24(4), pages 269-280, London, UK, 1994. [6] J. Hosking. Modeling Persistence in Hydrological Time Series Using Fractional Differencing. Water Resources Research, 20(12):1898-1908, 1984. [7] H.-D. Jeong, D. McNickle, and K. Pawlikowski. A Comparative Study of Three Self-Similar Teletraffic Generators. In Proceedings of 13th European Simulation Multiconference, ESM'99, volume 1, pages 356-362, Warsaw, Poland, 1999. [8] H.-D. Jeong, D. McNickle, and K. Pawlikowski. Fast Self-Similar Teletraffic Generation Based on FGN and Wavelets. In Proceedings of the IEEE International Conference on Networks, ICON'99, pages 75-82, Brisbane, Australia, 1999. [9] H.-D.J. Jeong. Modelling of Self-Similar Teletraffic for Simulation. PhD thesis, Department of Computer Science, University of Canterbury, 2002 (Submitted). [10] W.-C. Lau, A. Erramilli, J. Wang, and W. Willinger. Self-Similar Traffic Generation: the Random Midpoint Displacement Algorithm and its Properties. In Proceedings of IEEE International Conference on Communications (ICC'95), pages 466-472, Seattle, WA, 1995. [II] W. Leland, M. Taqqu, W. Willinger, and D. Wilson. On the Self-Similar Nature of Ethernet Traffic (Extended Version). IEEE ACM Transactions on Networking, 2(1): 1-15, 1994. [12] V. Paxson. Fast, Approximate Synthesis of Fractional Gaussian Noise for Generating SelfSimilar Network Traffic. Computer Communication Review, ACM SIGCOMM, 27(5):5-18, 1997. [13] V. Paxson and S. Floyd. Wide-Area Traffic: the Failure of Poisson Modeling. IEEE ACM Transactions on Networking, 3(3):226-244, 1995.
THE DELAY-BOUNDED SOURCE MODEL S. RUTTANAWIT , M. LERTWATECHAKUL AND P. SOORAKSA Research Center for Communications and Information Technology (ReCCIT), and entific Department of Information Engineering, Faculty of Engineering King Mongkut's Institute of Technology Ladkrabang, Bangkok 10520, Thailand E-mail: [email protected], klmavure(a),kmitl.ac.th, [email protected] This paper proposes a new traffic source model, called Delay-Bounded Source model. A packet stream could be generated in constant distribution and user can adjust the generating time of the packets under a limited time interval. As a result, the source model generates a constant rate packet stream, which its packet generated delay time (compare to constant distribution) is fallen within the assigned time interval. In this model, the generated delay time distribution pattern could be generated in exponential distribution or normal distribution.
1
Introduction
Information Technology (IT) is growing rapidly, various types of application is emerged. Many of which work well in best-try sharing network environment, such as e-mail, FTP and WWW. On the other hand, many applications need a certain resource to fulfill their user satisfaction, such as multimedia application, voice and video on demand. To provide certain resource for a traffic flow, we need such kind of protocols to negotiate a set of traffic parameters corresponding to the specified service class (QoS). The user QoS of traffic flow would be obtained whenever their behavior is conformed to the contract agreement. Nowadays, there are many network technologies were developed. To achieve QoS on a specific network, end system has to follow the network discipline. For example, users are assigned fixed time slots in Time Division Multiplexing (TDM) network. A user's QoS is deterministic, predictable and isolated from the other users. The ATM (Asynchronous Transfer Mode) defines a set of traffic management functions to support a flexible set of services offering different QoS. While the servicewindow concept introduced in W-TDM (Window based TDM-like Scheduling) [4] is to enhance TDM method in service-time flexibility. In this paper, we propose a new traffic source model, called Delay-Bounded Source (DBS) model. The model was developed in the OPNET modeler 6.0. Using DBS model and other sources model, we could evaluate the QoS performance of a network model such as ATM, W-TDM etc. through difference kinds of traffic source. 2
Source Models
There are two basic philosophies for characterizing source traffic parameter: deterministic and random. The deterministic traffic model (which is control by leaky bucket algorithm) clearly defines the source characteristics as understood by the user and the network. The other philosophy for modeling traffic source behavior utilizes random models to define traffic parameters. Here, we describe some popular traffic models including the mathematical method for generating random variables from them.
340
341
2.1
The Constant Distribution Process
It has a probability distribution function represented as: P(X)=\
*=C [ otherwise It has an expected value E(x) = C, Naturally the parameter C must be an integer. 2.2
Exponential Distribution
The exponential distribution process has an exponentially distributed inter-arrival times with rate parameter e. The probability density function is represented as:
f(x) =
x<[
It has an expected value E(x) = 1/b. 1.3
Normal Distribution Process
The normal (or Gaussian) distribution process has the probability density function is represented as:
•JITZO
It has an expected value E(x) = i. 3 1.1
Delay-Bounded Source (DBS) model Delay-Bounded Source (DBS) algorithm
Some network model discipline allows packets of a traffic flow to be sent periodically with some relax delay (such as W-TDM). Constant distribution function periodically generates the packet stream represented as:
ip-'p
+j
where tpk is the generating time of the packet kth, e representative of mean packet generating rate. The DBS The generating packets are concerned there are two ways of how we consider a discrete-time queuing model. That is early generating (tj and lately generating (ti), when the generating packet early than its reference time (tp) we assign DBS generates the packet by constant distribution function minus with delay time. When the generating packet lately than its reference time we assign DBS generates the packet by constant distribution function plus with delay time represented as:
342
where z is generated with exponential distribution function or normal distribution function. User can defined maximum delay time (z) under a limited time (d), z < d. »
X«-
t m t t, t-p
. k-}
«C
TT
»/>
TT
»/
TT
&/7
TTT7
*/J
^ k+2 W"
Figure 1. Delay-Bounded Source Algorithm
1.2
Delay-Bounded Source (DBS) process
We explain the state transition diagram of the OPNET process model, as shown in Fig. 2. The transition diagram of the DBS algorithm consists of four states: init state, wait state, send state and end state. The init state load user specified parameter such as distribution model and maximum delay jitter time (z) and transfer to the wait state. In the wait state, the process waits for self-interrupt which is scheduled in a table for activating the process to generate a new packet in send state.
^mm?
Figure 2. Delay-Bounded Source Process
1.3
Delay-Bounded Source (DBS) result
We compare the simulation result of DBS model with different distribution pattern for generating delay time (z). The average generating rate (e) was defined as 0.2 packets/second and generating delay time (z) equal to 2 for every traffic source. Fig.3 and fig.4 show packet-generating time of the constant-packet stream compared to a DBSpacket stream with exponential distribution and normal distribution accordingly. The results show that the DBS model could generate a packet stream with a specified generating rate while its generating time could be vary under a defined boundary.
343 .IW T T W8!^L...../.............. J M M& twos*? itsrm
;rr
TTT
Figure 3. Constant source and DBS with exponential distribution, t=2 sec, 1/X=5 sec. •|F»W''W^
Tpipnnnnr Figure 4. Constant source and DBS with normal distribution, x=2 sec, 1/X=5 sec.
2
Conclusion
The Delay-Bounded Source (DBS) model is a new traffic source model developed on the OPNET Modeler 6.0. Using the DBS model, users can generate a packet stream in A packets/sec under specified generating delay time bound. The DBS traffic shape is look seem like the output of leaky bucket which is working in full load but its generated time is variant by specified distribution function. As a result, the DBS model could be used as a traffic source to determine the effect of different source models on performance of a network model.
References 1. Edward R. Doughety, Probability and statistics for the engineering, computing and physical sciences, Prentice Hall (1990) 2. John J. Komo, Random signal analysis in engineering systems, Academic Press (1987) 3. K. Sam Shanmugan and A. M. Breipohl, Random signals detection, Estimation and data analysis, John Wiley & Sons (1988) 4. M. Lertwatechakul, R.Warakulsiripunth,QoS quarantee in ATM network Employing Window-based TDM-like Scheduling, In Proceeding of Asia-Pacific Symposium on Broadcasting and Communications, (2000) pp. 383-388. 5. Gerd Keiser, David Freeman, Carrie Carter, 1995. ATM Test Traffic Generation Algorithms, In Proceedings of Fourth International Conference on Computer Communications and Networks, (1995) pp. 462-469. 6. Chukwuemeka N. Aduba, Matthew N. O. Sadiku, Simulation and Analysis of different Traffic Models for ATM Networks, In Proceedings IEEE Southeast Conference, (2002) pp. 73-75 7. OPNET Contributed Model Depot http://www.opnet.com/services/depot/home.html
INVERTIBLE INTEGER FFT AND DCT APPLIED ON LOSSLESS IMAGE COMPRESSION
YAN YUSONG', WANG CHUNMEI2 , SU GUANGDA', SHI QINGYUN 3 1
Department of Electronic Engineering, Tsinghua University, Beijing 100084, P.R.China 1
School of Mathematical Sciences, Peking University, Beijing 100871, P.R.China 3
Center for Information Sciences, Peking University, Beijing 100871, P.R.China Abstract: This paper completes the construction of FFT (Fast Fourier Transform) that map integers to integers by using Lifting Scheme and butterfly-style construction, which is the basis for invertible integer DCT. Experimental results using integer DCT for lossless image compression are given and show that this transform is fast and efficient for image compression.
1
INTRODUCTION In paper [1], Daubechies etc. advise lifting steps to rebuild wavelet transforms that map integers to
integers.
Using this integer version of wavelet transforms, the computations are still done with
floating point numbers, but the output is guaranteed to be integer and invertibility is preserved, which is crucial to lossless image coding. As traditional image compression methods, FFT and DCT have wide applications. Moreover, to some extent, DCT is better than wavelet for image compression with a higher compression ratio. So we want to construct invertible DCT that maps integers to integers. Lifting scheme is briefly reviewed in Section 2. In Section 3, we give a detailed analysis for FFT's butterfly structure, and try to adapt lifting to integer FFT. Finally, integer DCT is built and applied to lossless image compression; the relative results show its efficiency.
2
LIFTING SCHEME First, we briefly give lifting scheme for an upper triangular matrix. Given a transform:
^1-
1 a\ *i] 0 l'j x2\
where Xx, X2 are inputs, y1, y2
are outputs, and a
(2.1)
is a floating number. Now we can construct
its invertible transform:
j>, =x, + |_a x2\ (2.2)
y2=x2 where [_xj means rounding-off.
According to equation (2.2), we can see, if X{, X2 are integers, the
computed yt, y2 are integers also. Therefore, (2.2) defines a transform mapping integers to integers. Easily we can get its invertible transform:
x2 = y2 I
xx=yx-\ax2\ 344
I
(23)
345 It is clear that (2.3) is also integer transform.
Similarly, this can be done for a lower triangular matrix.
In paper [2], the invertible wavelet transform that maps integers to integers can be constructed according to the following steps: first divide a transform matrix into the multiplication of triangular matrices, whose diagonal elements equal to one; then for every triangular matrix, compute its invertible transform which maps integers to integers; finally, multiply all these transforms in order and get the final transform.
This step is called as lifting, and the triangular matrix with diagonal elements
equaling to one is called as lifting matrix. Although the computations are still done with floating point numbers, the results coming from the upper process are guaranteed to be integer and invertibility is preserved. Therefore, this transform can reduce signals' redundancy efficiently, and benefits data's lossless compression First, give two basic matrices' lifting: 1) Lifting for scaling matrix:
=
v4
i k-k2~\ 0 1 'j
i ol 1 ]/k l'j[0
k-\\ 1 01 1 'j 1 l'j
(2.4)
2) Lifting for rotation matrix: cos a sin a
- sin a~\ I = cos a J
•%(«/2)l
1
1
sin a
!l
-«(a/2)l
(2.5)
l'j So for scaling and rotation transforms, their invertible integer transforms can be gotten easily [2].
3
INTEGER FFT Discrete Fourier Transforms find wide applications in signal processing, and 2-D DFT is defined
as follows: F
1
-ilnnu^
L"] = - r = Z F M e x P [ — r , — ] '
^ M = -7Tf2. F Mexp[——]
(3-D
At the same time, a fast algorithm called as FFT is designed. In this paper, we shall concentrate on the case N=2m.
Figure 3.1(a) shows FFT with a length of eight. N
(a)
al
Dataj
a2
Data j+N/2
(b)
Figure 3.1 (a) Eight variables' FFT, where O represents complex numbers (b) Simple butterfly-style transform, where a l , a2 are outputs, b l , b2 are inputs The transform for Figure 3.1(b) is:
346
Wi 1 * 1 . 1 < / 2 | JIAJ
1 a
where WX = exp[
2j
—] .
(3.2)
i
-^'j
In fact, for better compression ratio, the dynamic range of
TV compressed data should be as small as possible. So in equations (3.1), we multiply a factor _L_ on •JN
both sides of DFT; that is to say, multiply _ L for equation (3.2), and we can get changed equation:
4i Wi 1 V2" Wi VJ
1
V2
a ]
'\ -
-Jl where w£ = exp[-
1
Lv2
(3.3)
VIj
- Yin j ~~N
•
According to upper discussion, FFT can be broken down into a combination of many butterfly transforms. And for every butterfly, we can rewrite its transform as follows: 1
W
J_
w
1
Obviously,
«
0
0 i1
i
IJ
(3.4)
1
• WJ 'j
o
J
•yj2|
1 01 and 0 1 1]
0^
are lifting matrices.
J?/
Moreover,
/2 0
1 J
1
° I 's
a
sca m
' 8
V2lj
matrix, and has its own lifting scheme according to equation (2.4). Therefore, we only need to prove that '
" 11 has its invertible integer transform also.
As we know, most FFT algorithms suppose the input signal is complex; therefore, it is possible to get lifting scheme for b = bx + ib
W^
, a = ax + ia
: the transform between complexes:
b = W^ x a , where
, can be rewritten as follows:
s1 .VJ'
cos# sin#
-sin#] I "*} cos# J ay\
where
9=~¥2xj
(3.5)
N
According to lifting equation (2.5), we can get the corresponding invertible integer transform which maps complex integer a to complex integer b . Accordingly, there is invertible integer transform for b = -W^ x a', so we can get the relevant invertible integer transform for 1 0 1. 0 -W^\ That is to say, we can construct invertible integer transform for a single butterfly transform (3.4), which leads to invertible integer transform for FFT. Invertible integer FFT coming from the upper steps has the following advantages: 1) which maps integer to integer, and has no errors because of lacking floating point computation; 2) which preserves
347 high precision, if there exists floating point numbers' computation; 3) which keeps the dynamic range. So such integer FFT has a better performance for complex data. In fact, generally when FFT operates on real signals, the output data are symmetric and conjugate; so we only need to record half of these complex data for restoring original signals. characteristic is crucial for data compression and integer DCT's realization.
Such
In later paper, we will
give integer FFT, which has conjugate symmetry for real signals, and correspondingly deduct invertible integer DCT. 4
EXPERIMENTAL RESULTS Using integer DCT coming from integer FFT, we compress several standard images losslessly.
Here, 8 x 8
integer DCT and arithmetic coding are exploited.
experimental results.
Table 4.1 shows the relative
Clearly, high performance can be achieved when using integer DCT first. Table 4.1 Integer DCT for lossless image compression
Standard image
Original Size (bytes)
Only Arithmetic Coding (bytes)
Integer DCT + Arithmetic Coding (bytes)
Lena
262144
237466
157958
Zelda
262144
229142
146500
Girl
262144
216736
153388
Barb
262144
239026
170803
Plane
262144
203268
155801
Flower
262144
220888
132934
Goldhill
262144
222413
172176
Integer DCT can not only be used for lossless image compression, but lossy image compression. Combining it with progressive coding methods together, we can realize grey image compression from progressively until losslessly.
Moreover, in paper [6], invertible integer transform from RGB space to
YCbCr color space can be constructed.
So, progressive until lossless color image compression can be
completed finally. Of cause integer DCT is useful for other data's lossless compression, such as audio signal. same time, there are quantities of fast algorithms for DCT.
At the
Therefore, it is worth doing research on
how to combine fast algorithms with invertible integer transform together, and it will reduce computational complexity accordingly. Reference 1. R.Calderbank, I.Daubechies, W.Sweldens, and B.-L.Yeo. Wavelet Transforms That Map Integers to Integers. Technical Report, Department of Mathematics, Princeton University,
1996.
http://cm.bell-labs.eom/who/wim/papers/papers.html#integer 2. Daubechies, and W.Sweldens. "Factoring Wavelet Transforms Into Lifting Steps". Technical Report, Bell-Laboratories, Lucent Technologies, 1996. 3. Xia Deshen and Fudesheng, Technology and Application for Modern Image Processing. Nanjing: Southeast University Press. (In Chinese) 4. William B.Pennebaker, Joan L.Mitchell. JPEG Still Image Data Compression Standard. NewYork: Van Nostrand Reinhold, 1993. 5. H.V.Sorensen etc. "Real-Value Fast Fourier Transform Algorithms". IEEE Transactions On Acoustics, Speech, and Signal Processing. VOL .ASSP-35, No 6, JUNE 1987. 6. Yan Yusong, Shi Qingyun. "Reversible Color Space Transform", Pattern Recognition and Artificial intelligence, Hefei, 1999, volume 12. No. 1 (in Chinese))
SOLUTION OF SCATTERING BY HOMOGENEOUS DIELECTRIC BODIES USING PARALLEL P-FFT ALGORITHM 'WEI-BIN EWE, u Y A O - J U N WANG,
1|2
LE-WEI LI, 3 ER-PING LI
'Department of Electrical and Computer Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore
119260.
2
High Performance Computation for Engineered Systems (HPCES) Programme Singapore-MIT Alliance (SMA), Singapore/USA 119260/02139 3
The Computational Electro-Magnetics & Electronics (CEE) Institute of High Performance Computing (IHPC), Singapore
Division 117528
In this paper, pre-corrected FFT (P-FFT) is employed to solve electromagnetic scattering by threedimensional homogeneous lossy dielectric object. Triangular patch has been used to model the surface of the dielectric object. Meuiod of moments is applied to solve integral equation formed by Poggio, Miller, Chang, Harrington, Wu and Tsai (PMCHWT) formulation. Pre-corrected FFT algorithm, which is a 0(N2l2logN) algorithm, is applied in order to speed up the matrix-vector multiplication in the iterative solver. It also eliminates the need of generating and storing of impedance matrix and thus reduces the memory requirement. When implementing the P-FFT algorithm, most of the computation time has been taken up (up to 75%) to perform forward and inverse FFT. The computation will be more efficient if more processors are available for parallel implementation of forward and inverse FFT. The MPI (message passing interface) has been used for parallelizing the P-FFT algorithm on the platform, IBM p690 PowerPC_POWER4: AIX 5L. Numerical results are presented to demonstrate the accuracy and capability of the proposed method on high performance computers.
1
Introduction
Integral equation methods have been used for the solution of electromagnetic scattering problems by three-dimensional homogeneous bodies. In this approach, the problem is formulated by choosing appropriate integral equation and then applies Method of Moments (MoM) to form a matrix equation. Solving the matrix equation using normal matrix inversion techniques such as Gaussian elimination and LU decomposition require 0(N 3 ) floating point operations and 0(N 2 ) memory storage. An iterative solver such as Conjugate Gradient (CG) require 0(N 2 ) floating point operations per iteration and 0(N ) memory storage. The computational complexity and memory storage requirement of traditional matrix inversion solver and iterative solver preclude their application to large problem. Pre-corrected FFT (P-FFT) algorithm has been proposed to overcome the drawback of traditional MOM solver. It was originally proposed to solve electrostatic problems by Phillips and White [1]. Recently Nie et al. [2] have extended the P-FFT to solve electromagnetic scattering problem by perfectly electric conducting (PEC) object. For surface scatters, P-FFT has 0(N3/2logN) computational complexity and 0(N3/2) memory requirements. The performance of the P-FFT algorithm is highly dependent on effectiveness of FFT calculation in the algorithm and an efficient implementation of PFFT algorithm has been proposed by Wang et al. [6]. In this paper, we first consider the formulation of scattering problem of 3-D dielectric bodies using Poggio-Miller-Chang-Harrington-Wu-Tsai (PMCHWT) [5] approach. We
348
349
use the RWG basis function [4] to expand the equivalent electric and magnetic currents J and M, which involved in the formulation. In our experiment, we choose RWG basis as testing function and convert the formulated EFIE and MFIE into matrix equation. This formulation is found to be free of interior resonance and yields accurate result. We solve the resulting matrix equation using Generalized Minimal Residual Algorithm (GMRES) [7] and Parallel P-FFT algorithm to reduce the computational complexity and memory requirement. 2
Formulation
In this section, we consider the electromagnetic scattering problem of a three-dimensional arbitrary shape and homogeneous dielectric body. The detail derivation can be found in [8], but for completeness of formulation, the summary of equations is given below.
nxEinc =-K-nx\^(VW [A 0 = K-nx\-^-(yV•
•Ai+k^A,)-VxFi\ J 5+ \ + klX)~
Vx
^2 \ 1
nxHmc=J-nx{VxA
(la)
+ yy
(lb) S~
^+/C'M
(lc)
0 = -J-nX VxA+^—&±M3.l {
JhVi
(ld)
\s-_
where the magnetic and electric vector potential A, and Fi for i=l,2 are given as
- e~ik'r
-
A = J*-A—
(2)
- - e'jk'r Ft=K* 4/rr
(3)
Using PMCHWT approach, we combined (la) - (lb) and (lc) - (Id), and obtained nxEinc
=-hx
^ ( V V - A, + £ 1 2 A 1 ) + ^ 2 - ( W - A 2 +
[JK
jk2
fc2A2)-VxF1 (4a
nxH
l
= - n x VxA, + V x A 2 + -
[
L
?
— +
JhVx
JKVi
-VxF2l
J
l
-^4
J
(4b) After obtained the integral equation, we approximate the electric and magnetic current using Rao-Wilton-Gilson (RWG) basis function [4]:
' = £ ' , • / • 00
(5a)
350
K = fjM,fn(r)
(5b)
i
and the integral equation is converted to a matrix equation: ZI=V
(6)
The resulting matrix equation will be solved using iterative solver and accelerated the matrix-vector multiplication using parallel P-FFT algorithm [6] with slight modification to suit our need. The pseudocode of the algorithm is given for completeness of formulation. IF (rank .eq. p0) THEN Project the panel charges to the grid charges CALL MPI_Scatter() [scatter data from p 0 to P0-P15 ENDIF Istart to compute convolution CALL 3-D FFT() ! pj computes ith FFT CALL 3-D FFT() ! Pi computes i'h FFT"1 DO i=0,15 IF (rank .eq. p0) THEN CALL MPI_RECV() ! p 0 receive result from p, Rest computation relating to convolution ELSE CALL MPI_SEND() ! pi sends result to p 0 ENDIF ENDDO lend of convolution IF (rank .eq. p0) THEN Interpolate the grid potentials back to the panels Correction(Compute nearby interactions) ENDIF 3
Numerical Results
In this section, we present numerical result to demonstrate the accuracy of the proposed method. The computation was carried out on IBM p690 PowerPC POWER4. The example we considered is a dielectric sphere having a radius of 1 m and a relative permittivity, 6^2.0. The bistatic RCS for VV and HH polarization are computed at 750 Mhz and are compared with the Mie Series solution in Fig.l. Good agreements are observed between the results. 4
Conclusion
In this paper, the P-FFT algorithm has been extended to solve electromagnetic scattering of homogeneous dielectric bodies. The problem was formulated using PMCHWT approach and discretized by MOM. The resultant matrix system was then solved by
351 iterative solver and accelerated using P-FFT algorithm. Numerical example was presented to illustrate the accuracy of proposed method.
40
^
30
Mie Series P-FFT
r^
i
r^
20
CO
o cr .a
0
"oS
-20 -30 0
20
40
60 00 100 120 Angle 6,180-9 (Degree)
140
160
180
Fig. 1: Bistatic RCS of a dielectric sphere (a=lm, 8j=2) at 750 Mhz 5
Reference 1. J. R. Phillips and J. K. White, "A precorrected-FFT method for electrostatic analysis of complicated 3-D structures", IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, Vol.16, No.10, pp. 1059-1072, Oct., 1997. 2. Xiaochun Nie, Le-Wei Li, Ning Yuan and Yeo Tat Soon, "Precorrected-FFT Algorithm for Solving Combined Field Integral Equations in Electromagnetic Scattering", Journal of Electromagnetic Waves and Applications, Vol.16, No.8, pp.1171-1187, 2002. 3. K. R. Umashankar, A. Taflove, and S. M. Rao, "Electromagnetic scattering by arbitrary shaped three-dimensional homogeneous lossy dielectric objects," IEEE Trans. Antennas Propagat., vol. 39, pp. 627-631, May 1991. 4. S. M. Rao, D. R. Wilton, and A. W. Glisson, "Electromagnetic scattering by surfaces of arbitrary shape," IEEE Trans. Antennas Propagat., vol. 30, pp. 409-418, May 1982. 5. J. R. Mautz, and R. F. Harrington, "Electromagnetic scattering from a homogeneous material body of revolution," Arch. Elek. Ubertragung, vol. 33, no. 4, pp. 71-80, Apr. 1979.
352
6. Y. J. Wang, L. W. Li. and E. P. Lee, "Parallelization of Pre-corrected FFT in Scattering Field Computation," submitted to Int. Conf. on Sci. and Eng. Computation 2002. 7. Y. Saad and M. Schultz, "GMRES, A Generalized Minimal Residual Algorithm For Solving Nonsymmetric Linear Systems," SIAM J. Sci. Stat. Comput., vol. 7, No. 3, July 1986. 8. A. F. Peterson, S. L. Ray and R. Mittra, Computational Methods For Electromagnetics, IEEE Press, New York: IEEE Press, 1998.
THE COMMON COMPONENT ARCHITECTURE (CCA) APPLIED TO SEQUENTIAL AND PARALLEL COMPUTATIONAL ELECTROMAGNETIC APPLICATIONS DANIEL S. KATZ, E. ROBERT T1SDALE, CHARLES D. NORTON Jet Propulsion Laboratory, California Institute of Technology, 4800 Oak Grove Drive, Pasadena, CA, 91109, USA E-mail: {Daniel.S.Katz, E.Robert.Tisdale, Charles.D.Norton}®jpl.nasa.gov The development of large-scale multi-disciplinary scientific applications for high-performance computers today involves managing the interaction between portions of the application developed by different groups. The CCA (Common Component Architecture) Forum is developing a component architecture specification to address high-performance scientific computing, emphasizing scalable (possibly-distributed) parallel computations. This paper presents an examination of the CCA software in sequential and parallel electromagnetics applications using unstructured adaptive mesh refinement (AMR). The CCA learning curve and the process for modifying Fortran 90 code (a driver routine and an AMR library) into two components are described. The performance of the original applications and the componentized versions are measured and shown to be comparable.
1
Introduction
The work described in this paper was undertaken to answer the following questions regarding the Common Component Architecture (CCA): How usable is the CCA software? What work is involved for a scientist to take previously written software and turn it into components, particularly for parallel components? Once the components exist and are linked together, how does performance of the componentized version of the application compare with that of the original application, again, particularly for parallel components? The paper does not deal with the question of why one might choose to use components. It assumes that the reader has an interest in using components, and wants to understand the implications of choosing to use the CCA software for this purpose. The remainder of this paper will describe the initial software, describe the componentization process, and provide and analyze the timing measurements, and finally summarize the answers to the questions. 2
The Common Component Architecture (CCA)
The CCA Forum [1] was founded in January 1998, as a group of researchers from the U.S. National DoE labs and academic institutions committed to defining a standard Component Architecture for High Performance Computing. The CCA Forum noticed that the idea of using component frameworks to deal with the complexity of developing interdisciplinary HPC applications was becoming increasingly popular. Such
353
354
systems enable programmers to accelerate project development through introducing higher-level abstractions and allowing code reusability, as well as provide clearly specified component interfaces which facilitate the task of team interaction. These potential benefits encouraged research groups within a number of laboratories and universities to develop, and experiment with prototype systems. However, these prototypes do not interoperate. The need for component programming has been recognized by the business world and resulted in the development of systems such as CORBA, DCOM, Active X and others. However, these systems were designed primarily for sequential applications and do not address the needs of HPC. The objective of the CCA Forum is to create a standard that both a framework and components must implement. The intent is to define a minimum set of conditions needed to allow high performance components built by different teams at different institutions to be used together, and to allow these components to interoperate with one of a set of frameworks, where the frameworks may be built by teams different from those building the components. The CCA forum members are developing implementations of the standard as well, both components and frameworks. 3 The Non-Componentized Software The original JPL software consisted of two units. The first was the 2dimensional, parallel version of the Pyramid unstructured Adaptive Mesh Refinement (AMR) library [4], developed at JPL over the last few years. Pyramid uses the MPI library for its interprocessor communication. The second was a driver routine for this library [2]. The driver is also parallel, but it does not have any communications routines, since they are all handled within Pyramid. All of the original software was written in Fortran 90, though Pyramid requires an additional library called ParMetis, that determines a repartitioning for the parallel version of the Pyramid library. ParMetis was only used as a binary library, and was not modified in any way in this work. The function of the software is to read in a mesh resulting from an electromagnetic problem, and to (possibly repeatedly) refine a region of this mesh. 4 Componentization of the Software The initial work on this task [3] included development of simple single component and two component example applications. After these were developed, the only problem that had to be overcome to componentize the sequential software was building a C++ wrapper for the Fortran Pyramid library, and translating the driver code into C++, as the CCA framework (Ccaffeine) required components to be written in C++. The CCA model for parallel applications is a Single Component, Multiple Data (SCMD) model. In this model, one process of each component exists on all processors. In a given processor, one components
355
communicates with another component through the framework. Intercomponent communication takes place as expected, using a library such as MPI. A component in one processor cannot communicate directly with a different component in a different processor. As mentioned above, Pyramid uses the MPI library to communicate, and the driver component does not do any communication. Thus the only real differences between the sequential and parallel versions of the application are in launching the framework in parallel and ensuring that the components are also started in parallel. For the current framework, it can be launched on multiple processors by simply starting it with mpirun —np $number $path_to_ccaffeine. Since the driver code and the Pyramid library were both written in such a way that they can run on one or more processors, no changes needed to be made to the driver or Pyramid components. 5 Timing Results For each run of the application, two times were measured, the maximum time from the before the first call to the library to after the last call to the library over the set of processors in a given run, and the wall clock time. These two times were not significantly different for any run. Figure 1 shows the results from the parallel experiments. (Sequential results are not shown in the interest of space.) Each result is the average of 5 to 10 runs.
D Without CCA • With CCA
E c
4
8
16
32
Number of Processors Figure l.Timing results for the parallel component vs. driver/library application.
These results show an insignificant difference between the speed of the component application and the driver/library application on 2 to 32 processors. In some cases, the component application is slightly faster, in others, the driver/library application is slightly faster. The key point is that
356
the scalability is unchanged between the versions; the CCA framework has no effect on how the parallel application scales. 6
Conclusions
The lessons learned in this work are: There was initially a fair amount of learning associated with use of the CCA Forum's technology, including the CCAFEINE framework. It took 2-3 months to componentize the first application, though the second was componentized fairly quickly. Once the sequential application was componentized, proceeding to the parallel application was simple. The lack of a means to write Fortran90 components is a serious shortcoming for many science applications. It is possible to get around this shortcoming, but this introduces additional work for the componentizer and adds the chance for additional errors to come into the application. Once an application is componentized, if the amount of work done in each component call is large when compared with the time needed to make a function call, it is likely that the componentized version of the application will perform well. The authors' knowledge of ongoing work within the CCA Forum leads them to believe that the first issue has been mostly resolved, and the second issue will be resolved in time, most likely in less than 9 months. Once this is done, the CCA model will be a promising method for building large singleprocessor and parallel applications. In the next year, an effort will be undertaken to continue to resolve the first two issues above (flattening of the CCA learning curve and ensuring the Fortran90 components can be used in CCA.) Additionally, plans exist to turn a climate application into a CCA application. References 1. Armstrong R., Gannon D., Geist A., Keahey K., Kohn S., Mclnnes L. C , Parker S., Smolinski B., Toward a Common Component Architecture for High-Performance Scientific Computing, Proceedings of High Performance Distributed Computing, (1999) pp. 115-124. 2. Cwik T., Coccioli R., Wilkins G., Lou J. and Norton C , Multi-Scale Meshes for Finite Element and Finite Volume Methods:Active Device and Guided-Wave Modeling, Proc. of AP2000 Millennium Mtg. (2000). 3. Katz D. S., Tisdale E. R., and Norton C. D., A Study of the Common Component Architecture (CCA) Forum Software, Proceedings of High Performance Embedded Computing (HPEC-2002), (2002). 4. Norton C D . , Lou J. Z., and Cwik T., Status and Directions for the PYRAMID Parallel Unstructured AMR Library, 8th Intl. Workshop on Solving Irregularly Structured Problems in Parallel (I5th IPDPS), (2001).
A FAST ALGORITHM FOR THREE-DIMENSIONAL ELECTROSTATIC ANALYSIS: FAST FOURIER TRANSFORM ON MULTIPOLE (FFTM) E. T. ONG AND H. P. LEE Institute of High Performance Computing, 1 Science Park Road, Singapore E-mail: onget@ ihpc. as tar. edu. sg
117528
K. H. LEE AND K. M. LIM National University of Singapore, Department of Mechanical 10 Kent Ridge Cresent, Singapore 119260
Engineering,
In this paper, we proposed an alternate fast algorithm for solving large problems using Boundary Element Method (BEM). It utilized two important features, namely the multipole expansion, and potential evaluation by discrete convolutions, via Fast Fourier Transform (FFT). We refer to it as the Fast Fourier Transform on Multipole (FFTM) method. It is demonstrated that FFTM is an accurate method, and is likely to be more accurate than Fast Multipole Method (FMM), for the same order of expansion p, at least up to p=2. It is also shown that the method has only linear growth in the computational complexity, which implies that FFTM can be as efficient as FMM.
1
Introduction
Consider an electrostatic problem with electrical conductors embedded in a homogenous dielectric, the charges a(x') induced on the conductors satisfy the integral equation
^W^'h^pbfM- " r
(1)
where ^(JC) is the applied potential, x and x' correspond to the field and source point, respectively, eis the dielectric constant, and ||JC| is the Euclidean length of x. BEM is often used to solve equation (1). However, it generates dense linear system, which requires o(n3) and o(n2) operations if solved by direct methods (Gaussian Elimination) and iterative methods (GMRES), respectively. Recent development utilizes the matrix-free feature of the iterative methods, which requires computing matrix-vector products that can be seen as potential evaluation process. This important observation has led to the developments of numerous fast algorithms that is only 0(n). One such algorithm is the Fast Multipole Method (FMM), which has been developed by Greengard and Rohklin [1], and later used by Nabors and White [2] in electrostatic problems. The efficiency of FMM arises from the effective usage of multipole and local expansions in a hierarchical manner through a series of transformation operations. Further improvements were made by Greengard, Rokhlin and Cheng [3,4], by using new compression techniques and diagonal forms for the transformation operators. Other techniques developed for solving equation (1) rapidly include, the precorrected FFT approach [5], the singular value decomposition [6] and wavelet-transform methods [7]. In this paper, we describe an alternate approach, which we referred to as Fast Fourier Transforms on Multipole (FFTM) method. It utilizes two important features, namely:
357
358
i) using multipole expansion to approximate far potential fields, ii) evaluating the approximate potential fields by discrete convolution, via FFT. In the following section, the FFTM algorithm is described. Section 3 presents some numerical examples to demonstrate the accuracy and efficiency of the method. Finally, a conclusion is given in Section 4. 2
Fast Fourier Transform on Multipole (FFTM)
This method arises from realization that the multipole expansion can be seen as discrete convolutions, which can be evaluated rapidly using FFT algorithms. The method comprises of four main steps: A.
Problem domain is subdivided into many cells, in which the panels' charges are to be represented by multipole moments. The number of cells should satisfy that required by FFT solvers. But, with the help of FFTW (Fastest Fourier Transform in the West) [8], we can perform FFT of any arbitrary size (preferably factors of small primes).
B. Transforming panels' charges q{x') into multipole moments M™ using
M: = i^Y-^e-^y'dr
(2)
The potential due to M™ is given by the multipole expansion,
«*)«£lAC^ n-0 m=-n
(3)
^
where p is the order of expansion, and Y™(O,0) is the spherical harmonics functions, respectively, which are defined as
C(M= J)KHHLH/ £/f'(cos^
(4)
with P"(cos0) being the associated Legendre function of the first kind with degree n and order m, where n is a non-negative integer, and -n<m
ym
r,
(u,*)-ZI
/ *•
M;{rj,k!)-^{f-tj-f,k-k!\ R
"I
(5) IJ
where indices (ijjc) and (i j ,k) denote the discrete locations of the field and source points, respectively. By discrete convolution theorem [9], equation (5) can be evaluated efficiently using FFT algorithms. D. The cells' potentials are then interpolated onto the panels' collocation points, using a simple quadratic interpolation scheme. This accounts for the 'distant' charges effects only. But prior that, potential correction is performed to remove the 'near' charges contributions, which are inaccurately represented by multipole expansions. Finally, the 'near' charges contributions are added directly onto the collocation points.
359
3
Numerical Examples
Different FFTM schemes are defined according to the multipole expansion order p, and the direct interaction list D;„, (layers of cells in which panels' charges are to be computed for directly). The results are compared with the GMRES explicit approach (where the full coefficient matrix is formed explicitly). 3.1 Accuracy analysis of FFTM
The 4x4bus-crossing example [2] is used here to compare the accuracy of FMM and FFTM. The results of capacitance matrix are tabulated in table 1. In general, FFTM is more accurate than FMM. This is largely due to the ways the 'distant' potential contributions are computed in the two methods. In FMM, multipole and local coefficients are used repeatedly in a hierarchical process, in which approximation errors can accumulate. On the other hand, FFTM replaces this hierarchical process by discrete convolutions that are evaluated by FFT algorithm, which is "exact" up to round-off errors. Table 1. Capacitance extraction of 4x4bus-crossing example by FMM from [2], and FFTM methods. Solution Method GMRES explicit FMM (p = 0) FMM(p = l) FMM (p = 2) FFTM (p = 0) FFTM (p = 1) FFTM (p = 2) * Note that D,isl = 2.
Cn 402.9
Cn -136.2
394.5 406.6 405.2 404.2 403.4 403.2
-124.0 -139.7 -137.8 -133.1 -136.7 -136.3
Capacitance Matrix Entry (pF) C14 C13 C/6 Cn -12.00 -7.886 -48.18 -39.90 -0.175 -2.471 -52.15 -43.39 -12.36 -6.676 -48.48 -40.45 -11.91 -8.079 -48.36 -40.09 -13.53 -6.108 -49.14 -41.53 -12.57 -8.014 -48.15 -39.63 -11.49 -7.966 -48.36 -40.05
c„ -39.90 -43.08 -40.27 -40.01 -41.27 -39.62 -40.05
Cis -48.18 -52.92 -48.46 -48.45 -49.85 -48.05 -48.34
3.2 Efficiency analysis of FFTM The self-capacitance extraction of a unit cube is used in this analysis. The cube is meshed with uniform elements, where the larger problems are generated by using finer mesh. The efficiency plots for the CPU times and memory storage requirements are given in figure 1. •O/FESeplicit/ Dus,= 1,^=1 •Dus,=X'P = 2
- • - GMRES explicit,"
•pist = 2, p = 2
-*-D;iM = 2, p = 2'
~a~Dlai= l,p= 1 - _ * Diist = \,p = X -*-Du,t = 2, p - 1
.gl.OE+02
Q.
1.C6W Rcttemsiasin
1.0E+04
Problem size, n
(a) (b) Figure 1. Plots of (a) CPU times and (b) memory storage versus problem sizes for cube example.
360
All the problems are solved more rapidly using FFTM. But more importantly, it is observed that the FFTM schemes have only linear growth in their computational complexity. This means that FFTM can be as efficient as FMM. 4
Conclusion and Future work
In this paper, we proposed and implemented an alternate fast algorithm, the Fast Fourier Transform on Multipole (FFTM) method. It is demonstrated that FFTM is an accurate method, which is likely to be more accurate than FMM for the same order of multipole expansion (at least up to p = 2). It is also shown that FFTM has only linear growth in computational complexities. This means that it can be as efficient as FMM. In fact, for a given order of accuracy, we believe that FFTM is likely to be more efficient, since FMM would need a higher order expansion in order to achieve the desire accuracy. However, the current FFTM algorithm cannot attain very high order of accuracy, because of the way the potential interpolation is done. One obvious solution is to use the local expansions in conjunction with multipole expansions. However, in this case, the 4
number of discrete convolutions scales like 0(p ), which may render the algorithm inefficient. Hence, our future work aims to implement an accurate and efficient local expansion version of FFTM. References 1.
Greengard L. and Rokhlin V., A fast algorithm for particle simulations. Journal of Computational Physic 73 (1987) pp. 325-348. 2. Nabors K. and White J., Fastcap: A multipole accelerated 3-D capacitance extraction program. IEEE Transaction on Computer-Aided Design Integrated Circuits and Systems 11 (1991) pp. 1447-1459. 3. Greengard L. and Rokhlin V., A new version of the fast multipole method for the Laplace equation in three dimensions. Acta Numerica 6 (1997) pp. 229. 4. Cheng H., Greengard L. and Rokhlin V., A fast adaptive multipole algorithm in three dimensions. Journal of Computational Physics 155 (1999) pp. 468-498. 5. Phillips J. R. and White J., A precorrected-FFT method for electrostatic analysis of complicated 3-D structures. IEEE Transaction Computer-Aided Design Integrated Circuits and Systems 16 (1997) pp. 1059-1072. 6. Kapur S. and Zhao J., A fast method of moments solver for efficient parameter extraction of MCM's. Proceedings of Design Automation Conference, CA, June (1997) pp. 141-146. 7. Levin P. L., Spasojevic M. and Schenider R., Creation of sparse boundary element matrices for 2-D and axi-symmetric electrostatics problems using the bi-orthogonal Haar wavelet. IEEE Transaction on Dielectric and Electric Insulation 4 (1998) pp. 469-484. 8. Matteo Frigo and Steven G. Johnson. FFTW, C subroutines library for computing Discrete Fourier Transform (DFT). Freeware available from http://www.fftw.org. 9. Brigham E. O., The Fast Fourier Transform and its Applications. Prentice-Hall, Englewood Cliffs, 1988.
A N A L T E R N A T I V E I M P L E M E N T A T I O N OF I N T E R P O L A T I O N IN MULTILEVEL FAST MULTIPOLE M E T H O D (MLFMM) CHAN-PING LIM, YAO-JIANG ZHANG, FANG WU AND ER-PING LI Institute High Performance Computing, 1 Science Park Road, #01-01, The Capcricorn Singapore Science Park II, Singapore 117528 E-mail: [email protected] A fast algorithm is presented to solve a two-dimensional conducting cylinder using an Electric Field Integral Equation (EFIE) formulation. The fast algorithm presented in this paper employed the multilevel fast multipole method with the use of the Lagrange interpolation. The iterative solver used is the conjugate gradient method. This algorithm has a complexity of O(NlogN). The code is verified with method of moment (MoM).
1
Introduction
The method of moments * formulation will result in a dense matrix which requires 0(N2) to solve. With the implementation of fast multipole method (FMM), the operation can be reduced to 0(N1^). And, when FMM is implemented in multilevel (which is multilevel FMM), the operation can be further reduced to 0(N\ogN) for sparse scatters, and O(N) for densely packed scatters. In our implementation, we first discretise into elements by method of moments. Each element is at least 0.1 A. Then, in every level, the elements are grouped to its respective group. We organise the non-empty groups into a quad-tree structure with the smallest group at the bottom of the inverted tree, and the largest group at the root of the tree. Empty group is not stored in the tree. When one element is located far from another element, the field due from these elements is computed by the factorised Green's function in a multilevel multi-stage fashion. However, when the elements are close to each other, the field is obtained using the traditional MoM. In order to reduce the computation time, for our aggregation and dis-aggregation implementation, we make use of interpolation between levels and we employ Lagrange Interpolation. 2
2 D M e t h o d of M o m e n t s
A simple 2D method of moment (MoM) that discretize the 2D integral equation of scattering is shown as follow1 : N
'^2Ajiai = bj,
(1)
where j = 1, 2, • • •, N and MW A , ill + U TiM
tS ;Mi nn f ^^ l l , r(2)
4
AiH^>{kPji),
)]•
i
(2)
i±j
For the self-term (i = j), the Hankel function is represented by its small argument approximation, and for the non-self term, it is approximated by a one-point integration for simplicity. A,; is the element size and (j/Ae = 0.163805).
361
362 3
M u l t i l e v e l Fast M u l t i p o l e M e t h o d in 2 D
Multilevel implementation of fast multipole algorithm is to factorise the Green's function into multifactors. The Hankel function can be written as follows with the aid of the addition theorem repeatedly 2 ' 3 . H
(krji)
o
= PjJi • PjiJi • aJih
• Phh • Phi
(3)
where (4) Ji-m(kpjlJa)e-jV-m»w
[A/^km = [Pj2i2]m,n = \PhhUp
(5)
H^_n(kpj2h)e-^-^^
=
[Phi}P,i =
(6)
Jn-P(kphh)e-j(n-rt^
(7)
Jpikpi^e-^'^
(8)
The above equation can be diagonalized as shown in the two-level case: rl-K
H{0krji)= 4
/ Jo
da%j1(a')0j1j,(a')&jlIa(a')0IaJl(a')pIli{a')
(9)
Interpolation
The integral in Eq. (9) can be written in discrete form as shown: (10) 9=1
Anterpolation between Pjj1(aq) and Pj1j2{oiq) and interpolation between Pi2ix{otq) and Pi1i(aq) are needed to achieve an O(NlogN) algorithm. Eq. (11) shows how the interpolation matrix is obtained. jji-\-l,n -'ll rn+l,n i 21
rn+l,n M2 jn-\-l,n •'22
Tn+l,n . 'lQi 7-n4-l,n l 2Qi
7-n+l,n Q2l
7-n-t-l,n L Qi2
jn-\-l,n
(11)
/3^V2)
1
^ 7 X ( « Q I ) .
Qi > Qi and we can choose Q2 = 2 x Q j . We can see the Pjj1 in the
nth-\eve\
{Pjjx) is the same in Pjjx in the nth + 1-level (P™^1) except for the number of sampling points. Therefore, we can obtain P™^1 by interpolating from the data in
363
(3™jx We can use Lagrangian interpolation to get the values of the additional points. Lagrangian interpolation in the matrix form is as follows: Ml ( 7 l )
0(71) 0(72)
W2(7l) ^2(72)
^1(72)
w
W
l(7Q2)
WQ x (7l) Qi(72)
w
/?(«i)
/3(« 2 )
(12)
2(7Q2)
where [/3] is a Q2 x 1 vector, [/3] is a <2i x 1 vector and [w] is a Q2 x Qi matrix. where i = 1, 2, • • •, Q\ and £ = 1, 2, • • •, Q2
i(7i) =
(13)
Not all the Wi(7;) have to be filled and therefore, the matrix in Eq. (12) can be made sparse. T h e aggregation matrix in our multilevel F M M formulation has a block structure as shown in Eq. (14). Vl(Qi), V2(Qi), • • •, VG(Qi) are sub-matrices and each of these sub-matrices has to go through interpolation separately.
V\Qi) 0
0 y 2 (Q,-) (14)
0 Now, let us take V1(Qj)
•••
VG(Qj)
0
sub-matrix as an example.
v : (Q 3 ) =
VAhi)
V112(7i)
^1(72)
^2(72)
VQ S I(7Q S )
V-4 3 2 ( 7 Q 3 )
(15) •••
V4 3n ( 7 Q 3 ) V?„(«i)
^2(«i)
V
1
^ )
V2\(a2)
V212(a2)
VQ2I((*Q2)
V422(QQ2)
^,("2)
= •••
(16)
V^ 2 n K> 2 )J
where n is the number of elements in the group and Q3 < Q2wi(7i)
V2U72) t&i(7Qa)J
^1(72)
w
i(7Q2)
^2(71) ^2(72)
WQ3(7I)
W2(7Q2)
WQ3(7Q2)
W
where [w] is a Q2 x Q3 matrix and i = 1,2, •,ra.
Q3(72)
V?i(«l) V2\(a2) (17)
364 5
N u m e r i c a l results
Based on the MLFMM formulation described in this report, we developed the 2D MLFMM code using Fortran 90 which consists of over 2000 lines. We also try to use the minimal built-in library in order to enhance the portability of the code. In this section, we would like to verify our MLFMM results with the MoM results. We have chosen a circular cylinder with radius of 20A for our verification. All the verifications are performed by comparing the numerical results obtained by MLFMM with MoM.
Circular Cylinder with Radius of 201
Circular Cylinder With Radius of 20 X _i
1
,
i
i
i
i
I
i
I
i
L
40-
MLFMM
35•
MoM
302520-
— -
I
-4i^
1510-
^H|
(»K
150
200
1
50
100
Degree
250
300
0
— I — ' — I — 200 400
600
800
1000
1200
1400
Current Segments
Figure 1. Current distribution and RCS calculation of a circular cylinder
From the numerical result shown in Fig. 1, we can see the results obtained using our algorithm show excellent agreement with the exact method, i.e. MoM. 6
Conclusion
The multilevel fast multipole method with the emphasis on the interpolation has been presented. The code has been verified with MoM code and excellent agreement is obtained. Future work include parallelizing of this code will be investigated. References 1. R. F. Harrington, Field computation by moment methods, (Malabar, FL:Krieger, 1982). 2. C. C. Lu and W. C. Chew,Micro. Opt. Tech. Lett. Vol.7 N o . 1 0 , 466 (1994). 3. J. M. Song and W. C. Chew,Micro. Opt. Tech. Lett. Vol.10 N o . l , 14 (1995).
FAST MATRIX ALGORITHMS FOR HIERARCHICALLY SEMI-SEPARABLE REPRESENTATIONS
S. CHANDRASEKARAN AND T. PALS Department of Electrical & Computer Engineering University of California Santa Barbara CA, USA 93106-9560 M. GU Department of Mathematics University of California Berkeley CA, USA The hierarchically semi-separable representation for arbitrary matrices is presented, and fast backward stable direct solvers for such representations are constructed. This technique generalizes earlier work by Rokhlin and his co-workers on certain kinds of fast integral equation solvers [1,3,5]. This work can also be viewed as a generalization of the work of Dewilde and his co-workers in time-varying systems theory [4]. The method has the further advantage of reproducing optimal complexities independently of the number of underlying spatial dimensions. In particular the method will reproduce the optimal complexity for solving many sparse matrices by direct factorization.
1
HSS representation
In this p a p e r we consider a new class of r e p r e s e n t a t i o n s of m a t r i c e s t h a t we t e r m h i e r a r c h i c a l l y s e m i - s e p a r a b l e (HSS) r e p r e s e n t a t i o n s . T h e HSS r e p r e s e n t a t i o n of a s q u a r e m a t r i x A is given by six sequences of m a t r i c e s Di, Uk-,i, Vk-i, Rk;i, W*;» a n d Bk;i,j • T h e r a n g e of t h e t h e indices k, i a n d j will b e c o m e clear shortly. F i r s t t h e r e is a n integer K such t h a t 0 < k < K. We will call K t h e d e p t h of t h e H S S r e p r e s e n t a t i o n . T h e r e a r e 2K n u m b e r s mx;t such t h a t D{ is a n TUK;j x rn,K;i m a t r i x , a n d 5^j rriK;i is t h e n u m b e r of rows (columns) of A. We now define t h e a d d i t i o n a l n u m b e r s m^i = m,k+v,2i-i + w}/fc+i;2j where 0 < k < K a n d 1 < i < 2k. We now define a d d i t i o n a l m a t r i c e s Uk;i b y t h e following recursion UK,i
= UU
K
1 < i < 2
Uk-,i _= f U " k+ « i-;2i-iR ! - ^ k-+v,2i-i *"«+^-i } , 1
n0 ^< i k. ^<* K, -
i1 ^< „i• ^< ok 2
k+V,2iRk+l;2i• -• Uk+iM-K:
Similarly we define t h e a d d i t i o n a l m a t r i c e s VK;i
= UU
l
^;i=f^2'-1^+1^-1>), V
Vk+l;2iWk+l;2i
0
l
j
T h e m a t r i c e s Uk-,i a n d Vk;i have m^i rows. T h e n u m b e r of columns t h e y have a r e d e t e r m i n e d by t h e n u m b e r of rows of Rk-i a n d Wk;t respectively. P a r t i t i o n t h e m a t r i x A recursively as follows A);l,l =
A,
365
366 "lfc+l;2i-l
Ak.it
= ' '
m
1
2
*+ ^ '-
"ljfc+l;2t
1
M*+ii«-i.2*-i
"l*+l;2i
^+i;2i-i,2i A
V •^•*+l;2«,2t-l
^fc+l;2i,2t
o < fc < iT,
Note that only the diagonal blocks are recursively partitioned. representation is chosen such that AK-,i,i =Di: 1 < t < 2K, Akiij = UktiBkiiiJVi^, 0
1 < i < 2k.
/
Then the HSS
l
j = i±l,
It can be shown that given a sequence of 2K numbers rriK;i such that £ ^ niK„i = n, every n x n matrix has a corresponding HSS representation, and that this representation can be constructed in 0(n2) flops in the general case. The HSS representation is interesting only when it requires far fewer parameters than the usual representation of a matrix. For this to be the case the rank of Bk]itj must be much smaller than mk]i and mk)j in a certain well-specified sense. 2
Fast multiplication
First we consider how to multiply a matrix in HSS form with a regular vector (matrix). The technique is identical to the usual fast multipole methods. Hence we just summarize the recursions here. Suppose A is in HSS form and we wish to compute z = Ab. We first row-partition b according to the sequence mK-,i and call the i-th partition &JC;*. Then the following recursions Gk,i = Wk+1.2i-iGk+l;2i-l -Ffc,2i-1 = Bk;2i-l,2iGk;2i
+ Wk+1.2iGk+V,2i,
+ •R*;2i-1-Pfc-1,»)
Fk,2i = Bk]2i,2i-lGk-2i-l
+
Rk;2iFk-i-i,
with GKJ = ^K^K\i and Foji = 0, can be computed rapidly. We then observe that ZK;i = Dibx-,% + UK;iFji-,i, where ZK-,% is obtained by row-partitioning z according to the sequence m,Kti3
Fast stable solver
We now consider how to compute x rapidly, where Ax = b, and A is available in HSS form. The algorithm can presented in a recursive fashion. The recursion takes one of three forms. Case 1. Compressible off-diagonal blocks. We begin by observing that block row i, excluding the diagonal block Dj, has its column space spanned by the columns of UKJ- Hence if the number of columns of UK,i, denoted by riK;i is strictly smaller than mi, the number of rows in that block, we can find a unitary transformation qn-,i such that fr
-nH
TT
_™,i-
nK;i (
0
\
367 We now multiply block row i by (ff. The change in the off-diagonal blocks is represented by the above equation since all of them have UK-,1 as the leading term. The i-th block of the right-hand side changes to become H ,
_ mi-nK.ti
(PK-,i\
We also observe that Di, the diagonal block has become q^i^iunitary transformation WK;i such that "is - nK-,i f)
H
- (n
n \,„
H
-
m
n
i~
K;i (
%i-MK;i-njr.
y
Now we pick a
nK;i
-Di;l,l
0
\
Di2i
A;2J.
We then multiply the block column i from the right by w^.t. The change in the diagonal block is represented by the above equation. The off-diagonal blocks in block column i have V^.{ as the common last term. Hence we just need to multiply VK-,1 to obtain i7
_ „„
-
T/
m
i
~ nK;i
( ^ i « ^\
nK;i
\VK;iJ
Since we multiplied block column i from the right by tiif.j, we need to replace the unknowns x^-i
by WK-^XK-X„„
„
_
m
i -
n
K ; i
( ZK;i\
/, v
At this stage the first mi — nn-i equations in block row i read as follows -Di;i,iZjc;i = PK-,1, which can be solved for ZK,I to obtain ZK-,% = D^iipK;iWe now need to multiply the first m* — n,K;i columns in the block column i by ZK-,% and subtract it from the right-hand side. To do this efficiently we observe that the system of equations has been transformed as follows (diag(g^.j) J 4diag(w^. i )) (diag(w.R;;j) x) = diag(q>^.i) 6. If we define the vector \
0
)
we then observe that the stated subtraction can be re-written as follows b = diag(g^. i )6 — (diag(g^. j ) AdiAg(w^.i)) z. We can do this operation rapidly by observing that (diag(g^. i ) Adiag(w;^. i )), has the HSS representation {.Dj}?=1, {UK;i}i=1, {VK;i}i=l, {{-Rfc;i}i=l}*=0> {{Wk;i}i=1} k=0, {{Bk;2i-l,2i} i=1 } k=0, {{Bkt2i,2i-i}j=i }{*=o> a n d using the fast multiplication algorithm in section 2. Of course, the algorithm can (and should) be modified to take advantage of the zeros in DKti, UK-i, and zK.t. Once the subtraction has been done, we discard the first nn — nn;i columns of block column i and the first mj — « K ; J rows of block row i. We observe that this leads to a new system of equations of the form Ax = b, where £ K1
'
_ m< - nK-,i (* \ nK,i V ^K;») '
368 and
A has the HSS representation
{23i;2,2}f=1,
{UK-,i}2i=\,
{VK;t}j=i>
{{RkAti}^ {{WkAt^=0, {{BtiJi-i,2i}£'}£o. {{s*i«,M-i}?:r}^o-
Therefore we are left with a system of equations identical to the one we started with and we can proceed to solve it recursively. Once we have done that we can recover the unknowns x from z and x using the formulas
Case 2. Incompressible off-diagonal blocks. It occurs when all block rows for the system cannot be compressed any further by invertible transformations from the left. In this case we proceed to merge adjacent block rows and columns. We do so as follows f) _ (
D2i-1;2,2
UK;2i-lBK;2i-\,2i^K;2i\
\UK;2iBK;2iai-^K;2i-\ fj
_ f
T>
_ (
\
VK-l;i - y
£>2i;2,2
/ '
UK;2i-lRK;2i-l\ UK;2iRfC;2i
) '
VK;2i-lWK;2i-l\ VK;2iWK;2i
J '
We then see that A has an HSS representation given by the sequences: {Z),}? =1 ,
{u)t;\
{v)t;\
{{^,,}ti}f=- 0 1 .
{{WkAt.)k=0\
{{Bk;2i-i,2i}Ci}h\
{{Bk-,2i,2i-i}i=i }%=<}• Let us denote by A the matrix with this HSS representation (of course, A = A). We then observe that the system of equation is now in the form Ax = b, which is exactly in the form we started with, except that the new HSS representation has depth K — 1. Hence we can solve this system of equations recursively for x. Case 3. N o off-diagonal blocks. Observe that if K — 0 the equations read D\x = b, which can be solve by traditional means for x. This case terminates the recursion. References 1. J. Carrier, L. Greengard, and V. Rokhlin, "A fast adaptive multipole algorithm for particle simulations", SIAM J. Sci. Stat. Comput., 9, pp. 669-686, 1988. 2. S. Chandrasekaran and M. Gu, "Fast and stable algorithms for banded plus semi-separable matrices", submitted to SIAM J. Matrix Anal. Appl., 2000. 3. Y. Chen, "Fast direct solver for the Lippmann-Schwinger equation", submitted to Advances in Computational Mathematics, http://www.math.nyu.edu/faculty/yuchen/onr/intro.htm, 2001. 4. P. Dewilde and A. van der Veen, "Time-varying systems and computations", Kluwer Academic Publishers, 1998. 5. P. Starr and V. Rokhlin, "On the numerical solution of 2-point boundary value problem.2", Communications on Pure and Applied Mathematics, Aug. 1994, vol. 47, no. 8, pages 1117-1159.
PARALLEL FAST MULTIPOLE METHOD FOR LARGE-SCALE COMPUTATION OF ELECTROMAGNETIC SCATTERING
ZHANG YAOJIANG, WU FANG, EDWIN LIM CHAN PING, LI ERPING Division of Computational Electromagnetics & Electronics, Institute of High Performance Computing, 1 Science Park Road, #01-01, The Capricorn, Singapore, 117528 Tel: (65)-64191542, [email protected]
Parallel fast multipole method is developed for RCS computation of large targets on multiple processor systems. Sparse Sub-matrices in FMM are calculated and stored in distributed system and MPI is used to perform the matrix-vector multiplication in iterative solution. The accuracy of code is verified by comparison with other publications. Either the computational complexity or speed-up ratio of parallel FMM and parallel MoM is compared.
1. Introduction In recent years, fast multipole method (FMM) has been developed into the most powerful and efficient algorithm in simulating electromagnetic scattering problems [1][2]. FMM reduces the computational complexity from o(N2) of moment method (MoM) to o(NL5) or 0(N\ogN) in its multilevel version, where AT denotes unknowns. However, theoretical prediction of radar cross section (RCS) of an aircraft with hundreds wavelength size often requires more than ten millions of unknowns to model the current density on its surface. Single processor systems can not provide sufficient memory not to mention unreasonable CPU time required. Therefore, parallel version of FMM needs to be developed to fully utilize multiple processor computing platforms [3]. The parallel codes of fast multipole method are developed and successfully implemented on either UNIX or LINUX operating systems. In parallel FMM, the near-group interactions, aggregation and translation procedures in FMM are distributed stored and computed. Message passing interface (MPI) library is used to communicate among different processors in iterative solution procedure, such as conjugate gradient method (CG). Special attention is paid on storage of sparse matrices to reduce RAM costs. The parallel efficiency of our source codes has been also discussed. The accuracy of our codes is verified by the calculation of other authors. 2. Brief review of the algorithm Either moment method (MoM) or fast multipole method is designed to solve the integral equations in electromagnetic scattering, e.g. following electric field integral equation (EFIE) of conducting targets: n x Einc = jk0rjn x I G(r, r') J(r' )dS'+ - ^ V • £ G(r, r')(V • J(r' ))dS'
369
(1)
370
where n i s the normal vector of the surface and J(r') denotes the current distribution; E'"cis the incident wave and k0, the wavenumber. T] stands for the wave impedance in free-space. G(r,r')is the 3D Green's function and expressed as following in FMM:
GneNGm
Ir,-r,l
G(r,,r,.):
^-*(I-kk)e *"""r''^(k-LV2i I 167T
(2.1) Gn e FGm (2.2)
where Gn denotes the n-th group of triangles in FMM and source point r. e Gn, target r € G • NGm and FG stands for near-group and far-group set of Gm • Symbol TL (k • r„„) represents the translation of the group G„ to Gm and is calculated as following: L
^(k-fm„) = I £(-7)'(2/ + lA (2, (V m J^(k-f m „)
(3)
/=0
In which, /j ( (2) ()is second kind spherical Hankel function and P ; ( ) i s the Lendgre polynomials. In the aid of (2.2) and following the MoM, integral equation (1) can be cast into following FMM linear equations:
where vector [at
a2
A
A
IG
A
A
"21
"22
2G
A
A
"Gl
"G2
a c f and [b,
a, a
2
GG. . a G .
b2
V =
b2
(4)
.bG.
b c ]T stands for the unknows and
incident wave, respectively. Sub-matrix Amn denotes the interaction of group G to G , and is expressed as:
AM„ —
M.
G„ e NG.
V ^ V
G„ e FGn
(5)
where M,, is calculated as conventional MoM. v„ is the aggregation of group G and symbol'+' denotes the transpose conjugate. T,^ is the translation matrix of G to G Therefore, the final form of the FMM linear equations can be written as: (ZjVW+V+TV)a-b
(6)
371 where matrices zNN, T and V are called near-interact, translation and aggregation matrix, respectively. They are all sparse one and thus make the computational complexity from MoM's 0(N 2 )toFMM's 0(NLS).
3. Parallel strategy Most time-consuming part of equation (6) is matrix filling and iterative solving. Therefore the matrices ZNN, T and V are calculated and stored partly in different processors. To do so, groups of FMM are distributed equally or almost equally to each processor,.i.e. equation (4) is partitioned into processors by rows. Matrix-vector multiplication are calculated in each CPU and MPI is used to communicate among CPUs to get the input vectors in each iteration. 4. Numerical examples and discussions To verify our source code which is written in FORTRAN 90, mono-static RCS for open cavity is showed in Figure 1. Our calculation agrees very well with that obtained by other authors [4]. We also have tested our code's complexity in aspect of either memory requirement or CPU time versus those of MoM. Figure 2 (a) and (b) demonstrate the advantage of FMM over MoM. It can be seen that our code has achieved the predicted 0(NLS) complexity. Figure 3 compares the speed-up ratio of FMM and MoM versus different processors used. Here, parallel MoM is the special FMM where all the groups are treated as near group set. It can be seen that parallel MoM has better parallel efficiency than parallel FMM. This can be explained by the fact that in parallel MoM, both aggregation and translation matrix are not calculated and involved in matrix-vector multiplication in iterations. Therefore, there is only one matrix-vector multiplication, i.e. it needs only one time to communicate among processors. On the other hand, from equation (6) there are at least five times communications in parallel FMM. From Fig.3(b), we can see that the efficiency of parallel FMM becomes better when dealing with larger problems. Thus parallel FMM is very promising for large-scale computations of electromagnetics.
e (degrees)
e (degrees)
Fig.l Monostatic scattering from an open cavity.
372 — u
1 —•—FMM — « — MoM
y
"
•
••^••1.2350859*10" 4 -/v"* 07a5
w /
•
\
\
t
y Y*
•SJ*
-•—FMM -•—MoM
lev 31=4
level=3
**^y
,'
••••"• 1.23027*1 O ' - N " " ™ '
]
|-
!
N Unknowns
(a)
(b)
Fig.2 Complexity comparison of parallel FMM and MoM
| |
-1 1 ! — ^ — unknowns=10,368 ^ /
Processor number
(a)
7
^ f Spe ed-up Rat
1 ' 1 • 1 • 1 • — • — P a r a l l e l FMM — • — Parallel MoM
/y£
- * - N =56,448 rj =10,368
: :
Processor Number
(b)
Fig.3 Comparison of speed-up ratio (a) parallel FMM and MoM (b) parallel FMM with different unknowns
Reference
4.
Eric Darve, The fast multipole method: numerical implementation, J. Comput. Phys., 160 (2000) pp. 195-240. R. Coifman, V. Rokhlin, and S. Wandzura, The fast multipole method for the wave equations: a pedestrian description, IEEE Antennas & Propagat. Mag., 35,no.3, (1993)pp.7-12. W.C.Chew, J. M. Jin, E. Michielssen, and J. M. Song, Fast and efficient algorithms in Computational electromagnetics, chapter 4, Artech House, Inc, 2001 J.M.Song, W.C.Chew, Multilevel fast multipole algorithm for solving combined field integral equations of electromagnetic scattering, Microwave & Opt. Tech. lett., 10, no. 1,(1995) pp. 14-19.
T W O C L A S S E S OF P R E C O N D I T I O N I N G T E C H N I Q U E S F O R ELECTROMAGNETIC WAVE SCATTERING PROBLEMS J U N ZHANG
AND
JEONGHWA LEE
Laboratory for High Performance Scientific Computing and Computer Simulation, Department of Computer Science, University of Kentucky, Lexington, KY 40506, USA E-mail: jzhangQcs.uky.edu, [email protected]
Department
CAI-CHENG LU of Electrical and Computer Engineering, University of Kentucky, KY40506, USA E-mail: [email protected]
Lexington,
We investigate preconditioned iterative solutions of large dense complex valued matrices arising from discretizing the integral equation of electromagnetic scattering. The main purpose of this study is to evaluate the efficiency of preconditioning techniques based on incomplete LU (ILU) factorization and sparse approximate inverse (SAI) for solving this class of dense matrices. We solve the electromagnetic wave equations using the BiCG method with the preconditioners in the context of a multilevel fast multipole algorithm (MLFMA). The novelty of this work is that the preconditioners are constructed using the near part block diagonal submatrix generated from the MLFMA. Experimental results show that the ILU and SAI preconditioners reduce the number of BiCG iterations substantially.
1
Introduction
The hybrid surface-volume integral equations for electromagnetic scattering problems can be formally written as follows, {Ls(r,r')
• J3(r') + Lv(rS)
-E + Ls(r,r')
• Js(r')
• Jv(r')}tan = -E&(r),
+ Lv(r,r')
• Jv(r')
= -£
inc
(r),
r e S, r e V,
where Emc stands for the excitation field produced by an instant radar, the subscript "tan" stands for taking the tangent component from the vector it applies to, and Ln,(£l = S,V), is an integral operator that maps the source JQ to electric field E(r) and it is defined as La(ry)
• Jn(r')
= iu>fib f (I + fe^VV) e ^ l ^ ' l / \ 4 n \ r - r'\) • J n ( r ' ) d n ' . Jn We follow the general steps of the method of moments (MoM) 5 to discretize the hybrid surface-volume integral equations, and solve the resultant matrix equation by a multilevel fast multipole algorithm (MLFMA), which is a multilevel implementation of the fast multipole method (FMM). The basic idea of the FMM is to convert the interaction of element-to-element to the interaction of group-to-group. Using the addition theorem for the free-space scalar Green's function, the matrix-vector product Ax can be written as Ax = {AD + AN)x +
VfAVsx,
where V/, A, and Vs are sparse matrices. The FMM speeds up the matrix-vector product operations and reduces the computational complexity of a matrix-vector
373
374 AD + AN
Figure 1. The sparse data structure of a dense matrix A (PIA) from electromagnetic scattering.
product from 0(iV 2 ) to 0(N1-5), where N is the order of the matrix 2 . The computational complexity is further reduced to O(NlogN) with the multilevel fast multipole algorithm (MLFMA) 1. The sparse data structure of a partitioned dense matrix A (the P I A case) is shown in Fig. 1. The left panel of the Fig. 1 shows the block diagonal part AD, the right panel of the Fig. 1 shows the block near-diagonal part AD + Aiv, and the far part of A is scattered in the rest of the area of the right panel of the Fig. 1. We iteratively solve the linear system of the form Ax = b, where the coefficient matrix A is a large scale, dense, and complex valued matrix for electrically large targets. We propose to use an incomplete lower-upper (ILU) triangular factorization with a dual dropping strategy 3 ' 4 and a sparse approximate inverse (SAI) preconditioner 6 to construct a preconditioner from the near part matrix (AD + AN) in the MLFMA implementation. By not using a static (prespecified) sparsity pattern, we hope to capture the most important (large magnitude) entries in constructing both preconditioners, while not to consume a large amount of memory. In our experimental study, we use the BiCG method as the iterative solver, coupled with the two preconditioning techniques.
2
Preconditioners
Most preconditioning techniques, such as the ILU(O), rely on a fixed sparsity pattern, obtained from the sparsified coefficient matrix by dropping small magnitude entries. Some SAI techniques need access to the full coefficient matrix (to construct a sparsified matrix), which is not available in the FMM. We evaluate two preconditoners, the ILU preconitioner with a dual dropping strategy (ILUT) (a fill-in parameter p and a drop tolerance r) 3 ' 4 and the SAI preconditioner 6 , using the no preconditioning case as comparison. The total storage of the ILUT preconditioner is bounded by 2pN. Here r controls the computational cost, and p controls the memory cost. We sparsify the AD + AN matrix with a drop-tolerence ei, and then construct the SAI preconditioner using the sparsified matrix A with two dropping tolerences «2 and e3 which are chosen by a heuristic process. By judiciously choosing those parameters, we are able to construct both preconditioners that are effective and do not use much memory space.
375 3
Numerical R e s u l t s and Analysis
In this section, we present examples to demonstrate the efficiency of the ILUT preconditioner and the SAI preconditioner compared to the case without a preconditioned in speeding up the BiCG iterations. ° The test problems are described in Table 1 and some numerical results are listed in Table 2.
Table 1. Information about the matrices used in the experiments (Ao is the wavelength in meters). cases P1A
level 4
P3A
7
unknowns 816 100,800
matrices AD AD + AN AD
AD + AN
nonzeros 25,122 53,296 3,571,808 7,211,632
target size and description 6x2 Conducting plate 22.25 X 22.25 Antenna array
In the case of P3A, the number of the ILUT iterations and the SAI iterations and the CPU time are relatively small compared to the BiCG method without a preconditioner. The sparsity ratio (ration) of the ILUT and SAI preconditioners is less than 1. Both preconditioners do not need a large amount of memory space. Fig. 2 shows the convergence history of the residual norm of the BiCG method with the SAI preconditioner, with the ILUT preconditioner, and without a preconditioner. It is shown that the BiCG method with both ILUT and SAI preconditioners converges faster than that without a preconditioner. Furthermore, SAI is seen to be faster than ILUT. Eigenvalues of the ILUT preconditioned matrix and the SAI preconditioned matrix are shifted to the right hand side of the origin and they are away from zero. They are also very compactly clustered around 1 (see the middle and third panels of the Fig. 3). In addition, the condition number of the eigenvector matrix of the preconditioned matrices is significantly decreased (the ILUT and SAI preconditoned matrices are closer to normal than the original matrix A). These features explain the good convergence behaviors of the ILUT preconditioned matrix and the SAI preconditioned matrix. According to our numerical experiments, we can see that the ILUT preconditioner and the SAI preconditioner constructed from the near part matrix improves the computational efficiency in the sense of reducing both memory and CPU time. The results show that the BiCG method with the ILUT preconditioner and the SAI preconditioner is robust for solving three dimensional model cases from electromagnetic scattering simulations. More deatiled numerical results can be found in a technical r e p o r t 3 .
a All cases are tested on one processor of an HP Superdome cluster at the University of Kentucky. The processor has 2 GB local memory and runs at 750 MHz. The code is written in Fortran 77 and is run in single precision.
376 Table 2. Numerical data of solving the P3A case (unknowns = 100,800). prec NONE ILUT SAI
r
V
lO-0
30
£1
£2
£3
0.03
0.04
0.02
ration
LiUcpu
itnum
114.07
0.77
423 42
103.69
0.27
31
3018.69 330.21
compcpu 3018.69 444.28
237.61
341.30
ttcpu
nmm Figure 2. Convergence history for solving the P3A case.
(LU^A
MA
%
Figure 3. Eigenvalue clustering of the matrices in the P1A case.
Acknowledgments This research work was supported in part by NSF under grants CCR-9902022, CCR9988165, CCR-0092532, ACI-0202934, and ECS-0093692, by DOE under grant DEFG02-02ER45961, by ONR under grant N00014-00-1-0605, by RIST (Japan), and by the University of Kentucky Research Committee. References 1. W.C. Chew, J.M. Jin, E. Midielssen, and J.M. Song, Artech, Boston, 2001. 2. R. Coifman et al., IEEE Antennas Propagat. Mag., 35(3):7-12,1993. 3. J. Lee, J. Zhang, and C. Lu, Technical Report No. 342-02, Department of Computer Science, University of Kentucky, Lexington, KY, 2002. 4. Y. Saad, Numer. Linear Algebra Appl., l(4):387-402, 1994. 5. Y.V. Vorobyev, Gordon & Breach Science Publishers, New York, 1965. 6. J. Zhang, Appl. Numer. Math., 35:67-86, 2000.
ARBITRARY ORDER EDGE ELEMENT FOR 2D EM SCATTERING
METHODS
K. MORGAN, P. D. LEDGER, O. HASSAN, N. P. WEATHERILL Civil & Computational Engineering Centre, University of Wales Swansea, Swansea SA2 8PP, Wales, U.K. E-mail: [email protected] J. P E R A I R E Aeronautics & Astronautics, MIT, Cambridge, MA 02139, U.S.A. E-mail: [email protected] Electromagnetic wave scattering problems in 2D are solved by means of an arbitrary order edge element approach. The development of an a-posteriori error estimator enables bounds to be placed on the scattering width distribution, which is a computed output of practical interest. Attention is drawn to the possibilities offered by the use of methods based upon reduced-order approximations.
1
Introduction
Edge element methods have proved to be extremely popular in the field of computational electromagnetics. An important contribution was made by Demkowicz and co-workers 1 , who developed a two dimensional hierarchical basis for edge elements which enabled fully adaptive hp approximations to be computed. The approach outlined in this paper follows similar lines, but employs the shape functions defined by Ainsworth and Coyle 2 . An a-posteriori error estimation capability is added by employing the approach developed by the group of Patera and Peraire 3 . The error estimator enables inexpensive, sharp, rigorous and constant free bounds to be obtained for the numerical error in computed outputs of practical interest. The error estimation technique is incorporated within the high order element framework and, for the selected application area of electromagnetic wave scattering problems, the selected output is the scattering width. We also draw attention to the use of reduced-order approximations in scattering simulations. These models are constructed from full finite element solutions for a small set of problem parameters and enable the rapid prediction of outputs for a new set of parameters 4 . 2
T h e E M Scattering P r o b l e m
Consider problems involving the interaction between single frequency electromagnetic waves and a scattering obstacle. The waves are generated by a source located in the far field and the obstacle, which is surrounded by free space, will be assumed to be a perfect electrical conductor (PEC). The surface of the scatterer is denoted by Tj for transverse magnetic (TM) simulations and by T2 for transverse electric (TE) simulations. The classical formulation of the problem is governed by Maxwell's equations and the unknowns are taken to be the scattered electric and magnetic field intensity vectors, E and H respectively. Far from the scattering obstacle, the
377
378 scattered field consists of outgoing waves only. To simulate this condition, a finite solution domain, Qf, surrounding the scatterer is selected and a perfectly matched layer (PML) technique is employed 5 . To achieve this, an artificial material layer, Clp, is added to Qf, with the outer surface of the PML denoted by T 3 . 2.1
Variational
Formulation
The solution is sought in the frequency domain, assuming a time variation of the form e l w t . Then, in terms of an unknown U, which is equal to the amplitude of the scattered electric field for T E simulations and of the scattered magnetic field for TM simulations, a weak variational formulation of the problem may be expressed as 6 : find U € ZD, such that A(U,W)=£{W)
VW€Z
(1)
The spaces employed here are defined by ZD = {v | v e W(curl; Q); n A v = - n A U{ on T 2 ; n A v = 0 on T 3 } Z = {v | v <= W(curl; fi); n A v = 0 on r 2 , n A v = 0 on T 3 }
(2) (3)
1
and this simplified formulation is valid for scattering problems involving prescribed non zero values of u). 2.2
Galerkin
Approximation
With finite element subspaces Zu of Z and Z^ of ZD, the Galerkin approximate solution UH 6 Z^ is such that
A(uH,w) = e{W)
vwezH
(4)
When edge elements are used to discretise the solution domain, an approximation of the W(curl;fi) space is obtained in which the tangential component of the solution is continuous across element edges. The family of arbitrary order triangular and quadrilateral edge elements proposed by Ainsworth and Coyle 2 is adopted. 2.3
Computing the Scattering
Width
In 2D scattering simulations, a non linear output of primary interest is the scattering width, or the radar cross section per unit length. The evaluation of this quantity requires the use of a near field to far field transformation, using solution information obtained on a collection surface, T c , lying in free space and totally enclosing the scatterer. When the approximate solution JJu has been computed, the scattering width integral is evaluated as S(C7H;d>) = C°{UH;
(5)
where tj> is the viewing angle, C° (t///; (j>) is defined in terms of an integral over the collection surface, Tc and the overbar denotes the complex conjugate.
379 3
Error Estimation
Outputs computed from solutions obtained on discretisations with a sufficiently high p and small enough mesh spacing h will, in general, be indistinguishable from the exact. However, such solutions can be expensive to compute. It is, therefore, important to be able to evaluate strict upper and lower bounds for specified outputs, such as the scattering width S(Uh,
(6)
To accomplish this, an extension of the a-posteriori finite element error bound procedure proposed by Sarrate, Peraire and Patera 3 for the Helmholtz equation may be employed. The method, which is capable of dealing with quadrilateral, triangular or hybrid discretisations 7 , reduces to a requirement for solving local Neumann sub-problems inside each element, with the balance between elements ensured by using Demkowicz's edge fluxes8. Linearisation is necessary when non-linear outputs, such as the scattering width, are considered, with the variable <j> interpreted as the viewing angle. Note that, with these procedures in place, an adaptive mesh procedure, based upon the computed error bound gap and with error indicator, A, defined as ^ = \ (s+ ~ s~)
(7)
can be readily implemented 7 . 4
Reduced Order Approximation
Reduced-order, or low-order, models have been shown to provide a powerful method for computing outputs in the areas such as turbomachinery 4 . When reduced-order models are employed, it is important to be able to construct associated rigorous constant free error bounds on the computed outputs 9 . 4-1
Bounding the Complete Scattering Width
Distribution
For scattering problems, a reduced-order model technique can be employed to provide a method of extending the pointwise error bounding capability described above by enabling the rapid calculation of error bounds for the scattering width for the complete spectrum of viewing angles 10 . 4-2
Constructing the Scattering Width Distribution for Different Incident
Angles
For design purposes, the analyst will often be interested in the calculation of the scattering width distribution for all possible incident wave angles. This information may also be rapidly determined using a reduced-order model and associated certainty bounds may be computed 11 .
380 5
Conclusions
An hp edge element procedure for the simulation of 2D electromagnetic wave scattering problems in the frequency domain has been outlined. Arbitrary order edge elements may be employed, with the computational domain truncated by using the PML approach. Error bounds may be determined on outputs of electromagnetic scattering problems. The practicality of using the method to bound the computed scattering width at prescribed viewing angles is of particular interest to aerospace engineers. The possibilities offered by the application of reduced-order modelling techniques to this problem area have also been noted. Acknowledgements Paul Ledger acknowledges the support of the UK Engineering and Physical Sciences Research Council (EPSRC) in the form of a PhD studentship under grant GR/M59112. Jaime Peraire acknowledges the support of EPSRC in the form of a visiting fellowship award under grant GR/N09084. References 1. L. Demkowicz and L. Vardapetyan, Comp. Meth. Appl. Mech. Eng. 152, 103 (1998). 2. M. Ainsworth and J. Coyle, Comp. Meth. Appl. Mech. Eng. 190, 6709 (2001). 3. J. Sarrate, J. Peraire and A. T. Patera, Int. J. Num. Meth. Fluids 3 1 , 17 (1999). 4. K. E. Wilcox, J. Peraire and J. White, Comp. Fluids 3 1 , 369 (1999). 5. J.-P. Berenger, J. Comp. Phys. 114, 185 (1994). 6. P. D. Ledger, O. Hassan, K. Morgan and N. P. Weatherill, Int. J. Num. Meth. Eng. 55, 339 (2002). 7. P. D. Ledger, K. Morgan, J. Peraire, O. Hassan and N. P. Weatherill, Int. J. Num. Meth. Fluids (2002) in press. 8. L. Demkowicz, ed. P. Ladaveze and J. T. Oden, New Advances in Adaptive Computational Methods in Mechanics (Elsevier, New York, 1998). 9. L. Machiels, Y. Maday and A. T. Patera, Comp. Meth. Appl. Mech. Eng. 190, 3413 (2001). 10. P. D. Ledger, K. Morgan, J. Peraire, O. Hassan and N. P. Weatherill, Fin. Elem. Anal. Des. (2002) in press. 11. P. D. Ledger, J. Peraire, K. Morgan, O. Hassan and N. P. Weatherill, submitted to J. Comp. Phys (2002).
PARALLELIZATION OF PRECORRECTED-FFT IN SCATTERING FIELD COMPUTATION YAO-JUN WANG The Computational
Electro-Magnetics & Electronics (CEE) Division, Institute of High Computing (IHPC), Singapore 117528
Department of Electrical and Computer Engineering National University of Singapore, 10 Kent Ridge Crescent, Singapore E-mail: [email protected] LE-WEI LI Department of Electrical and Computer Engineering The National University of Singapore, 10 Kent Ridge Crescent, Singapore High Performance
Computation for Engineered Systems (HPCES)
Singapore-MIT
Alliance (SMA), Singapore/USA E-mail: [email protected]
Performance
119260
119260
Programme
119260/02139
ER-PING LI The Computational
Electro-Magnetics & Electronics (CEE) Division, Institute of High Computing (IHPC), Singapore 117528 E-mail: [email protected]
Performance
Precorrected-FFT(PFFT) is a powerful algorithm for analyzing electromagnetic scattering with arbitrarily shaped three-dimensional objects. For available PFFT code running on a single processor machine, it will be impossible to get results in short time when the number of unknowns is huge. So it is necessary to perform the computation on high performance computers in order to efficiently solve the above problems. Actually for a real object, the scattering on an object due to an incident plane wave need to be computed from a range of continuous incident angles. Computation of different incident angles can be done parallel. When using PFFT algorithm, majority of time is spent on solving the matrix with FFT and the correction operation. If these two parts are parallel executed respectively, execution time can be reduced greatly. This paper presents the parallelization of PFFT algorithm and its implementation for computing large scalable scattering problems on high performance multiprocessor platforms and clusters. MPI(message passing interface) has been used for parallelizing the code.
1 Background High performance computers provide better platforms that solve EM problems. This paper presents the parallelization of Precorrected-FFT (PFFT) algorithm for large scalable scattering problems. The Precorrected-FFT algorithm is an excellent fast algorithm that can be applied in a wide variety of EM fields. Its best cost is 0(NlogN)[l]. Similarly to the other algorithms such as the fast multipole algorithm(FMM), the main difficulty of PFFT is how to approximate the long range potentials and how to compute the local interactions. The basic idea of PFFT is that
381
382
uniform grid potentials are used to represent the long distance potentials and directly calculate the nearby interactions. This includes four steps that are (1) projecting onto a grid, (2) computing grid potentials, (3) interpolating grid potentials and (4) precorrecting, respectively. The figure 1 displays the procedure [1,2].
< .
N
/<3>V
Figure 1. 2-D representation of the procedures of the Precorrected-FFT algorithm ( p = 2 )[1.2]
The code is written by Fortran 90 and runs on IBM p690. 2 Parallel Precorrected-FFT Algorithm In view of the problem of scattering, parallelization can be carried out in two ways. One layer is done according to incident angles and the other is parallelization of PFFT algorithm. 2.1 The first layer of parallelization Generally, 180° x 360° scanning need be done to get a complete distribution of scattering for asymmetric objects. Of course, computation can be reduced by half when objects are symmetric. So for a real object, the scattering on an object due to an incident plane wave need to be computed from a range of continuous incident angles. The whole incident angles can be divided into n groups by the sum of available processors as equal as possible. Each processor is responsible for the computation of a group. 2.2.1 The algorithm of the second layer of parallelization Theoretically, each of four steps of Precorrected-FFT algorithm can be parallel executed. However, the statistics of the execution time of each step shows that the third step (interpolating grid potentials) and the fourth step (correction) occupy most of CPU time, about 10-30% and 40-60% respectively. Let the variable rank represent the number of a processor and the first processor in a group of processors is numbered as po while the other
383
processors in this group are numbered as pi, p 2 ,..., p n , respectively. Then the algorithm of the second layer can be described as follows using pseudo code (scanning a 3-D object): IF (rank .eq. po) THEN Project the panel charges to the grid charges CALL MPI_SCATTER() Iscatter data from po to p0-p„ ENDIF ! compute convolution CALL 1-D FFT() for m times ! along axis x , compute FFT CALL 1 -D FFT() for n times ! along axis y CALL 1 -D FFT() for k times ! along axis z CALL 1 -D FFT() for m times ! along axis x , compute FFT 1 CALL 1 -D FFT() for n times ! along axis y CALL 1 -D FFT() for k times ! along axis z IF (rank .eq. po) THEN CALL MPI_GATHER() Igather data from p0-p„ to p 0 Interpolate the grid potentials to the panels ENDIF Correction Definition ! po-p„ IF(ra«fc.ne.po)THEN CALL MPI_SEND( ) ! send data from pi-p„ to pO ENDIF IF (rank .eq. p0) THEN DOm=l,n IF (m .ne. 1) THEN CALL MPI_RECV() Ireceive data from pi-p„ ENDIF Correction Operation ENDDO ENDIF
2.2.2 Memory allocation The operation of correction definition is sensitive to memory size. The goal of this operation is to finish the correction definition of every unknown before correction starts. Generally, there are a few hundreds of corrections for one unknown. When the unknowns increase to a large size, the memory required by the operation of correction definition will become too huge to be satisfied. In order to solve this problem, the correction definition is divided by the number of available processors (assuming the number is n) into n parts and is implemented individually by each processor because correction definition of each unknown is not related to others. Then the memory required by this operation can be reduced to 1/n' . Because the correction definition is ordered, the main processor can acquire the correction definition from the other processors orderly when correction operation is done. 3 The result of the experiment The result of the first layer is omitted since it is easy to apply such a parallelism. Figure 2 shows an example of the second layer. The scattering on a metal sphere is computed. The wavelength is set to be 1 meter. The
384
surface of the sphere whose radius is 1 meter is divided into 5538 unknowns. -S § j| *! H
550 440 330 220 110
pO*—L
-—n- - •-- m
i
2
3
4
5
— « — Practical Time(s)
527.93
384.26
317.68
281.09
265.12
—•— Ideal Time(s)
527.93
263.97
131.98
65.99
33
1
2
4
8
16
Number of Processors
Figure 2. Parallel Computing Time I
4 Conclusion The experiment results show that the running time is shortened greatly after the computation is parallelized. This proves that the algorithm proposed in this paper is efficient in reducing the CPU time. References 1. J. R. Phillips and J. K. White, A precorrected-FFT method for electrostatic analysis of complicated 3-D structures, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, Vol.16, No.l0(Oct., 1997), pp. 1059-1072. 2. Xiaochun Nie, Le-Wei Li, Ning Yuan and Jacob K. White, Fast Analysis of Sattering by Arbitrarily Shaped Three-Dimensional Objects Using the Precorrected-FFT Method, Microwave and Optical Technology Letters(September 20, 2002). 3. N.R. Alum, V.B. Nadkarni and J. White, A Parallel Precorrected FFT Based Capacitance Extraction Program for Signal Integrity Analysis, 33rd Design Automation Conference, DAC 96-06/96 Las Vegas, NV,USA 4. Eleanor Chu, Alan George, Inside the FFT black box : serial and parallel fast Fourier transform algorithms(Boca Raton, Fla.: CRC Press, c2000).
FINITE ELEMENT ANALYSIS OF PHOTONIC CRYSTAL FIBRES RUI YANG AND YILONG LU School of Electrical and Electronic Engineering, Nanyang Technological Univ., Singapore 639798 Email: [email protected] Photonic crystal fibre (PCF) has attracted much intension in recent years because of their unusual optical properties. To accurately evaluate propagation characteristics of PCF's, vectorial wave analysis is necessary. In this paper we will present the computer simulation and modal analysis of PCF's by using a powerful finite element method (FEM) solver. In this solver, a combination of edge elements for the transverse field and nodal elements for the longitudinal field is used together with Perfect Match Layer (PML) to cope with the open domain. Based on the realistic field simulator, one can control the dispersion and polarization properties in PCF's by modifying solely the geometrical parameters of these fibers and open the door for optimal design of better PCF's.
1
Introduction
Optical fibers and integrated optical waveguides are today finding wide use in areas covering telecommunications, sensor technology, spectroscopy, and medicine. However, within the past decade the research in new purpose-built materials has opened up the possibilities of localizing and controlling light in cavities and waveguides by a new physical mechanism, namely the photonic bandgap (PBG) effect. The PBG effect may be achieved in periodically structured materials having a periodicity on the scale of the optical wavelength. Such periodic structures are usually referred to as photonic crystals, or photonic bandgap structures [1-2]. Holey optical fibers (HOFs), which are also known as photonic crystal fibers (PCFs), have attracted a lot of intension recently because of their unusual optical properties such as extra large chromatic dispersion, a wide range single mode operation. The complex nature of the cladding structure of the HOFs does not allow for the direct use of methods from traditional fiber theory. Especially for the novel HOF, operating by the PBG effect, the full vectorial nature of the electromagnetic waves has to be taken in account. In this paper, we present finite element magnetic and electric field models for determining the propagation modes in dielectric wave guiding structures. A combination of edge elements for the transverse field and nodal elements for the longitudinal field is used together with Perfect Match Layer (PML) to cope with the open domain.
385
386
2
Basic Equations and Finite Element Formulation
To analyze electromagnetic wave propagation in an inhomogeneous waveguide, the finite element method is employed in the framework of the Galerkin formulation of the weighted residual method to solve the vector Helmholtz equation: V x - V x E = ^£rE
(1)
Mr
where e-e + a/(jco) and e and a represent the permittivity and conductivity, respectively, of dielectric materials. kl = CQ2M0SO a n d £r=s/eo. Assuming for all of the field components the dependence from the spatial coordinate z of the form exp(-yz), with y-a + jfi as the complex propagation constant, and subdividing the electric field into its transverse (E,) and longitudinal (E z ) parts, we get: E(x, y, z) = [E, (*, y) + z Ez (*, y)]e-v (2) Substituting (2) into equation (1), and splitting it into its transverse and longitudinal parts, we can get: V,x(— V,x e ,)-y 2 —(V t e z + et) = kleret Mr
0)
Mr
72S/tx[—(V,ez + et)^z]^y2kl8rezz
(4)
Mr
where e, = yEt and ez = Ez- And equation (3)(4) must be resolved with the boundary conditions: n x e , = 0 ez = 0 (5) at perfectly conducting material, and: (Vtez + et)n = 0 V , x e , = 0 (6) at magnetic walls. To apply the weighted residual procedure, two sets of basis functions and two corresponding sets of weighted functions have to be defined. Since the Galerkin formulation is adopted, each set of weighting functions is equal to the corresponding set of basis functions. We use the vectorial shape functions a\e) as the set of basis function to express the approximate e|e) to the exact transverse part e, of the electric field on the element (e): ei' ) U,y) = Ee? ) oy ) (Jc,y)
(7)
387
and we use the nodal shape functions a,e) to express the approximate e[e) to the exact longitudinal component ez of the electric field on element (e): e(ze)(x,y) =
±e^)a(Je)(x,y)
(8)
7=1
by using the finite element expansion of the unknown field on element (e), we substitute (7)(8) into (3)(4) and annihilate the residue, we can get: "0
o
0
Ms(te)]-klsAr'M
~|r 77(e)
M
Ms[e)]-kieMe)] Mr
-[G^J1
(9)
Mr
lE(te) Mr
Mr
U
where the entries of the local matrices are given in [3]. After assembling all elements and zeroing the residuals, we can get the final generalized eigenvalue problems. Once the normalized operating frequency £o *s fixed, we can compute the propagation and attenuation constants of the characteristic modes of the guiding structures, which can be used to plot the dispersion diagram. 3
Numerical Results and Conclusion
Here we specifically analyzed this kind of photonic crystal fibers which have several rings of air holes around the core. Figure 1. the geometry of PCFs with a ring and two rings of six airs holes is given. Figure 2. is the calculated propagation constant verse working frequency band. Where the dotted line, is just the main propagation mode of common circular air waveguide. The solid line is when there are several rings of air holes around the core, no matter how many rings around the core; there is almost no any difference. Because that the most part of the transmission power are confined in the area inside the first ring. The outer side rings influence propagation properties very slightly. The similar results have been presented in [4]. Based on this efficient analysis method, we can carefully study the unusual optical properties of PCFs, and we can combine with other optimization algorithm such as Genetic Algorithm, we can optimize the design of the geometry of PCFs.
388
O OI
O
O OIO
o •MD— o o -o
6
Figure 1 PCFs with a ring and two rings of air holes
xicf without any air holes with a ring of six hoies
I
r x 10 s
Figure 2 Freq VS Propagation Constant
References 1. Broeng J, "Photonic Crystal Fibers: A New Class of Optical Waveguides". Optical Fiber Technology 5, 305-330 (1999) 2. Bjarklev A. and Riishede J, "Photonic crystal fibers - a variety of applications". Proceedings of the 2002 4th International Conference on Transparent Optical Networks, Volume: 2, 97 -97 (2002) 3. Koshiba M. and Inoue K., "Simple and efficient finite-element analysis of microwave and optical waveguides". Microwave Theory and Techniques, IEEE Transactions on, Volume: 40 Issue: 2, 371-377 (1992) 4. Bjarklev A., Broeng J., Barkou S. E. and Dridi K., "Dispersion properties of photonic crystal fibers". ECOC'98, Madrid, Sept. 1998.
PARALLELIZATION OF FAST MULTIPOLE METHOD USING MPI ON IBM HIGH PERFORMANCE COMPUTERS WU FANG, ZHANG YAOJIANG, EDWIN LIM CHAN PING, LI ERPING Division of Computational Electro-magnetics & Electronics, Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn Singapore Science Park 11. Singapore 117528 E-mail: wufang @ihpc.a-star.edu.sg Massively parallel, distributed memory computers provide the increases in computing performance needed to solve the largest problems in computational electromagnetics. Here, we present a parallel implementation of the fast multipole method (FMM) by using Message Passing Interface (MPI) as the communication back-end. The issues and options in its parallelization are identified, and domain decomposition strategies to suit these are implemented. Good parallelization is exhibited, with the most costly parts of the algorithm displaying essentially linear speedup. Demonstrations using the supercomputer IBM p690 are given. The parallel fast multipole algorithm presented here is scalable, portable, and efficient.
1
Introduction
Electromagnetic scattering analysis for radar cross-section (RCS) prediction and electromagnetic compatibility analysis present very large computational demands. Massively parallel, distributed memory computers provide the increases in computing performance needed to solve the large-scale problems. Therefore, the parallel fast multipole method (FMM), a powerful and efficient algorithm in solving large-scale computational electromagnetics problems, is developed. In order to achieve portability over various multi-computers, the Message Passing Interface (MPI) [1] was used for communication. Some of the unique characteristics of the parallel implementation are presented in this paper. To effectively parallelize the sequential FMM code in the distributed memory system, a non-blocking communication scheme is implemented to reduce communication and synchronisation overhead. Good load balancing among processors is achieved by using carefully designed group partitioning technique. By implementation of dynamic matrix allocation and shrinking strategies in the parallel program, the method achieved is portable, scalable, and efficient. The parallel FMM code is written in MPI FORTRAN 90, and it has been implemented in many different platforms, such as Unix C shell of IBM supercomputer p690 model 681, and IBM Linux Cluster X-series 330. The numerical results show that, by using parallel implementation, the memory consumed in each processor reduces close to half and the running time decreases sharply when the number of processor is doubled. The paper will also present the advancements of the parallel code. 2
Fast Multipole Method
Fast multipole method was initially proposed by V. Rokhlin for speeding up the solution of acoustic wave scattering [2], and later was extended to Maxwell equations by R.Coifman, et al [3]. Research group of Prof. Chew further developed it into multilevel version and successfully applied it into large-scale simulation of aircraft RCS [4]. By
389
390
reducing the computation of far field interaction, FMM transforms conventional dense matrix of moment method into sparse one as following: where ZNN
(1) Zm,x+{VTv)x=b stands for the interaction of near field and is calculated exactly by MoM while
V and T denote aggregation and translation matrix in FMM, respectively. V* is the transpose conjugate matrix of V . Vector x and b represent the unknowns and exciting source respectively. A sequential version of the fast multipole method consists of four main modules: parameter assignment, coefficient matrices calculation, Conjugate Gradient (CG) iterations, and Radar Cross Section (RCS) calculation. We paralleled the sequential 3D fast multipole method using Message Passing Interface (MPI) as the communication back-end, so that the implementation achieves the portability. 3
Parallel Implementation
As we found, when 3D FMM sequential code is implemented, nearly 95% of the CPU time is consumed on coefficient matrices calculation and CG iterations, therefore the parallel strategies are focused on these most time consuming potion of the code. First of all, we need to fully understand the structure of these coefficient matrices^ , T and NN. All these coefficient matrices in equation (1) are sparse ones, and their typical sparse structure can be shown as
Figure 3-1. Sparse matrix structure of equation (1)
To efficiently solve this essential equation (1) by an iterative solver such as the conjugate gradient (CG) method, we applied several parallel strategies: the coefficient matrices are scaled and shrunk so that all the available computing power is utilized effectively, an optimal communication scheme is used to achieve communication efficiency, and load balancing is controlled by a group mapping method.
3.1
Scaling and shrinking the Coefficient Matrices
In the parallel code, instead of the whole matrix being stored in one processor as in the sequential code, only part of these sparse, large-scale coefficient matrices is calculated and saved in each processor. As shown in Figure 3-2, the shaded blocks are the non-zero elements of the aggregation matrix, ordered by group sequence. If the number of groups in the coefficient matrix is G, and the number of processors used is P, only a block of G/P groups is calculated and saved in each processor. It is necessary to emphasize that only the non-zero elements are saved in memory, based on the group sequence. To further save memory, the coefficient matrices are allocated and deallocated dynamically. This data allocation method minimizes the memory used in each processor. Compared with the sequential code, the memory consumed in each processor is reduced sharply, and the CPU time is reduced
correspondingly. This technique of shrinking arrays improves the scalability of the parallel code.
Figure 3-2. Memory distribution of the aggregation matrix for multiple processors
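To make the storage scheme concrete, the following C++ sketch shows one way the block partition and dynamic deallocation described above could be organised. It is an illustrative reconstruction, not the authors' FORTRAN 90 data structures; the Block and LocalAggregationMatrix types, the contiguous group ranges and the release helper are all assumptions.

```cpp
#include <algorithm>
#include <complex>
#include <vector>

// With G groups and P processors, each rank allocates only the blocks of
// non-zero elements for its own contiguous range of about G/P groups, and
// frees a block as soon as it is no longer needed ("shrinking").
struct Block { std::vector<std::complex<double>> nonzeros; };

struct LocalAggregationMatrix {
    int first_group, last_group;     // this rank owns groups [first, last)
    std::vector<Block> blocks;       // only ~G/P blocks live on this rank

    LocalAggregationMatrix(int G, int P, int rank) {
        int chunk = (G + P - 1) / P;                 // ceil(G/P) groups per rank
        first_group = rank * chunk;
        last_group  = std::min(G, first_group + chunk);
        blocks.resize(last_group - first_group);     // allocate the local part only
    }
    void release(int g) {                            // dynamic deallocation
        std::vector<std::complex<double>>().swap(blocks[g - first_group].nonzeros);
    }
};
```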
3.2
Optimal Neighboring Communication
Although only part of the coefficient matrices is stored and calculated locally within each processor, no communication among processors is needed before the CG iterations, because the geometric information is saved locally in each processor. In the CG iteration, to update the vector x in equation (1), the matrix-vector multiplication is performed within every step of the iteration. Two communications are involved in parallelizing the multiplication. The following is the parallel strategy applied to the CG iteration:
1. The far-field matrix-vector multiplication, V†TVx, for a given x: (a) multiply M1 = Vx within each processor; NO communication established. (b) Broadcast M1 among processors and multiply M2 = TM1 (= TVx) locally within each processor; communication established. (c) Finally calculate V†M2 (= V†TVx); NO communication established.
2. The near-field matrix-vector multiplication, Z_NN x: NO communication established.
3. Sum the far-field and near-field result vectors and pass the result to one processor for the termination control; communication established.
The communications in steps 1(b) and 3 cannot be avoided. In this case, to ensure that the communication scheme is optimized, the computation and communication are overlapped as much as possible and only the minimum number of messages is communicated among processors.
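The three-step strategy can be illustrated with a short MPI sketch. This is a hedged reconstruction, not the authors' code: the apply_* routines are placeholders for the local sparse products, an all-gather stands in for the broadcast of M1 in step 1(b), real arithmetic replaces complex arithmetic for brevity, and equal slice sizes per rank are assumed.

```cpp
#include <mpi.h>
#include <vector>

// Placeholders for the local sparse products (definitions not shown).
std::vector<double> apply_V(const std::vector<double>&);      // M1 = V x   (local)
std::vector<double> apply_T(const std::vector<double>&);      // M2 = T M1  (local)
std::vector<double> apply_Vdag(const std::vector<double>&);   // V^dag M2   (local)
std::vector<double> apply_Znn(const std::vector<double>&);    // near field (local)

std::vector<double> matvec(const std::vector<double>& x_loc, int n_global) {
    // Step 1(a): M1 = V x, no communication.
    std::vector<double> m1 = apply_V(x_loc);

    // Step 1(b): make M1 globally visible, then M2 = T M1 locally.
    std::vector<double> m1_all(n_global);
    MPI_Allgather(m1.data(), (int)m1.size(), MPI_DOUBLE,
                  m1_all.data(), (int)m1.size(), MPI_DOUBLE, MPI_COMM_WORLD);
    std::vector<double> m2 = apply_T(m1_all);

    // Step 1(c): far field = V^dag T V x, no communication.
    std::vector<double> far = apply_Vdag(m2);

    // Step 2: near field Z_NN x, no communication.
    std::vector<double> near = apply_Znn(x_loc);

    // Step 3: sum the far- and near-field parts; the termination test is
    // then performed on one rank (reduction not shown).
    for (std::size_t i = 0; i < far.size(); ++i) far[i] += near[i];
    return far;
}
```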
3.3
Group mapping for Load Balancing
In this section, we describe a group mapping scheme for balancing the CPU load across the processors. As illustrated in Section 3.1, the non-zero elements of the coefficient matrices are sorted in group sequence. Since the elements in each group are assigned automatically based on the finest level of the geometric partition, the number of elements in each group is similar. The outer loop decomposition is based on these data blocks, so that the jobs are distributed evenly to the processors. In this way, processors receive nearly the same number of groups and a fairly even workload. The advantages of the group mapping strategy are that (1) the computation load among processors is about equal when the number of elements in each group is uniform (or close to uniform), and (2) the computation of the matrix coefficients can be done without communication among processors.
4
Parallel Results and Performance
The parallel 3D FMM code was implemented and tested on an IBM p690 model 681 supercomputer, which has 7 nodes with 32 CPUs (1.3 GHz) and 64 GB of RAM per node. The parallel implementation of the FMM provides an efficient way to reduce the memory and time consumed, which are critical factors when simulating large-scale electromagnetic scattering problems. The memory consumed in one processor and the elapsed time are nearly halved when the number of processors doubles, as shown in Figures 4-1 and 4-2.
Figure 4-1. Memory consumed in one CPU vs. number of CPUs
Figure 4-2. Elapsed time vs. number of CPUs
The speed-up ratio, shown in Figure 4-3, shows that the parallel implementation performs even better as the number of unknowns increases. Load balance is achieved during the simulation (Figure 4-4).
Figure 4-3. Speed-up ratio for different numbers of unknowns (10,368 and 56,448 unknowns against the ideal speed-up ratio)
Figure 4-4. Load balance when solving 56,448 unknowns
References
1. M. Snir, S. W. Otto, S. Huss-Lederman, D. W. Walker, and J. Dongarra, MPI: The Complete Reference, Scientific and Engineering Computation Series, The MIT Press, Cambridge, MA, 1996.
2. V. Rokhlin, Rapid solution of integral equations of scattering theory in two dimensions, J. Comput. Phys., vol. 86, no. 2, pp. 414-439, 1990.
3. R. Coifman, V. Rokhlin, and S. Wandzura, The fast multipole method for the wave equation: a pedestrian prescription, IEEE Antennas Propagat. Mag., vol. 35, no. 3, pp. 7-12, 1993.
4. W. C. Chew, J. M. Jin, E. Michielssen, and J. M. Song, Fast and Efficient Algorithms in Computational Electromagnetics, Artech House, Boston, London, 2001.
Parallel Unstructured Meshes Approach for the Simulation of Electromagnetic Scattering
O. Hassan, J. Jones, B. Larwood, K. Morgan and N. P. Weatherill
Civil & Computational Engineering Centre, University of Wales Swansea, Swansea SA2 8PP, Wales, U.K.
E-mail: [email protected]
A numerical procedure for the simulation of 3D problems involving the scattering of electromagnetic waves is presented. The solution algorithm employs an explicit finite element procedure for the solution of Maxwell's curl equations in the time domain using unstructured tetrahedral meshes. A PML absorbing layer is added at the artificial far field boundary that is created by the truncation of the physical domain prior to the numerical solution. The complete solution procedure is parallelised and several large scale examples are included to demonstrate the computational performance that may be achieved by the proposed approach and to evaluate the limitations of the present software on the AHPCRC T3E-1200 with 1024 processors.
1
Introduction
The accurate simulation of 3D electromagnetic scattering problems of current industrial interest, in realistic time scales, poses major computational challenges. We will address some of these challenges in the context of problems involving the interaction between waves, generated by a source in the far field, and a scatterer of general shape. Difficulties associated with mesh generation are reduced by adopting the unstructured mesh approach, with a fully automatic unstructured mesh generation procedure [4,8]. The solution algorithm employed is based upon the application of an explicit linear Taylor-Galerkin finite element procedure [3] to Maxwell's curl equations. With this method, both the electric and magnetic fields are assumed to vary in a continuous piecewise linear fashion [6]. The non-reflective boundary condition, that must be imposed at the truncated far field boundary that is created to enable numerical simulation, is handled by surrounding the computational domain by an artificial perfectly matched layer (PML). The parameters in the PML equations are defined in such a manner that the amount of reflection from the far field boundary is decreased [1,2]. The use of the PML is found to lead to a significant reduction in computational costs compared to those associated with the use of traditional local absorbing boundary condition approximations. To enable the solution of large scale problems on current computer platforms, the complete simulation process is parallelised. The computational performance that can be achieved by the resulting capability is demonstrated by including the results of a number of scattering simulations involving plane single frequency incident waves.
2
The Governing Equations
Consider the simulation of scattering of single frequency plane incident electromagnetic waves by an obstacle that is surrounded by free space. It is assumed that
the incident waves are produced by a general source located in the far field. In three dimensions, Maxwell's curl equations for a general linear isotropic material, of relative permittivity ε and relative permeability μ, can be written, in conservative vector form, and in dimensionless form, as

∂U/∂t + ∂F^k/∂x_k = S   (1)

where

U_i = μ H_i^s,  S_i = −(1 − μ) ∂H_i^i/∂t,  i = 1, 2, 3;   U_j = ε E_{j−3}^s,  S_j = −(1 − ε) ∂E_{j−3}^i/∂t,  j = 4, 5, 6   (2)
Here E^s and H^s denote the scattered electric and magnetic field intensity vectors respectively, E^i and H^i denote the incident electric and magnetic field intensity vectors respectively, and ε_jkl denotes the alternating symbol. The total field is decomposed into an incident field and a scattered field. The incident field can be dropped since it will be specified by the problem definition and will automatically satisfy Maxwell's equations.
3
Numerical Solution Algorithm
An approximate solution to the scattering problem is obtained by using a two-step finite element Taylor-Galerkin procedure [6]. This procedure, which is notionally second order accurate in both time and space [3], is outlined briefly here for completeness. The solution of equation (1) is advanced over one timestep, from time level t = t_m to time level t = t_{m+1} = t_m + Δt, in a two step fashion. With the computational domain represented by a general unstructured grid of 3-noded linear triangular elements, the solution U^{m} and the fluxes F^k{m} at time t = t_m are linearly interpolated over each element in the grid. In the computational implementation, the solution at time t_{m+1/2} = t_m + Δt/2 is obtained by employing the forward difference approximation

U^{m+1/2} = U^{m} − (Δt/2) ∂F^k{m}/∂x_k   (3)
This results in a piecewise linear discontinuous approximation to the solution at the time level t_{m+1/2}. The solution at the time level t_{m+1} is obtained following a Galerkin approximate variational formulation [6]. The resulting equation takes the form

M_IJ (U_J^{m+1} − U_J^{m}) / Δt = ∫_Ω ( S^{m+1/2} N_I + F^k{m+1/2} ∂N_I/∂x_k ) dΩ − ∫_Γ F_n^{m+1/2} N_I dΓ   (4)
The quantities required at time level t_{m+1/2} are computed using the values obtained for U^{m+1/2} in the first step, equation (3). Here F_n denotes the normal flux on the boundary Γ, which is computed according to the boundary condition being simulated, and M_IJ denotes the entries in the consistent linear finite element mass matrix. Equation (4) may be readily solved either by lumping this matrix or by explicit iteration [6]. Material interface and perfect conductor boundary conditions are imposed through the boundary integral term in equation (4). This term is evaluated using a local characteristic decomposition at the boundary [7]. The implementation of this procedure in the current context has been described in detail elsewhere [6] and will not be repeated here. Through this approach, the boundary conditions may be regarded as being imposed in a weak sense only. The non-reflective condition that is required at the truncated far field boundary is achieved by the addition of a PML [1] to the exterior of the computational domain. In the examples presented here, the truncated outer boundary is always taken to be a regular hexahedron and the PML is discretised using a structured mesh of tetrahedral elements. The formulation which is implemented follows the work of Bonnet and Poupaud [2].
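To make the two-step structure of equations (3) and (4) concrete, the following sketch applies the same half-step/full-step idea to a 1D scalar conservation law with a lumped mass matrix; it is a simplified analogue for illustration only, not the 3D Maxwell solver described here.

```cpp
#include <vector>

// Two-step (half step + full step) update for u_t + f(u)_x = 0 on a uniform
// 1D mesh: step 1 forms discontinuous per-element half-step values (the
// analogue of eq. (3)), step 2 gathers element fluxes back to the nodes (the
// analogue of eq. (4) with a lumped mass matrix).  Boundary nodes are left
// untouched for brevity.
void taylorGalerkinStep(std::vector<double>& u, double dt, double dx,
                        double (*f)(double)) {
    const std::size_t n = u.size();
    std::vector<double> uh(n - 1);                   // one value per element
    for (std::size_t e = 0; e + 1 < n; ++e)          // step 1: eq. (3) analogue
        uh[e] = 0.5 * (u[e] + u[e + 1])
              - 0.5 * dt / dx * (f(u[e + 1]) - f(u[e]));
    for (std::size_t i = 1; i + 1 < n; ++i)          // step 2: eq. (4) analogue
        u[i] -= dt / dx * (f(uh[i]) - f(uh[i - 1]));
}
```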
4
Parallel Implementation
This basic algorithm has already been validated for a number of different scattering problems. However, the nature of the algorithm means that the required mesh size will increase rapidly when the method is applied to the solution of problems involving the electrically large scatterers which arise when realistic frequencies and geometries are considered. Such simulations will require the use of significant computational resources and, in this case, the use of parallel computers becomes essential. It should be noted that the success of this route will require not only a parallel implementation of the basic Maxwell equation solver but, in addition, the effective parallelisation of the mesh generation and solution visualisation stages. The approach adopted for parallel mesh generation is based upon a geometrical partitioning of the domain [9]. The complete domain is divided into a set of smaller sub-domains and a mesh is generated independently in each sub-domain. The combination of the sub-domain meshes produces the mesh for the complete domain. A manager/worker model is employed in which the initial work is performed by the manager, before distributing the mesh generation tasks to the workers. There are a number of different approaches available for serially decomposing a given unstructured mesh. However, for the current application, it is envisaged that the mesh data sets will be too large to load onto one processor. Therefore, the partitioning process has to be parallelised and distributed amongst the processors at all times. The present implementation utilises the ParMetis library for the partitioning [5]. This procedure produces high quality partitions in a fast, robust and parallel manner. In the parallel implementation of the solution algorithm, elements are owned by only one domain and are not duplicated, while points are owned by one domain and are duplicated. This strategy enables data locality to be achieved during the gather
Figure 1: Scattering of a plane wave by a coated PEC sphere of diameter D = 3λ showing (a) the computed contours of |E| (b) comparison between the exact and computed distributions of the scattering width.
process, from points to elements, and the scatter process, from elements to points, and hence there is no need to communicate during these operations. For each time step, only the interface nodes obtain contributions from more than one domain.
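The ownership rule can be sketched as follows; the SubDomain type, the single-neighbour exchange and the use of MPI_Sendrecv are assumptions made for illustration rather than details of the actual implementation.

```cpp
#include <mpi.h>
#include <vector>

// Elements are owned by one domain; points on inter-domain interfaces are
// duplicated, so after the local element loop their values must also
// accumulate the contributions computed in the neighbouring domain.
struct SubDomain {
    std::vector<int> interface_nodes;   // local ids of nodes shared with a neighbour
    std::vector<double> nodal_value;    // one accumulated value per local node
};

void accumulate_interface(SubDomain& d, int neighbour_rank, MPI_Comm comm) {
    std::vector<double> send(d.interface_nodes.size()), recv(send.size());
    for (std::size_t i = 0; i < send.size(); ++i)
        send[i] = d.nodal_value[d.interface_nodes[i]];

    MPI_Sendrecv(send.data(), (int)send.size(), MPI_DOUBLE, neighbour_rank, 0,
                 recv.data(), (int)recv.size(), MPI_DOUBLE, neighbour_rank, 0,
                 comm, MPI_STATUS_IGNORE);

    for (std::size_t i = 0; i < recv.size(); ++i)
        d.nodal_value[d.interface_nodes[i]] += recv[i];  // sum shared nodes
}
```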
5
Numerical Examples
Two examples are considered to illustrate the performance of the integrated software. The first example involves scattering of a plane single frequency incident wave by a coated perfectly conducting sphere. The sphere diameter is D = 3λ and the dielectric coating is of thickness t = 0.25D. The coating is characterised by the material properties ε = 2.56, μ = 1. The mesh employed in the region between the sphere and the far field boundary consists of 3 296 694 tetrahedra and 599 399 nodes. The structured PML region contains 820 380 tetrahedra and 149 160 nodes. The solution is advanced through 60 cycles of the incident wave and the computed contours of |E| are displayed in Figure 1(a). The exact and the computed distribution of the scattering width are seen to be in very good agreement in Figure 1(b). The final example uses the procedure to simulate the scattering of a plane wave by a PEC aircraft. The aircraft length is 10 wavelengths and the mesh employed consists of approximately 7.2 million elements and 1.35 million nodes. The PML is
Figure 2: Scattering of a plane wave by a PEC aircraft showing (a) computed contours of |H| on the aircraft surface (b) the predicted distribution of the scattering width
located at a distance of one half wavelength from the aircraft and has a total thickness equal to one wavelength. The PML region, consisting of 10 layers of elements, has approximately 1.4 million elements and 0.27 million nodes. The solution was advanced for 40 cycles and the computed contours of |H| on the aircraft surface are shown in Figure 2(a). The computed distribution of the RCS is displayed in Figure 2(b).
6
Conclusions
A numerical procedure that enables the parallel simulation of three dimensional electromagnetic scattering problems using automatically generated unstructured tetrahedral meshes has been presented. The solution algorithm employs a scattered field formulation and a two step Taylor-Galerkin time stepping scheme. The truncated far field boundary condition is imposed by the addition of a PML. Parallel mesh generation is accomplished by a Delaunay procedure, following a geometrical partitioning of the domain. A number of computationally challenging examples have been included to demonstrate the numerical performance of the proposed procedure.
References
1. Berenger J. P., A perfectly matched layer for free-space simulation in finite-difference computer codes, Journal of Computational Physics, 1994; 114: 185-200
2. Bonnet F. and Poupaud F., Berenger absorbing boundary condition with time finite-volume scheme for triangular meshes, Applied Numerical Mathematics, 1997; 25: 333-354
3. Donea J., A Taylor-Galerkin method for convective transport problems, International Journal for Numerical Methods in Engineering, 1984; 20: 101-119
4. George P. L., Automatic Mesh Generation. Applications to Finite Element Methods, Wiley: Chichester, 1991
5. Karypis G. and Kumar V., Multilevel k-way partitioning scheme for irregular graphs, Journal of Parallel and Distributed Computing, 1998; 48: 96-129
6. Morgan K., Hassan O. and Peraire J., An unstructured grid algorithm for the solution of Maxwell's equations in the time domain, International Journal for Numerical Methods in Fluids, 1994; 19: 849-863
7. Shankar V., Hall W. F., Mohammadian A. and Rowell C., Theory and application of time-domain electromagnetics using CFD techniques, Course Notes, University of California Davis, 1993
8. Weatherill N. P. and Hassan O., Efficient three-dimensional Delaunay triangulation with automatic point creation and imposed boundary constraints, International Journal for Numerical Methods in Engineering, 1994; 37: 2005-2040
9. Weatherill N. P., Hassan O., Morgan K., Jones J. W. and Larwood B., Towards fully parallel aerospace simulations on unstructured meshes, Engineering Computations, 2001; 18: 347-375
DEVELOPMENT OF PARTING LINE GENERATION TOOLS FOR A 3D CAD INJECTION MOULD SYSTEM
W. M. CHAN
Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528, SINGAPORE
E-mail: [email protected]
S. L. LIEOW
Kas Equipment and Trade, 623 Aljunied Road, #04-04 Aljunied Industrial Complex, Singapore 389835, SINGAPORE
E-mail: [email protected]
Parting line tools for generating the core, cavity and inserts in a 3-dimension CAD injection mould system are presented in this paper. These software tools provide efficient ways to generate parting lines in complicated solid model mould products. Using these tools, parting lines are generated by selecting portions of the product such as a face, point or edge. The more advanced parting line tools require selection of a combination of edge, face or point to define the cutting plane. The mathematics of the transformation from 3-dimension global co-ordinates to local 2-dimension work-plane co-ordinates is formulated; this formulation is used in one of the tools to generate the parting line. Examples of how these parting line tools work are shown using commercial 3-dimension CAD software.
1
Introduction
As product life cycles decrease, the demand to shorten mould design time increases [1]. In many cases design changes and modifications are made to the product. One way to shorten design time is to automate the mould design process [2]. Parting line generation is one of the key components in mould design. Tools developed to aid parting line generation can thus help reduce the design time. Researchers have developed various ways to aid parting line generation. Wong et al. [3] proposed a method to determine the cutting plane of complex shaped products; their method uses an algorithm that slices the product. Nee et al. [4] described a methodology to determine optimal parting lines. A methodology that generates non-planar parting lines and surfaces was presented by Nee et al. [5]. Hui [6] introduced a blockage test that determines the interference between product, side core and split core to aid parting line positioning. Ravi and Srinivasan [7] proposed a parting line generation method using section and silhouette. Other research focuses on rules to generate parting lines. Nine rules that can be used by the mould design engineer to develop a suitable parting line in the product are presented by Ravi and Srinivasan [8]: projected area, flatness, draw, draft, undercuts, dimensional stability, flash, machined surfaces and directional solidification. Ganter and Tuss [9] proposed a method to locate the parting line for cast parts using a set of rules: center of gravity, principal axes and the direction of draw specified by the user. Parting line generation for complex parts involves more engineering judgement and knowledge. The parting line position of the core and cavity is not simple to determine using algorithms and rules alone; often engineering judgement and experience are used to determine the parting line.
Some parting line generation tools have been developed to aid the generation of the core, cavity and inserts using engineering knowledge and judgement. These tools provide efficient ways to generate parting lines. Using these tools, parting lines are generated by selecting portions of the product such as a face, point or edge. The more advanced parting line tools require selection of a combination of edge, face or point to define the cutting plane. These tools are presented in the following sections. Examples of how some of these parting line tools work are shown using commercial 3-dimension CAD software.
2
Parting Line Tools
Eight parting line tools are presented as follows:
1. Cutting plane on a point: This function defines the cutting plane on a selected part point, as shown in Figure 1. The cutting plane orientation is selected to be perpendicular to one of the three major axes. This plane can be shifted by a user-defined amount along the axis. Also, the user can adjust the plane angle. The right figure shows the part cut by the plane.
2. Cutting plane perpendicular to two points on a straight line: This function positions the cutting plane perpendicular to two points on a straight line. Two points are selected to define the straight line. The user can specify an amount to shift the cutting plane along the straight line.
3. Cutting plane perpendicular to two points: This function positions the cutting plane at the centre of two selected points. The orientation of the plane is perpendicular to the direction of the two points. A user-defined amount can be specified to shift the cutting plane between the two points.
4. Cutting plane parallel to part face: The cutting plane is defined as parallel to the selected part face, see Figure 2. This cutting plane can be shifted by an amount in the direction either into or out of the part (see the sketch after this list). The right figure shows the mould cavity after the cutting process.
5. Circular cutting plane: This function defines a circular cutting plane. Selecting the circular hole on a part generates the circular cutting plane with the same diameter as the hole. Changing the cutting plane diameter creates an annulus (see Figure 3).
6. Cutting plane perpendicular to the edge and positioned on the middle of the edge: This function defines a cutting plane that is perpendicular to a selected edge. The function automatically positions the cutting plane on the middle of the edge. The cutting plane can be shifted by a user-defined amount.
7. Cutting plane defined on the middle of the face with rotate plane option: This function defines a cutting plane that is perpendicular to the selected face, see Figure 4. The position of this cutting plane is at the middle of the face. The user can rotate the cutting plane direction.
8. Cutting plane defined by edges: Selecting the part face defines that the cutting plane is perpendicular to that face. The cutting profile is defined by selecting the edges (see Figure 5).
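As an illustration of how such a tool can be realised, the sketch below constructs the cutting plane of tool 4 from a selected face; the Vec3/Plane types and the signed offset convention are assumptions, not the API of the CAD system used here.

```cpp
#include <cmath>

struct Vec3  { double x, y, z; };
struct Plane { Vec3 normal; double d; };   // points p with dot(normal, p) = d

// Build a cutting plane through a point on the selected face, shifted by
// `offset` along the face normal (positive = out of the part, negative =
// into the part).
Plane planeParallelToFace(Vec3 faceNormal, Vec3 facePoint, double offset) {
    double len = std::sqrt(faceNormal.x * faceNormal.x +
                           faceNormal.y * faceNormal.y +
                           faceNormal.z * faceNormal.z);
    Vec3 n{faceNormal.x / len, faceNormal.y / len, faceNormal.z / len};
    double d = n.x * facePoint.x + n.y * facePoint.y + n.z * facePoint.z;
    return Plane{n, d + offset};
}
```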
Figure 1. Cutting plane on a point.
Figure 2. Cutting plane parallel to part face.
Figure 3. Circular cutting plane.
Figure 4. Cutting plane defined on the middle of the face with rotate plane option.
Figure 5. Cutting plane defined by edges.
Figure 6. Global to local workplane co-ordinate transformation.
3
Co-ordinate Global to Local Workplane Transformation
Global co-ordinate to local work-plane co-ordinate transformation is used when creating cutting planes in orientations other than those of the major axes. Figure 6 shows a local work-plane. Points 1, 2, 3 and 4 are in global co-ordinates; point 4 is the mid-point of the line between points 1 and 3. The local co-ordinate transformation is shown below.
a = √((x1 − x2)² + (y1 − y2)² + (z1 − z2)²)
b = 0.5 √((x1 − x3)² + (y1 − y3)² + (z1 − z3)²)
x4 = 0.5 (x1 + x3),  y4 = 0.5 (y1 + y3),  z4 = 0.5 (z1 + z3)
θ = 2 cos⁻¹(b/a)
u1 = a,  v1 = 0;  u2 = 0,  v2 = 0;  u3 = a cos θ,  v3 = a sin θ
The work-plane co-ordinates for points 1, 2 and 3 are (u1, v1), (u2, v2) and (u3, v3), respectively.
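A possible implementation of this transformation, following the formulas above (with point 2 taken as the local origin and point 1 on the u axis), is sketched below; the types are illustrative.

```cpp
#include <cmath>

struct P3 { double x, y, z; };   // global co-ordinates
struct P2 { double u, v; };      // local work-plane co-ordinates

void toWorkPlane(P3 p1, P3 p2, P3 p3, P2& w1, P2& w2, P2& w3) {
    auto dist = [](P3 a, P3 b) {
        return std::sqrt((a.x - b.x) * (a.x - b.x) +
                         (a.y - b.y) * (a.y - b.y) +
                         (a.z - b.z) * (a.z - b.z));
    };
    double a = dist(p1, p2);              // |P1 P2|
    double b = 0.5 * dist(p1, p3);        // half of |P1 P3|
    double theta = 2.0 * std::acos(b / a);// valid while b <= a

    w1 = {a, 0.0};                        // point 1 on the u axis
    w2 = {0.0, 0.0};                      // point 2 is the local origin
    w3 = {a * std::cos(theta), a * std::sin(theta)};
}
```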
Note that this method of transformation does not involve matrix rotation and thus certain blind spots are avoided. This transformation is used in the "Cutting plane defined by edges" function to position the cutting plane.
CONCLUSION
This paper presented eight tools developed to generate parting lines in a 3-dimension CAD injection mould system: cutting plane on a point; cutting plane perpendicular to two points on a straight line; cutting plane perpendicular to two points; cutting plane parallel to part face; circular cutting plane; cutting plane perpendicular to the edge and positioned on the middle of the edge; cutting plane defined on the middle of the face with rotate plane option; and cutting plane defined by edges. These tools provide an efficient and simple method to generate the core, cavity and inserts. Usage of these tools is demonstrated in commercial 3-dimension CAD software, SolidDesigner. Also, the mathematical 3-D global to 2-D local work-plane co-ordinate transformation is formulated.
REFERENCES
[1] Altan T., Lilly B. W., Kruth J. P., Konig W., Tonshoff H. K., Van Luttervelt C. A. and Khairy A. B., "Advanced techniques for die and mold manufacturing", Annals of the CIRP, 42(2), 707-716, 1993.
[2] Fu M. W., Fuh J. Y. H. and Nee A. Y. C., "Undercut feature recognition in an injection mould design system", Computer-Aided Design, 31, 777-790, 1999.
[3] Wong T., Tan S. T. and Sze W. S., "Parting line formation by slicing a 3D model", Engineering with Computers, 14, 330-343, 1998.
[4] Nee A. Y. C., Fu M. W., Fuh J. Y. H., Lee K. S. and Zhang Y. F., "Determination of optimal parting direction in plastic injection mould design", Annals of the CIRP, 46(1), 429-432, 1997.
[5] Nee A. Y. C., Fu M. W., Fuh J. Y. H., Lee K. S. and Zhang Y. F., "Automatic determination of 3-D parting lines and surfaces in plastic injection mould design", Annals of the CIRP, 47(1), 95-98, 1998.
[6] Hui K. C., "Geometric aspects of the moldability of parts", Computer-Aided Design, 29, 197-208, 1997.
[7] Ravi B. and Srinivasan M. N., "Computer-aided parting surface generation", Proceedings ASME Manufacturing International Conference, Atlanta, 125-129, 1990.
[8] Ravi B. and Srinivasan M. N., "Decision criteria for computer-aided parting surface design", Computer-Aided Design, 22, 11-18, 1990.
[9] Ganter M. A. and Tuss L. L., "Computer-assisted parting line development for cast pattern production", Transactions of the American Foundrymen's Society, 795-800, 1990.
SIMULATION OF TEMPERATURE AND STRESS FIELD IN DEPOSITION PROCESS FOR RPST BY HOMOGENIZATION METHOD
GUILAN WANG, ZHIHUA XU, HAIOU ZHANG
State Key Lab. of Plastic Forming Simulation and Die & Mold Tech., Huazhong University of Science & Technology, Wuhan, 430074, Hubei, P. R. China
E-mail: [email protected], [email protected]
Rapid plasma spray tooling (RPST) is a process that can quickly make molds from rapid prototyping or natural patterns without limitation of pattern size or material. In previous research, two-scale asymptotic homogenization theory was introduced to predict the effective properties of plasma sprayed coatings as functions of pore volume fraction, and the temperature field in the deposition process for RPST was simulated with two-dimensional plane models. The purpose of this paper is to simulate the temperature and stress field and to explore a way to simulate curved substrate models through an axisymmetric rotating model. The macro-micro mathematical and mechanical models are established by the homogenization method. The effect of the scanning path of the spray gun in the deposition process for RPST is discussed, and the simulation of spraying on an axisymmetric rotating substrate model in the deposition process is performed by the developed FEM software system.
Keywords: deposition, rapid plasma spray tooling, homogenization theory, axisymmetric, finite-element method
1
Introduction
Rapid plasma spray tooling (RPST) has gained more and more attention because it can be used to make metal molds from rapid prototyping or natural patterns [1]. During the process of metal spraying, the molten metal particles impact the substrate at high speed, then deposit and freeze layer by layer, forming a porous coating with residual stress. It is necessary to research the elimination or reduction of the residual stress, which is one of the main factors that result in coating deformation, crazing, peeling, etc. Because the traditional continuum mechanics method cannot be used effectively to describe the microstructure and the mechanical behavior of a porous coating, the homogenization method, an effective method applicable in many areas of physics and engineering, is applied in this work. In previous work, the coating growth and pore formation have been simulated, and the homogenization method has been applied to model and simulate the temperature field with a flat substrate model [2,3]. In this paper, the effect of the scanning path in the deposition process for RPST is discussed through the simulation of the stress field. As is well known, there are many axisymmetric and non-axisymmetric concavo-convex shapes on the surface of molds in practical manufacture; thus, the simulation of an axisymmetric rotating substrate model is also performed in this paper.
2 Modeling for temperature and stress field by homogenization method
An effective approach to obtain the temperature and stress field of a discontinuous medium is to establish, by the homogenization method, mathematical models which reflect the influence caused by the discontinuity of the material microstructure at each point.
Figure 1. Analysis of the temperature and stress field (macro and micro models).
The homogenization method is based on the idea of an asymptotic expansion of Φ^ε(x), as shown in Ref. [4]:

Φ^ε(x) = Φ⁰(x, y) + ε Φ¹(x, y) + ⋯,  y = x/ε  (0 < ε ≪ 1)   (1)

where Φ^ε(x) is the field function (in the temperature field, for example, Φ^ε(x) is replaced by T^ε(x)), x denotes the macro-scale variable, and y denotes the micro-scale variable. According to the principle of virtual work, the incremental form of the thermal stress yields
∫_Ω^ε E^ε_ijkl (∂Δu^ε_k/∂x_l)(∂v_i/∂x_j) dΩ = ∫_Ω^ε E^ε_ijkl Δε_kl (∂v_i/∂x_j) dΩ   (2)
With respect to the homogenization theory, Δu^ε is extended into an asymptotic expansion:

Δu^ε(x) = Δu⁰(x, y) + ε Δu¹(x, y) + ⋯,  y = x/ε   (3)
Substituting formula (3) into (2), the equilibrium equation which describes the macro-micro mechanical behavior becomes

∫_Y E_ijkl (∂Δu⁰_k/∂x_l + ∂Δu¹_k/∂y_l)(∂v_i/∂y_j) dY = ∫_Y E_ijkl Δε_kl (∂v_i/∂y_j) dY,  ∀v ∈ V_Y   (4)
where Δε_kl = α_kl (T′ − T) = α_kl ΔT, α_kl is the coefficient of thermal expansion and E_ijkl is the matrix of the elastoplasticity increment.
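For the isotropic materials of Table 1 the thermal strain increment reduces to a diagonal tensor, as the following minimal sketch illustrates; the types and function name are assumptions made for illustration.

```cpp
#include <array>

// Thermal strain increment delta_eps_kl = alpha_kl * dT.  For an isotropic
// material, alpha_kl is diagonal with a single coefficient alpha, so only
// the normal components are non-zero.
using Tensor2 = std::array<std::array<double, 3>, 3>;

Tensor2 thermalStrainIncrement(double alpha, double dT) {
    Tensor2 de{};                  // zero-initialised 3x3 tensor
    for (int k = 0; k < 3; ++k)
        de[k][k] = alpha * dT;     // no thermal shear strain increments
    return de;
}
```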
3 FEA results and discussion
3.1 Effect of scanning path in deposition process
By simulating the temperature and stress field with four different kinds of scanning path (Fig. 3): (a) vertical to the long side; (b) parallel to the long side; (c) spiral out; (d) spiral in, the effect of the scanning path in the deposition process is discussed. Fig. 2 is a FEM model (100 mm × 4 mm) with the geometry size indicated on the model; the calculation conditions are: scanning speed V = 0.1 m/s, environment temperature 20 °C, initial temperature 20 °C, time step Δt = 0.02 s, and powder feeding rate m = 60 g/min. The model boundary conditions are set as follows: on the two side areas, the right and the left, the first-kind temperature boundary condition T = 20 °C; the heat flux on the lower area is q = 10 W/m². The physical parameters of the substrate and the powder are given in Table 1.
Figure 2. The macro flat substrate model.
Comparing the von Mises stress peak values shown in Fig. 3, the results show that the peak value for (c), spiral out, is the lowest.
Figure 3. Von Mises stress vs. scanning path.
Table 1. Material physical parameters.
Parameters                                  Substrate   Powder
Heat conductivity (W/(m·°C))                50          50
Specific heat (J/(kg·°C))                   520         520
Thermal expansion coefficient (1/°C)        1.3E-05     1.7E-05
Density (g/cm³)                             7850        7850
Elasticity module (MPa)                     1.0E+05     1.0E+05
Poisson ratio                               0.3         0.3
Yield intensity (MPa)                       200         200
Plasticity strengthen coefficient (MPa)     5.0E+04     5.0E+04
In addition, the pictures show that the peak value of stress on a longer straight line is higher than that on a short straight line, and the stress in the turning regions is smaller than that on the long straight lines.
3.2 The simulation in the deposition process of the cylinder substrate
Fig. 4 is a quarter part of a cylinder substrate model, with inner radius = 60 mm, outer radius = 65 mm and axial length = 80 mm. Its boundary conditions are set on the model: the displacement constraint (the yellow) on the two side circumferential areas is set to zero, the first-kind temperature boundary condition (the blue) is T = 20 °C on the axial areas, and the heat flux (the red arrow) on the inner area is q = 10 W/m². Under the same calculating conditions as given in Section 3.1, the coating growth, the temperature field and the stress field are simulated as shown in Fig. 5.
Figure 4. A quarter part of the cylinder substrate model.
Figure 5. The coating growth, temperature field and stress field.
4 Conclusion
1. The homogenization method can be applied to simulate the temperature and stress field in the deposition process for RPST.
2. The peak value of von Mises stress with the spiral-out scanning path is the lowest, which assists in maintaining the stability of the original shape;
furthermore, the scanning path should consist of as many fold lines and curves as possible, and the line segments should be as short as possible, to reduce the peak value of stress.
3. The temperature and stress field with axisymmetric rotating substrate models can be simulated by the developed FEM software system, which suggests that simulation will be possible for arbitrary geometry models in the future.
Acknowledgements
This research was funded by the Ministry of Science and Technology and the National Natural Science Foundation of China through research grants 2001AA421150 and 50075032, respectively. The authors would like to thank Dr. Yanxiang Chen.
References
1. H. Zhang, G. Wang, T. Nakagawa, in: T. Nakagawa (Ed.), Proceedings of the 8th International Conference on Rapid Prototyping, June 12-13, Tokyo, Japan, 2000, p. 444.
2. Yanxiang Chen, Guilan Wang, Haiou Zhang, Numerical simulation of coating growth and pore formation in rapid plasma spray tooling, 5th Asia-Pacific Conference on Plasma Science & Technology / 13th Symposium on Plasma Science for Materials, September 10-13, 2000, Dalian, China.
3. Guilan Wang, Yanxiang Chen, Haiou Zhang, Homogenization theory applied to plasma sprayed coating: modeling and numerical simulation of the temperature field, in: H. Bin (Ed.), Computer Aided Production Engineering CAPE2001, Professional Engineering Publishing, IMechE, London, UK, 2001, pp. 355-358.
4. B. Hassani, E. Hinton, A review of homogenization and topology optimization I - analytical and numerical solution of homogenization equations, Computers and Structures 69 (1998) 719-738.
GEOMETRIC MODEL AND NUMERICAL SIMULATION FOR THE LAYING PROCESS OF WIRE ROPE
GUILAN WANG, JIANFANG SUN AND HAIOU ZHANG
State Key Laboratory of Plastic Forming Simulation and Die & Mold Tech., Huazhong University of Science and Technology, Wuhan, 430074, PR China
E-mail: [email protected]
ABSTRACT - A mathematical model of wire rope considering the space geometric structure and the characteristics of the laying process is proposed, with self-rotating ratio. Depending on their position in the stranded rope, the wires can be in the form of single, double and triple helices, the vector equations of which can be obtained. A program for calculating the boundary conditions of a three-dimensional finite element model is developed to create data which can be accessed by ANSYS for nonlinear analysis. Additionally, the deformation of once laying and secondary laying acting on the wires of a strand or wire rope is analyzed. The results show good agreement with previous analytical solutions.
Keywords: Geometric model, laying forming, self-rotating ratio, forming stress, FE analysis, wire rope
1
Introduction
Much research has been devoted to the behavior of stranded rope under axial loads (tension, shear, bending and torsion) [1-3], but these studies do not predict the complicated laying process of wire rope. In particular, the laying process with self-rotating ratio is rarely studied. The self-rotating ratio is defined as the ratio of the angular velocity of a wire/strand wound around its own axis, in the direction opposite to its laying direction, to the angular velocity of the wire/strand wound around the centroidal axis of the strand/rope; it is an important parameter of the laying process and is called the self-rotating ratio of the wire rope, k_r, and that of the strand, k_s, respectively. In this paper we present a geometric model of the laying process, including the classification of wires and the vector equations of helices in stranded rope. Structural discretization and boundary conditions are given for FE analysis. Moreover, we solve numerical examples of once laying and secondary laying with different self-rotating ratios. Finally, concluding remarks are given. 2
Geometric mathematical model
A typical stranded rope with an independent wire rope core (IWRC) is composed of two major structural elements. One is the strand and the other is the core. It is assumed in this work that all wires have a circular cross-section and remain circular when deformed. The centroidal axes of
a wire and a strand are selected to represent the path of the wire and the strand and are used to study the geometric characteristics that are related to deformation. The generatrices of each wire are studied as well because the self-rotating ratio is taken into account. There is at most one straight wire, located in the center of a rope. The remaining wires and their generatrices can be classified geometrically into three groups: single helices, double helices or triple helices. When a straight strand is laid, the free end of the outer wire moves and the other end is fixed. The outer wire has a single helical form because it is wound only once, around a straight axis. If a strand is helically wound into a rope, the center wire of the strand has a single helical form. All of the other wires have a double helical form because they are wound twice, once around the strand axis and again around the rope axis. Furthermore, if the self-rotating ratios of strand and rope are considered, the generatrices of a helical wire have a triple helical form in the rope because the helical wire is wound three times: once around the strand axis, once around its own axis and once around the rope axis. The centroidal axes of both the wire and the strand can be considered to be lying on right circular cylinders developed into a plane. Some basic relationships can be established by using the developed views shown in Figure 1. Without the self-rotating ratio of the rope, shown in Figure 1(a), the spread length of the rope S_r and the length of the strand S_s in the rope are:

S_r = r_r θ_r tan β_r,   S_s = r_r θ_r / cos β_r
where r_r and β_r are the helix radius and helix angle of the rope, respectively, and θ_r is the angle of rope rotation. Similarly, in the strand, S_s and S_wc are:
Figure 1. Developed view of strand and helical wire in IWRC rope.
S_s = r_s θ_s tan β_s,   S_wc = r_s θ_s / cos β_s
where r_s, β_s and S_wc are the helix radius, the helix angle of the strand and the spread length of the helical wire, respectively, and θ_s is the angle of strand rotation. Because S_s in the rope equals that in the strand, θ_s can be obtained. However, if the self-rotating ratio of the rope is taken into account, as shown in Figure 1(b), θ_s is changed into θ_s′. θ_s′ and θ_s can be expressed as:
changed into 6S . 6S and 0S can be expressed:
409
-kr)8r
-M,=(- r zosP tgP s
parallel to x axis single helix parallel to ns axis Figure 2. Helices model in IWRC Rope with self-rotating ratio
r
(1)
s
As shown in Figure 2, let A and B presented by vector R and P, be a point of centroidal axis of strand and a point of axis of helical wire, which have single helical form and double helical form, respectively. The vector equation of R for single helix in global system is expressed below. ns, b s and ts which configure the Frenet-frame are normal vector, binormal vector and tangential vector at point A. Bold face letters identify vectors or
matrix. R
rr cos Qr 1 *4,1 = 'jyA )• = •{>•, sine } f - cos Gr ]
(2)
f sin Pr sin 6r 1
ns=\ - s i n # r )• , b s =-j-siny5 r cos# r \
- cos Pr sin dr 1 •\ COSy9r COS^ f
\
(3)
\ 0 lj l[ cos fi, lj \ siny9r lj The vector Q traces the axis of a double helical wire in Frenet-frame ns-bs-ts. Because the head of R is located exactly at the tail of Q, the vector P can be obtained through vector addition. So the vector equation of P is: ^
ti
rr cos8r + rs(-cosOr cos#, + sin/?r sin#r sin#.
P = \yB }•=•{ rr sinf9r +r,(-sin^ r cos^ -siny9r sin^r sin05 ) \ \*B\
(4)
rrtgfir6r +rs cosfir sin^ s
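Equation (4) translates directly into code; the following sketch (with an assumed Vec3 type) evaluates the point B on a double-helical wire for given rotation angles, and is given only to illustrate the formula.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// Point B on the axis of a double-helical wire, per equation (4):
// r_r, beta_r are the rope helix radius and angle; r_s is the strand helix
// radius; theta_r and theta_s are the rope and strand rotation angles.
Vec3 doubleHelixPoint(double r_r, double beta_r, double r_s,
                      double theta_r, double theta_s) {
    double sr = std::sin(theta_r), cr = std::cos(theta_r);
    double ss = std::sin(theta_s), cs = std::cos(theta_s);
    double sb = std::sin(beta_r),  cb = std::cos(beta_r);
    return Vec3{
        r_r * cr + r_s * (-cr * cs + sb * sr * ss),
        r_r * sr + r_s * (-sr * cs - sb * cr * ss),
        r_r * theta_r * std::tan(beta_r) + r_s * cb * ss
    };
}
```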
In a similar way the vector equation of the triple helix can be derived. 3
FE analysis results and discussion
Three-dimensional solid brick elements were used for the structural discretization. When a wire rope is laid, the displacements of the nodes at the free end of the helical wires are calculated by a program developed in C++. After inputting the geometric data, discretization rules, and material and process parameters, the boundary condition files to be accessed by ANSYS for nonlinear FE analysis can be created.
Table 1. The geometric data and process parameters of strand/wire rope.

Strand/wire rope    Diameter (mm)   Dia. of center wire (mm)   Dia. of helical wire (mm)   Pitch length (mm)   Self-rotating ratio
IWRC 1x7            8.50            2.86                       2.86                        102.5               0.0, 0.5, 1.0, 1.2
Core (IWRC 7x7)     —               2.34                       2.66                        52.3                1.0
Strand (IWRC 7x7)   —               2.14                       2.34                        52.3                1.0
With JIS SWRS72A wire, a simple 1x7-wire strand and an IWRC 7x7-wire rope have been analyzed in this paper. Young's modulus, the yield stress and the coefficient of work-hardening of the material are 183.9 GPa, 1290 MPa and 0.033, respectively. Table 1 details the geometric data and process parameters. The pitch length and self-rotating ratio of the IWRC 7x7 are 133 mm and 1.0. The deformation of the helical wire in the 1x7-wire strand is depicted in Figure 3. For all self-rotating ratios, the interior axial stress is compressive while the exterior axial stress is tensile. When k_s equals zero, the shear stress in the cross-section is the largest; the distribution of von Mises stress consists of concentric circles, and the center of the cross-section is an elastic area while the outer part is a plastic area far larger than the elastic area, which results in smaller elastic spring-back. With increasing k_s, the tensile and compressive stresses increase while the absolute value of the shear stress decreases; the plastic area of the von Mises stress decreases and the elastic area interlacing with the plastic area increases, which results in larger spring-back. When k_s equals 1.0, the shear stress is the smallest, which helps to reduce the ratio of broken-off wires; the concentric-circle distribution of von Mises stress changes into a band shape and the plastic area is far smaller than the elastic one. When k_s equals 1.2, the distribution of von Mises stress changes into concentric circles again. Additionally, the distribution of equivalent plastic strain is similar to that of the von Mises stress, but when k_s equals 1.2, the concentric-circle distribution of plastic strain is less apparent than that of the von Mises stress. Furthermore, it can be deduced that with a nonzero self-rotating ratio the helical wire in the strand is in close contact with the center wire and the properties of the whole strand are good because of smaller shear stress, equivalent plastic strain and torsion stress. So, taking shape stability, safety and tensile strength into consideration, the reasonable value of the self-rotating ratio should be slightly more than 1.0.
The deformation of the IWRC 7x7-wire rope is shown in Figure 4. The stress and strain are non-uniformly distributed along the helical wire. With the position change of the helical wire in the stranded rope, the absolute values of stress and strain in the secondary laying are appreciably larger than those in the once laying. Again, a self-rotating ratio with the value of 1.0 is favorable.
Figure 3. Stress and strain of once laying in IWRC 1 X 7-wire strand
Figure 4. Stress and strain of secondary laying in IWRC 7 X 7-wire rope
4
Conclusions
A geometric mathematical model for the laying process of wire rope has been presented for FE analysis. Both the once laying and secondary laying examples indicate that a reasonable self-rotating ratio should be slightly more than 1.0. The results show good agreement with the results in the literature [4]. A model that considers the more complex cross-sections and contact conditions of wire rope should be adopted in future research; this is being undertaken in our current work.
Acknowledgements
The authors gratefully acknowledge the support of the Ministry of Education of China through research grant [1998] 679.
References
1. Costello, G. A., Theory of Wire Rope (Springer Verlag, New York, 1997) pp. 24-102.
2. Utting, W. S. and Jones, N., The response of wire rope strands to axial tensile loads: Part I. International Journal of Mechanical Sciences 29(9) (1987) pp. 605-619.
3. Jiang, W. G. and Henshall, J. L., A novel finite element model for helical springs. Finite Elements in Analysis and Design 35 (2000) pp. 363-377.
4. Guilan Wang and Haiou Zhang, The computational simulation by elasto-plastic FEM for the laying of metal wire rope. China Mechanical Engineering 12(5) (2001) pp. 7-10.
AGENT-BASED COMPOSABLE SIMULATION FOR VIRTUAL PROTOTYPING
W. XIANG
Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528
E-mail: [email protected]
S. C. FOK, G. THIMM
Design Research Centre, School of Mechanical & Production Engineering, Nanyang Technological University, Nanyang Ave, Singapore 639798
System performance evaluation without a real physical prototype is an attractive feature of virtual prototyping. This paper proposes an agent-based composable simulation framework to address the challenges in virtual prototyping. The concept is to use an agent to manage a component; a circuit of components is then equivalent to a configuration of agents. A domain agent is proposed to represent the various virtual components in a system. The implementation of "composable" simulation in virtual prototyping evaluation depends on the communication and collaboration of the multiple agents. A case study is done in the domain of fluid power system design. This work describes an initial effort towards the development of an intelligent distributed environment for virtual prototyping.
1
Introduction
A prototype normally has to be developed for the evaluation of a system design. System prototyping can be classified into physical prototyping and virtual prototyping. The physical prototyping process involves evaluation using real commercial components; it can be tedious, time consuming, and expensive. Virtual prototyping, on the other hand, can be regarded as a computer-aided design process which consists of modeling and simulation tools to address the broad issues of evaluation under various operating environments [1]. To fully exploit the advantages of virtual prototyping, the following challenges need to be addressed.
• Integration. Various features of the system prototype such as system dynamics, assembly, and maintenance have to be considered, since in the real physical world, slight changes in one domain often have profound implications in others [2].
• Composability. A promising idea in the future trend of simulation-based design is the concept of prebuilt models. However, composability is still a frontier subject in modeling and simulation, and current capability is limited [3].
• Distributed coordination. To integrate virtual components into a virtual system requires more than a simple conversion of each component feature model into an individual entity. It requires a mechanism that allows a group of component models to communicate, engage in cooperative tasks and adapt to changing circumstances [4].
• Interaction and reality. The interaction and reality challenges in virtual prototyping concern many implementation details like web-accessibility, interactive operation, etc. [5].
This paper proposes an agent-based composable simulation framework to address the above challenges. It regards each component as an agent and the system as a multi-agent system. First, an agent-based virtual component representation is introduced. Section 3 then addresses composable simulation based on agent communication and collaboration, and a simple case study is presented for validation. Finally, the conclusion is given.
2
Agent-based Virtual Component Representation
In a specific system, components interact with each other to give the overall system function. This fundamental concept is analogous to an agent-based framework. If a component can be represented as an agent, then a circuit of components is equivalent to a configuration of agents. The agents cooperate and interact to achieve the overall features of the system. An approach to integrate the component features through an object-based virtual model, controlled by an associated domain agent, is introduced in this paper. Figure 1 shows the representation of the domain agent. The domain agent essentially consists of three parts: the domain knowledge set, the control unit and the component model.
Figure 1. Representation of the domain agent (domain knowledge set with the list of acquaintances and domain knowledge, control unit, and component model).
The domain knowledge set contains the essential data and knowledge required by the agent to perform its activities. Unlike the component data, which remain static, some knowledge needs to be continuously updated, either by the agent itself or by a system designer. The knowledge base has an acquaintance list of the other agents with which it can directly communicate within the system. This acquaintance list reflects changes of the assembly. The control unit is responsible for the communication with other agents and the subsequent reaction. The message parser first parses an incoming message. Then the message handler analyzes the message using the agent's knowledge properties or the component model's features, and finally responds to the message if necessary. The ensemble of the control units of all domain agents constitutes a part of the coordination mechanism of the system. The concurrent execution of the control units of the agents determines the emergent behavior of the resulting system. A virtual component must provide sufficient information for consideration in the product life-cycle support activities including design, analysis, test, documentation, assembly, and administration. The virtual component model is defined as follows:
Definition 1. A virtual component model is a 4-tuple Model_component = (I, G, N, Pr), where I is the set of component interface features, i.e. the set of ports for component communication; G is the set of geometrical features representing the physical properties of the component, i.e. the 3-D graphics model with position and orientation information; N is the neural network model representing the component's behavior; and Pr is the set of product attributes associated with the component.
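Definition 1 can be rendered as a simple data structure, as sketched below; the concrete member types are assumptions, since the paper does not prescribe a representation.

```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative member types for the 4-tuple (I, G, N, Pr).
struct Port      { std::string name; double value = 0.0; };      // elements of I
struct Geometry  { /* 3-D model with position and orientation */ };   // G
struct NeuralNet { /* behaviour model: outputs = f(inputs) */ };      // N

struct VirtualComponent {
    std::vector<Port>                  interfaces;   // I: ports for communication
    Geometry                           geometry;     // G: physical properties
    NeuralNet                          behaviour;    // N: neural network model
    std::map<std::string, std::string> attributes;   // Pr: product attributes
};
```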
3
Agent-based Composable Simulation
The system performance results from the interaction among all the components in a designed system. Since agents are used to manage components and to simulate the component interactions by inter-agent communication in a distributed way, the implementation of composable simulation depends on the communication and collaboration of the multi-agent system.
3.1
Agent Communication and Collaboration
Agent communication requires a common language. All messages passed in the proposed virtual prototyping system are written in Knowledge Query and Manipulation Language (KQML) format. Agent collaboration is realized in two ways: direct collaboration and collaboration through a control agent. Direct collaboration is used to reduce the message flow within the system and to obtain a stronger harmony among the domain agents, whereas the other achieves a higher-level management.
3.2
Agent-based composable simulation
The idea of agent-based composable simulation is based on agent communication and collaboration. Domain agents register to manage the virtual models of different components. These domain agents monitor the changes of the important performance parameters of the components they control, and communicate, recalculate and propagate the state when they detect any changes. Through the agents' communication and collaboration, the dynamics of the virtual prototyping system is simulated. This agent-based composable simulation is validated by a simple case study in the domain of fluid power systems.
3.2.1
Case study: a simple validation experiment in the domain of fluid power systems
A simple experiment is configured according to the circuit in Figure 2. The experimental transient values of the pressure P and the flowrates Ql, Qr and Qp are measured by transducers located in the system. The domain agents "pump agent" and "valve agent" respectively monitor the components "pump" and "pressure relief valve" and exchange the pressure and flowrate according to the rule: the pressure at the connection port is the same, while the flowrates sum to zero. When an agent receives a KQML message about a value change in the system, it triggers the calculation of its behavior model (the neural network model) and generates a new output based on the network's inputs. This new output value is then propagated to the other communicating agents. With such communication, the dynamic behavior can be simulated. This agent-based composable simulation can be described as a combined behavior model by the following equations.
Figure 2. The circuit of the fluid power system (pump, pressure relief valve and load).

P = N_relief-valve[P(t − k), Qv(t − k)]   (2)

Figure 3. The validation result: calculated pressure P versus experimental pressure P.
The validation experiment was set up to change Ql (the flowrate through the load) and then measure the pressure accordingly. The validation result is the comparison, shown in Figure 3, between the pressure calculated through our agent-based composable simulation and the measured experimental pressure. The result shows that the proposed agent-based composable simulation can evaluate the dynamic performance of a fluid power system. Moreover, the errors are expected to be reduced by organizing a more accurate experiment and by building more accurate behavior models.
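The port rule used by the agents (equal pressure at the connection port, flowrates summing to zero) can be sketched as a simple consistency check; the types below are illustrative, not the paper's KQML message classes.

```cpp
#include <cmath>
#include <vector>

struct PortState { double pressure = 0.0; double flow = 0.0; };

// Returns true when the states exchanged by the agents at one connection
// port are consistent, i.e. message propagation for this port can stop.
bool portConsistent(const std::vector<PortState>& ends, double tol = 1e-6) {
    if (ends.empty()) return true;
    double flowSum = 0.0;
    for (const auto& e : ends) {
        if (std::abs(e.pressure - ends.front().pressure) > tol)
            return false;              // pressures at the port must be equal
        flowSum += e.flow;             // flows into the port must cancel
    }
    return std::abs(flowSum) < tol;
}
```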
4
Conclusion
A framework for agent-based composable simulation is proposed for virtual prototyping. Since integrated virtual component models are used to represent individual components, composable simulation can be realized by combining all the available virtual models to form a virtual prototype of the final designed system, so that the system evaluation is obtained easily. The idea of composable simulation is analogous to an agent-based framework. The structure of the domain agent has been introduced. Composable simulation is implemented based on the communication and collaboration among all domain agents. It is validated by the experiment on a simple fluid power system.
References
1. Shyamsundar, N. and Gadh, R., Internet-based collaborative product design with assembly features and virtual design spaces, Computer Aided Design, 33 (2001), pp. 637-651.
2. Fok, S. C., Xiang, W. and Yap, F. F., Feature-based component models for virtual prototyping of hydraulic systems, Int. J. Adv. Manu. Tech., 18(9) (2001), pp. 665-672.
3. Diaz-Calderon, A., A composable simulation environment to support the design of mechatronic systems. Ph.D. Thesis, Carnegie Mellon Univ., Pittsburgh, PA (2000).
4. Wooldridge, M. and Jennings, N. R., Intelligent agents: theory and practice. The Knowledge Engineering Review, 10(2) (1995), pp. 115-152.
5. Kruger, W. and Bohn, C. A., The responsive workbench: a virtual work environment. Computer, 28(7) (1995), pp. 42-48.
KNOWLEDGE-BASED RAPID VIRTUAL ENGINEERING SYSTEM FOR PRODUCT AND TOOLING DESIGN R. D. JIANG Institute of High Performance Computing, 1 Science Park Road, #01-01 Capricorn, Singapore Science Park II, Singapore 117528 E-mail: jiangrd@ihpc.a-star.edu.sg T. W. LIM Molex Singapore Pte Ltd, 110 International Road, Jurong, Singapore 629174 E-mail: [email protected] B. T. CHEOK Ministry of Defence, 303 Gombak Drive, #03-29 Singapore 669645 E-mail: [email protected]
This paper presents a knowledge-based rapid virtual engineering system for product and tooling design. The integrated knowledge-based design system consists of a product design system, a progressive die design system, an injection mould design system, and a process planning and NC code generation system. In order to achieve higher productivity, accuracy and design re-use, a database is built from proven past engineering designs, complemented by a series of CAE simulation studies. The engineering knowledge base covers the different requirements, design intent, performance and various know-how for the design of products, progressive dies, injection moulds, etc. The knowledge base is captured, maintained, and accessed through a master model that resides in the enterprise's knowledge repository. This makes multi-disciplinary optimization of products and processes possible, giving designers a high degree of confidence in product quality and a reduced time-to-market.
1
Introduction
Product design is a highly complex process. There are no common rules or de facto procedures to follow. Even in the same company, the products may vary widely and change with time. A product designer has to consider not only the quality of the product itself but also its influence on the tooling design, manufacturing process, assembly conditions, etc. In the connector industry, product design has an especially close relationship with the die and mould design processes. One solution is to call project meetings among product designers, die designers and mould designers, so that the different groups of designers who concentrate on different areas can arrive at a design in which the interrelationships and effects between upstream and downstream are resolved. However, this process requires the transfer of information among designers, and errors may be introduced; it also lengthens the product design cycle. This paper proposes a Knowledge-Based Rapid Virtual Engineering System (KB-RVES) for the product and tooling design of connectors. The integrated software environment can perform product design, die design and mould design functions that make use of the company's proven design rules. All the engineering standardization and heuristic rules can be reused and expanded. With the company's knowledge base, the KB-RVES system has the capability to oversee the product design, die design and mould design processes and to reduce lead-time and errors.
2
System Structure
The system is made up of four subsystems, as shown in Figure 1: the product design system, the progressive die design system, the injection mould design system, and the process planning and NC code generation system. A knowledge-base engine and an engineering knowledge base support the integrated intelligent product and tooling design system.
Figure 1. System Structure of KB-RVES
2.1
Product Design
Developing a product from an idea to manufactured parts ready for assembly involves quite a number of processes, and requires industrial designers and engineers to work very closely together to design and produce the products. A multidisciplinary team comprising computer science, electrical design, mechanical design and manufacturing skills must be assembled. For innovative product design and short time-to-market requirements, it is necessary to incorporate a database of good engineering knowledge based on past successful cases and a series of computer-based CAE simulation studies. The schematic structure of the product design system is shown in Figure 2. The engineering knowledge base for product design is composed of several types of design knowledge [1,3,5]: heuristic rules, engineering tables and CAE simulation. The heuristic rules are proven successful designs, represented as conditional statements of the form IF <condition> THEN <action>.
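As a minimal illustration of how such IF-THEN heuristic rules might be stored and fired, the sketch below pairs condition predicates with recommendation actions. The specific rule, threshold and function names are hypothetical and are not taken from the KB-RVES implementation.

```python
# Minimal sketch: heuristic design rules as IF <condition> THEN <action> pairs.
# The rule content below is illustrative only.

def thin_wall(part):
    return part["wall_thickness"] < 0.4  # condition, mm (example threshold)

def recommend_thicker_wall(part):
    return f"Increase wall thickness of {part['name']} to at least 0.4 mm"

RULES = [
    (thin_wall, recommend_thicker_wall),
]

def fire_rules(part):
    """Return the recommendation of every rule whose condition holds."""
    return [action(part) for condition, action in RULES if condition(part)]

print(fire_rules({"name": "housing", "wall_thickness": 0.3}))
```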
Figure 2. Schematic structure of the product design system (3D CAD environment with a feature-based modeler, an interface for design violations and recommendations, heuristic rules, engineering tables, CAE simulation, and the company engineering knowledge base)
2.2
Knowledge-based Progressive Die Design (KB-Die)
Progressive die design is an important component of tooling engineering; it is a skill-intensive task and an experience-driven process. Incorporating the design know-how into the system significantly reduces the design lead-time. The use of KB and solid modeling technology also allows the system to be tightly coupled with FEA analysis programs, such that "on-line" metal forming simulation can be conducted during the design stage. This sub-system accepts the output of the product design system. The major design processes are [2]:
1. Feature Modeling: This is the input module in which all sheet metal features are input and identified. These features are hole, burring, lance, emboss, half-cut, coining, marking, bending, etc. The information entered is stored as a Features-Tree that is used for down-stream design activities. A solid model of the part is produced automatically for verification.
2. Unfold: This module unfolds the 3D stampings modeled in the Feature Modeler.
3. Template Manager: This module manages the various die templates used by a company to design the "non-working areas" and ancillary tooling arrangements of the die. It can be used to build up and manage a library of die templates.
4. Process Planning: This is the key stage in which domain expert knowledge is applied. Operation arrangement, the design of working and non-working areas, etc. are all completed in this process.
5. Die Configuration: After process planning, the rules are fired to complete the design of the components and plates and to establish their relationships.
6. ToolViewer: The output from the system can be viewed using the ToolViewer, as a single component, a sub-assembly or the entire die assembly.
7. Auto-dimensioning: This module extracts the design information from the upstream design phase and automatically produces the detail drawings of plates and components, assembly drawings, BOM, etc.

2.3
Knowledge-based Mould Design (KB-Mould)
There are a number of theoretical research efforts and practical systems for injection mould design [4]. However, the productivity improvement is limited for a generic design system of injection
mould. A system with built-in company rules and customized company standards has a great advantage in terms of shortening the design lead-time. The knowledge-based mould design subsystem incorporates many design rules and several standard mould bases. It can be divided into initialization and detailed mould design:
1. Initialization: In this step, shrinkage analysis, determination of the number of cavities, cavity layout and mould base selection, etc. are carried out.
2. Detailed mould design: The ejection direction, parting line/surface, core/cavity blocks and inserts, feeding system, cooling system and venting system are completed with the support of various libraries and the knowledge base. Finally, a complete assembly tree is set up, and 2D drawings and the BOM are generated.
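The shrinkage compensation carried out during initialization can be illustrated with a small sketch. The linear scaling below is a common first-order approach assuming an isotropic shrinkage rate; it is not taken from the KB-Mould implementation, and the example material value is illustrative.

```python
# First-order shrinkage compensation: the cavity is scaled up so that the
# moulded part reaches nominal size after the material shrinks.

def cavity_dimension(part_dim_mm: float, shrinkage_rate: float) -> float:
    """Scale a nominal part dimension by the material shrinkage rate."""
    return part_dim_mm * (1.0 + shrinkage_rate)

# Example: a 50 mm feature in a material with 0.5% linear shrinkage.
print(cavity_dimension(50.0, 0.005))  # -> 50.25 mm cavity dimension
```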
2.4

Process Planning and NC Code Generation
This subsystem is integrated with the KB-Die and KB-Mould systems and takes the outputs from these two subsystems in either 3D or 2D format. It offers users rich knowledge-based engineering capabilities for interactively or automatically recognizing manufacturing features from the die and mould designs, planning manufacturing processes and generating NC codes.

3
Conclusion
With the increasing complexity of new products and the stringent requirements of customers, more and more time is spent translating a concept between designers, engineers, model-makers, and tool builders. This paper proposes a knowledge-based rapid virtual engineering system to bridge this multidisciplinary team so that concept refinement and design optimization can be achieved. The system is supported by proven enterprise design rules, which are managed and expanded by a knowledge-based shell. Benchmarks on several subsystems have shown that the design lead-time is reduced significantly.

4
Acknowledgements
The authors wish to thank our colleagues at IHPC and Molex Singapore Pte Ltd for their support and collaboration throughout all the projects.

References
1. A. J. Riel, Object-Oriented Design Heuristics, Addison-Wesley, 1996.
2. B.T. Cheok and A.Y.C. Nee, Configuration of Progressive Dies, Artificial Intelligence for Engineering Design, Analysis & Manufacturing, Vol. 12, No. 5, 1998, pp 405-418.
3. G. Chryssolouris and K. Wright, Knowledge-Based Systems in Manufacturing, Annals of the CIRP, Vol. 35/2/1986, pp 437-439.
4. Altan, T., Lilly, B. W., Kruth, J. P., Konig, W., Tonshoff, H. K., Van Luttervelt, C. A. and Khairy, A. B., Advanced Techniques for Die and Mold Manufacturing, Annals of the CIRP, 42(2), 707-716, 1993.
5. Mroczkowski, Robert S., Electronic Connector Handbook: Theory and Applications, New York: McGraw-Hill, c1998.
VIRTUAL AESTHETIC DESIGN: ARCHITECTURE AND SOME RESULTS Weishi Li, Shuhong Xu, Gang Zhao, Yinglin Ke† Institute of High Performance Computing, Singapore; †Zhejiang University, P.R. China 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528 †Dept. of Mechanical Engineering & Automation, Zhejiang University, P.R.C. 310027 [liws, xush, zhaog]@ihpc.a-star.edu.sg A novel CAVE-based approach for seamless digital aesthetic design is presented in this paper. In this approach, the virtual prototype is initially constructed using virtual sculpturing, and then refined through virtual measurement. The obtained 3D model can be visually evaluated and/or accurately analysed using the finite element method. Finally, the accepted model is reconstructed using standard surface representations. Several of the key technologies, such as virtual measurement and B-Spline curve fairing, are also discussed.
1
Introduction
The market success of consumer products strongly depends on their aesthetic character, i.e. the emotional reaction the product is able to evoke, since the product functionality and quality of different companies have more and more adjusted to one another. Aesthetic design consists of two phases: conceptual design/styling and elaborated design. The application of computers in conceptual design is not very prevalent, in contrast to the great success of computer applications in elaborated design. Physical models are still indispensable in most cases, and reverse engineering is unavoidable. This results in a digital gap between conceptual design and elaborated design. The gap causes a great loss of data quality when the surfaces are reconstructed [1]. Consequently, closing this gap will greatly improve the design quality and also reduce the time consumed in product design. The main obstacles for digital aesthetic design are how to provide stylists with an intuitive design environment and how to evaluate the digital models without fabricating physical models [1]. In this paper we present a novel architecture for seamless digital aesthetic design in a CAVE. In this architecture, the whole product design process can be implemented entirely without the aid of physical models.

2
Aesthetic Design in CAVE
The conceptual design phase of aesthetic design is a creative design process. In contrast to commercial CAD (Computer Aided Design) systems, which lack support for creative design, Virtual Sculpturing systems give stylists the opportunity to work out these kinds of creative designs with their familiar tools in VR (Virtual Reality) environments. Commercial Virtual Sculpturing systems, like FreeForm™ of SensAble Technologies, are available now. As shown by Riedel et al. [2], modeling in a CAVE has several advantages, such as "intuitive working", "real time interaction" and "full-scale modeling", and the CAVE is more suitable for product design than other VR technologies. Virtual Sculpturing in a CAVE will therefore give stylists a new environment for aesthetic design. The digital models derived from virtual sculpturing are bounded by triangular faceted closed surfaces that are suitable for display in a VR environment. But the surfaces should be reconstructed using B-Spline surfaces or another standard surface representation in
order to refine the designs in CAD systems. Designers experienced in RE (Reverse Engineering) often complain that it is very difficult to understand and recognize the design intent of the stylist on a limited-size 2D screen, whereas in a CAVE this is no longer a problem. As the designers can view the virtual prototypes in 3D in the CAVE and draft 3D curve frameworks intuitively on the full-scale model, they have more freedom to grasp the design intent of the stylist. Based on the CAVE, a virtual aesthetic design workflow is shown in Figure 1. Here 2D sketching is one phase of Virtual Sculpturing and is not shown in the workflow. The virtual prototypes attained from Virtual Sculpturing systems are evaluated aesthetically and/or using CAE (Computer Aided Engineering) systems. Fraunhofer IAO [2] has had considerable success in CAD data evaluation in CAVE-like VR environments, so we will not discuss this problem in this paper. CAE is optional and applicable for products whose shapes are important for their function, like ships and sports cars. The evaluation outcome is fed back to give the designer some clues for modifying the initial design. The method called Virtual Measurement, used in refining the prototypes, is discussed in the next section. The approved prototype is reconstructed using a standard surface representation. The succeeding process is the same as the current design process. For assurance, the refined surface model can be transformed into triangular faceted surfaces to be evaluated in the CAVE again.

Figure 1. Virtual Aesthetic Design Workflow (conceptual design in the CAVE: virtual sculpturing, visual evaluation and virtual measurement of the virtual prototypes, then surface reconstruction; the surface model is refined in CAD for engineering design)

Surface reconstruction is a main topic in RE. What we are interested in is how to obtain a 3D curve framework of good quality for complex surface reconstruction. Firstly, the curve framework should be defined according to the designer's design intent; this is an inherent advantage of modeling in the CAVE. Secondly, the curves should have good quality; in other words, the curves should be fair enough. Our experience in the local fairing of B-Spline curves is given in Section 4. 2D sketching and Virtual Measurement also benefit from this technique. The main advantage of this workflow is that the design is implemented wholly in a digital environment, so the quality loss caused by transformation between physical and digital models is eliminated. Secondly, aesthetic design is not pure art; it should meet a wide range of constraints and goals [3], where a digital model has an apparent advantage over a physical model. Thirdly, aesthetic design in a digital environment also helps the stylist turn their thoughts perfectly into reality. Some characteristics, like symmetry and rotation, which are very difficult to model on a physical model, are very easy to achieve in a digital environment. Fourthly, it can help the stylists and designers achieve a higher level of elaboration in the early development phases, for better decision support. Lastly, the time consumed in product development can be decreased appreciably to meet the requirements of the rapidly changing market.
3
Virtual Measurement
The quality of the virtual prototypes should be improved, just as a physical model is polished. A method, the so-called virtual measurement, is applied to amend the flaws of the virtual prototypes. Firstly, four B-Spline curves that form a loop and surround a surface region of bad quality are created, and the two non-adjacent curves are defined over the same set of knots, with full multiple knots at the beginning and end of the knot sequence. A Coon's surface interpolating the four curves, as shown in Figure 2, is created. For a vertex of the triangle mesh, if a projected point can be found on the Coon's surface and its distance to the projected point is less than the user-defined distance, the vertex is inside the surface region defined by the four curves, and its parameter is set according to its projected point. Then the boundary of the surface region is extracted, and the surface region is separated by trimming the surface along its boundary. Finally, a B-Spline surface is fitted to the vertices of the surface region.

Figure 2. Coon's surface created from 4 boundary curves

The fitted B-Spline surface is evaluated by visual analysis and used as a new local virtual model to finish the local sampling and meshing. In the re-sampling process, the boundary points of the surface region are preserved to make sure they correspond to the initial surface, and virtual measuring points outside the boundary are deleted. The boundary points and the interior points are triangulated. At last, the triangular mesh is merged into the initial surface. An example is illustrated in Figure 3 to demonstrate the practicability of this method. All surface meshes are rendered with Gouraud-shaded triangular facets.

Figure 3. Example of virtual measurement: (a) initial local surface; (b) after local measurement
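A bilinearly blended Coons patch can be written down compactly; the sketch below evaluates one from four boundary curves. It is a generic construction given for illustration, with simple analytic boundaries standing in for the B-Spline boundary curves used in the paper.

```python
import numpy as np

def coons_patch(c_bottom, c_top, c_left, c_right, u, v):
    """Evaluate a bilinearly blended Coons patch at (u, v) in [0,1]^2.

    The four arguments are boundary-curve functions returning 3D points;
    they must share corners, e.g. c_bottom(0) == c_left(0).
    """
    p00, p10 = c_bottom(0.0), c_bottom(1.0)
    p01, p11 = c_top(0.0), c_top(1.0)
    ruled_u = (1 - v) * c_bottom(u) + v * c_top(u)
    ruled_v = (1 - u) * c_left(v) + u * c_right(v)
    bilinear = ((1 - u) * (1 - v) * p00 + u * (1 - v) * p10
                + (1 - u) * v * p01 + u * v * p11)
    return ruled_u + ruled_v - bilinear

# Illustrative boundaries: edges of a unit square, with a curved top edge.
b = lambda u: np.array([u, 0.0, 0.0])
t = lambda u: np.array([u, 1.0, np.sin(np.pi * u)])
l = lambda v: np.array([0.0, v, 0.0])
r = lambda v: np.array([1.0, v, 0.0])

print(coons_patch(b, t, l, r, 0.5, 0.5))  # -> [0.5 0.5 0.5]
```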
4

Local Fairing of Cubic B-Spline Curves
Fairness is an important indicator for industrial aesthetics, and curve fairing is one of the most fundamental functions for aesthetic design. For local fairing algorithms, two questions should be answered. The first is how to fair a point. The second is which point should be faired. Farin et al. [4] proposed a simple algorithm for "knot removal" and applied it to fair planar cubic B-Spline curves. Unfortunately, Farin's knot removal algorithm may cause the fairness of the curve to become worse than that of the initial curve in some cases, and as a consequence the curves cannot be faired reasonably without prerequisites. As shown in Figure 4, a curve is defined by the control polygon c = {d_i, i = 0, 1, 2, 3, 4} and the knot vector T = {t_0 = t_1 = t_2 < t_3 < t_4 = t_5 = t_6}. After removing the knot t_3, the new control polygon constructed by Farin's algorithm is C = {D_j, j = 0, 1, 2, 3}.

Figure 4. A counterexample of Farin's knot removal algorithm
But the fairness of the new curve is worse than that of the initial one. The least squares approximation (LSA) method performs better than Farin's algorithm. Sapidis et al. [5] considered the "worst point" as the point to be faired in each iteration, but in our tests the fairing iterations often stop prematurely. Inspired by curve lofting with a physical spline, we propose that if the worst point cannot be faired, the next-worst point should be tried. An improved fairing algorithm is obtained according to the above analysis:
1. Evaluate the internal knots.
2. Remove the worst knot using the LSA method. If the new curve is fairer than the old one, go to the first step; else undo the last iteration.
3. Remove the next-worst knot using the LSA method.
4. If the new curve is fairer than the old one, go to the first step; else undo the last iteration.
5. If there is no bad knot, exit; else go to the third step.
Figure 5 shows a curve faired with Sapidis' algorithm, with UGII V15.0 and with our algorithm. The curvature of the curve is drawn as porcupines to make all wiggles and unfair regions visible. The curve shown in Figure 5(d) is fairer than the other ones.
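The accept/undo control flow of this algorithm can be sketched independently of the B-Spline machinery. In the sketch below, the fairness measure (sum of squared third differences of the control points) and the local smoothing move that stands in for LSA knot removal are simplifications chosen for illustration; they are not the paper's actual LSA operator.

```python
import numpy as np

def fairness(pts):
    """Smaller is fairer: sum of squared third differences of control points."""
    return float(np.sum(np.diff(pts, n=3, axis=0) ** 2))

def smooth_point(pts, i):
    """Stand-in for LSA knot removal: pull point i toward its neighbours."""
    out = pts.copy()
    out[i] = 0.5 * out[i] + 0.25 * (out[i - 1] + out[i + 1])
    return out

def fair_curve(pts, tol=1e-12, max_iter=100):
    pts = np.asarray(pts, dtype=float)
    for _ in range(max_iter):
        interior = range(1, len(pts) - 1)
        # rank interior points from "worst" (largest local kink) downwards
        ranked = sorted(interior, key=lambda i: -np.sum(
            (pts[i - 1] - 2 * pts[i] + pts[i + 1]) ** 2))
        improved = False
        for i in ranked:                  # try worst first, then next-worst...
            candidate = smooth_point(pts, i)
            if fairness(candidate) < fairness(pts) - tol:
                pts, improved = candidate, True
                break                     # accepted: restart from the worst point
        if not improved:                  # no point can be faired: exit
            return pts
    return pts

noisy = np.array([[0, 0], [1, 1.2], [2, 1.9], [3, 3.4], [4, 4]], float)
print(fair_curve(noisy))
```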
Figure 5. Cubic B-Spline curve faired with different algorithms: (a) the initial curve; (b) faired with Sapidis' algorithm; (c) faired with UGII V15.0; (d) faired with our algorithm

5

Conclusion
A virtual aesthetic design workflow is presented in this paper. Our goal is to provide product designers with a seamless digital product design technique to improve design quality and reduce design time. Two key technologies, B-Spline curve fairing and virtual measurement, are also discussed. Examples demonstrate their efficiency in improving the quality of curves and triangular faceted surfaces respectively.

References
1. C. Werner Dankwort and Gerd Podehl, A new aesthetic design workflow - results from the European project FIORES. In P. Brunet, C. Hoffmann & D. Roller (eds.), CAD Tools and Algorithms for Product Design, Springer, 2000.
2. Riedel, Oliver, Ralf Breining, Ulrich Hafner and Roland Blach, Use of immersive projection environment for engineering tasks. SIGGRAPH 1998 Course 14.
3. Westin, S.H., Computer-Aided Industrial Design. Computer Graphics (1998) pp. 49-52.
4. Farin, G., G. Rein, N. Sapidis and A.J. Worsey, Fairing cubic B-Spline curves. Computer Aided Geometric Design 4 (1987) pp. 91-103.
5. Sapidis, N. and G. Farin, Automatic fairing algorithm for B-Spline curves. Computer Aided Design 2 (1990) pp. 121-129.
NUMERICAL SIMULATION AND EXPERIMENT ON PREDICTION FOR RETENTION FORCE DONG HONGZHI Institute of High Performance Computing, 1 Science Park Road #01-01 The Capricorn Singapore 117528 E-mail: donghz@ihpc.a-star.edu.sg LIM TONGWAH, LOW BOONHENG Molex Singapore Pte. Ltd., No. 110, International Road Jurong Town Singapore 629174 FEM is an effective tool for predicting the retention force in pin withdrawal from the housing in the design of connector assemblies. The research in this paper focuses on generating a finite element contact model for predicting the retention force. Under consideration of the actual assembly, the contact and mechanics models are analyzed, as well as the FEM model. The calculated results are then given, followed by a comparison with experimental results. Lastly, the errors in the calculation are analyzed so that the models can be improved in further applications.
1 Introduction A connector provides a separable interface between two subsystems of an electronic system [1]. The assembly and disassembly of pin and housing are very common in the design of connector systems. An insertion force exists during the assembly of pin and housing. Conversely, the retention force is what keeps the pin in the housing during the disassembly of connectors. However, the two are different in magnitude, which is why the research in this paper was carried out. Calculation of the retention force is one of the important processes in the design of electronic connector assemblies. It is very difficult for designers to predict its magnitude; in practice, the prediction often depends on designers' experience, and laboratory experiments are very tedious. One effective method to calculate and predict the retention force is finite element analysis based on numerical simulation technology, which saves time and cost and gives designers reference parameters [2]. 2 Setup of finite element analysis model 2.1 Analysis of contact model The analysis of retention force is a typical contact problem. For contact between a deformable body and a rigid body, the constraint associated with no penetration is implemented by transforming the degrees of
freedom of the contact node and applying a boundary condition to the normal displacement. This can be considered as solving the problem [2]:

[K_aa  K_ab; K_ba  K_bb] {u_a; u_b} = {f_a; f_b}        (1)
where K is the system stiffness matrix, u is the nodal displacement, and f is the force vector; the subscript a represents the nodes in contact, which have a local transformation, and b represents the nodes not in contact and, hence, not transformed. Of the transformed nodes, the displacement in the normal direction is then constrained such that Δu_n is equal to the incremental normal displacement of the rigid body at the contact point. 2.2 Modeling of retention force The retention force is the friction force between pin and housing. The actual physics of friction continues to be a topic of research; hence, the numerical modeling of friction has been simplified to two idealistic models. The most popular friction model is the Adhesive Friction or Coulomb Friction model. This model is used for most applications, with the exception of bulk forming such as forging. The Coulomb model is [2]:

σ_fr ≤ -μ σ_n t        (2)

where σ_n is the normal stress, σ_fr is the tangential (friction) stress, μ is the friction coefficient, and t is the tangential unit vector in the direction of the relative velocity. 2.3 Setup of FEM model The geometric model and the positions of housing and pin are described in Figure 1. The analysis model is considered as a symmetric one with the boundary conditions shown in Figure 1, in which the displacement on the boundary ab is fixed along the X and Y directions, and the boundaries bc and ad are fixed along the Y direction. The retention force arises in the course of withdrawal of the pin from the housing. The FEM model after meshing is shown in Figure 2.
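The Coulomb limit in Eq. (2) amounts to a simple stick/slip test at each contact node. The sketch below is a generic illustration of that test; it is not code from the solver used in the paper (MSC.Marc [3]), and the sample stresses are invented for the example.

```python
def coulomb_state(sigma_n: float, sigma_fr: float, mu: float) -> str:
    """Classify a contact node: |tangential stress| against mu * |normal stress|.

    sigma_n is the (compressive, negative) normal stress; the friction
    stress magnitude may not exceed mu * |sigma_n|.
    """
    limit = mu * abs(sigma_n)
    return "slip" if abs(sigma_fr) >= limit else "stick"

# Examples with the paper's friction coefficient of 0.12:
print(coulomb_state(sigma_n=-100.0, sigma_fr=8.0, mu=0.12))   # stick
print(coulomb_state(sigma_n=-100.0, sigma_fr=15.0, mu=0.12))  # slip
```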
Figure 1. Geometric model and boundary conditions
Figure 2. FEA model
Figure 3. End of pin insertion
Figure 4. The first step in pin withdrawal
Figure 5. The second Lockbarb out of housing in withdrawal
Figure 6. History curve of friction force at the contact node in the withdrawal process (friction force versus increment)
3 Computational results and comparison The basic input parameters are as follows:
Table 1. Basic Input Parameters

Parameter              Value
Material               Isotropic Zenite 6130
Poisson ratio          0.25
Young's modulus        21,000 MPa
Friction coefficient   0.12
When the pin had been inserted completely into the housing, and after the first step of pin withdrawal, the friction force status is as shown in Figure 3 and Figure 4. When the second Lockbarb was out of the housing during withdrawal, the distribution of friction force is as represented in Figure 5. What we are concerned with is the maximum friction force on every contact node of the housing, so Figure 6 describes the history curve of the friction force at the contact nodes over the calculated increments in the withdrawal process. For the sake of verifying the accuracy of the calculation for further application, experiments were done in a company laboratory. During the withdrawal of the pin from the housing, the measured maximum retention force was 8.319 N and the calculated one was 11.010 N; the latter is 32.3% higher. It is thus seen that a difference still exists between the simulation results and the experiments.

4 Conclusions As we know, FEM is an effective tool for solving problems in engineering applications. The accuracy of the results depends significantly on the accuracy of the FEM model built in the preprocessing. From the above calculation, it can be seen that the application of FEM to the analysis of retention force has been validated, and we can use it to predict the retention force as well as the insertion force. However, the accuracy of the material properties, the geometric model and the FE model is vital. In future research, more accurate results will be achieved by refining the mesh, especially in the contact area, and by setting up more exact material properties.

References
1. Robert S. Mroczkowski, Electronic Connector Handbook: Theory and Applications, McGraw-Hill (1998) pp. 1.1.
2. Stephane Kugener, Simulation of the Crimping Process by Implicit and Explicit Finite Element Methods. AMP Journal of Technology 4 (1995) pp. 8-15.
3. MSC.Marc Manual Volume A: Theory and User Information. MSC Software (2001).
FAILURE PROBABILITY OF WIRE BONDING PACKAGES F. WANG, Y. Y. WANG, C. LU
Division of Computational Mechanics, Institute of High Performance Computing, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528. Email: [email protected]
This paper presents a methodology for elastic-plastic analysis of wire bonding packages. The methodology/model is based on the finite element method for rate-sensitive materials and is applicable to very large deformation processes. Nonlinear spring elements are adopted to simulate the gradual formation of the intermetallic bonding layer between the ball and the electric pad. The effect of temperature increase on the material properties is taken into account. The ultrasonic power is simulated as a displacement boundary condition. The mechanical behavior of the silicon chip during wire bonding with ultrasonic irradiation is thus revealed. The failure probability with reference to the stress distribution is discussed. The developed FE methodology is simple but effective for evaluating the failure probability of wire-bonding packages.
1
Introduction
In the wire bonding process, there are many parameters that influence the quality of the chips. It is difficult or impossible to determine the effect of all the active factors by experiment or by an analytical model. On the other hand, research in the area of FE analysis has produced more and more computationally efficient algorithms, enabling simulations with greater detail and accuracy for electronic packages. Therefore, there is an increasing tendency to apply FEA to electronic packages for design purposes or to explore detailed information on the performance of the electronic packages [1-3]. In the present study, advanced FEA software is used to simulate the gold wire bonding process. Since the ball undergoes a very large deformation, a general meshing technique is not very accurate even with a very fine mesh for the ball and the bond pad. An adaptive meshing technique is implemented to perform this elastic-plastic large deformation analysis. Special nonlinear spring elements are adopted to model the gradual formation of the intermetallic bonding between the gold ball and the pad. Detailed information on the overall stress distribution is then derived. Since cratering at the ball-pad interface is the most common failure pattern of a wire bonding package observed in experiments and industrial practice, the failure probability regarding cratering at the ball-pad interface is evaluated according to the derived stress level.
2
Finite Element Model
In the present study, the target gold wire bonding package is taken from the published paper [1] to serve as a benchmark test of the present FEA methodology. The material properties and the loading conditions are consistent with the data provided in [1]. The schematic diagram of the geometry of the wire bonding package is shown in Figure 1. The substrate has a length and width of 310 μm and a thickness of 300 μm. The radius of the circular aluminium terminal is 75 μm, and its thickness is 1 μm. The radius of the gold wire ball is 35 μm. The thickness of the oxide film is 1 μm. The
properties of the different materials are listed in Table 1. The unit system is defined self-consistently as length - μm, time - μs, mass - μg, force - mN and stress - GPa.

Table 1. Material properties adopted in the present study.

Material      Young's Modulus (GPa)   Poisson's Ratio   Rate of Strain Hardening (MPa)   Yield Stress (MPa)
Gold          68.6                    0.44              1459                             Eq. (1)
Capillary     313.6                   0.23              -                                -
Al Terminal   70.3                    0.345             -                                -
Oxide Film    100.0                   0.224             -                                -
Si            166.4                   0.26              -                                -
Figure 1. Schematic diagram of wire bonding process.
Taking advantage of the symmetric geometry, loading and boundary conditions, only half of the whole structure is modelled, with appropriate boundary and loading definitions. The FE mesh is shown in Figure 2. The compression bonding load is varied proportionally from 0 N to 0.98 N during 1 millisecond. The 110 kHz ultrasonic wave is simulated as a displacement of the tool that reaches its peak value during the first half period of a cycle and moves back to the original position during the other half. The adaptive meshing technique is adopted since a very large deformation is expected for the gold ball. Nonlinear spring elements are used between adjacent nodes of the gold ball and the pad after the ball is pressed onto the electric pad. The spring elements are applied only to those areas where the intermetallic bonding occurs. The spring elements are assigned a near-zero stiffness initially and an extremely large value to simulate a rigid connection after the intermetallic bonding is assumed to have formed. The effect of high temperature on the material properties is reflected in the strain rate dependency of the yield stress of the gold ball as follows:

σ_y = 32.7 + 0.057 ε̇  (MPa)        (1)
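Equation (1) is a simple linear rate dependence; a one-line helper makes the intent concrete. The strain rate is assumed here to be expressed in the same self-consistent unit system as the rest of the model.

```python
def gold_yield_stress(strain_rate: float) -> float:
    """Rate-dependent yield stress of the gold ball, Eq. (1), in MPa."""
    return 32.7 + 0.057 * strain_rate

print(gold_yield_stress(0.0))    # quasi-static: 32.7 MPa
print(gold_yield_stress(100.0))  # elevated rate: 38.4 MPa
```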
Figure 2. Finite Element Mesh.
3
Results and Discussion
Figure 3 gives the calculated Von Mises stress distribution in the pad, the dielectric and the substrate when the gold ball is fully compressed onto the pad. For comparison, the result given in [1] is also shown in Figure 3. It is seen from Figure 3 that the maximum Von Mises stress derived in the present study is 593.6 MPa. This is comparable with the result given in [1], which is 547.2 MPa; the percentage difference is 7.82% of the current result. The effect of the ultrasonic power is seen in the increase of the shear stress. Figure 4 shows the transverse shear stress distribution before and after the ultrasonic power is applied. It is agreed by many researchers that the formation of the intermetallic layer between the ball and the pad depends on the transverse shear stress, with a larger transverse shear stress improving the intermetallic bonding. In view of this, the application of the ultrasonic power improves the intermetallic bonding, since it increases the shear stress from 177.8 MPa to 180.5 MPa.
The application of the ultrasonic power also increases the maximum principal stress from 227.4 MPa to 327.7 MPa (see Figure 5). The maximum principal stress is usually considered the cause of cratering at the ball-pad interface. The failure probability due to ball-pad interface cratering has the following form in terms of the principal
Figure 3. Von Mises stress distribution in the pad, the dielectric and the substrate (present study, max. 593.6 MPa; result from [1], max. 547.2 MPa).

Figure 4. Shear stress distribution in the pad, the dielectric and the substrate, before and after the ultrasonic power is applied.
stress:

P = 1 - exp[-p (σ/σ_m)^m]        (2)
where p and m are the two Weibull parameters. The values of p, m, and σ_m for the ball-pad interface are not very clear. However, it is known that the product p/(σ_m)^m is proportional to the area of the sample. It has been suggested by some researchers that σ_m is equal to 360 MPa, with the assumption that the mean fracture stress is independent of the area of
Figure 5. Maximum principal stress distribution in the pad, the dielectric and the substrate, before and after the ultrasonic power is applied.
the structure; p and m may be given the values of 0.6 and 11 respectively for an area of 70 mm². In the current study, the sample area is the pad-ball interface (D = 35 μm). Thus, the product p/(σ_m)^m can be reduced by a factor of 1.21×10^-5. The reduction can be implemented by adopting the values of 7.26×10^-6 and 360 MPa for p and σ_m respectively. Different combinations of values can be adopted to give the same product p/(σ_m)^m, and there is no difference in the predicted value of the cratering rate P. In the present study, the derived maximum principal stresses at the pad-ball interface before and after the application of the ultrasonic power are 227.4 MPa and 327.7 MPa respectively. Substituting the parameters p, σ_m and m and the maximum principal stresses into Eq. (2) gives the failure probabilities before and after the application of the ultrasonic power as P₁ = 1.06×10^-7 and P₂ = 5.9×10^-6 respectively. So the failure probability of the wire bonding package is increased by a factor of about 56 by the application of the ultrasonic power.
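A sketch of Eq. (2) with the stated parameter values follows. Note that the predicted probability is very sensitive to the Weibull modulus m and to the exact value of p, so the absolute numbers move considerably with small changes in the assumed parameters; the ratio between the two load cases is far less sensitive, and with these values it comes out near 56, consistent with the probabilities quoted above.

```python
import math

def cratering_probability(sigma_mpa: float, p: float = 7.26e-6,
                          sigma_m: float = 360.0, m: float = 11.0) -> float:
    """Weibull failure probability, Eq. (2): P = 1 - exp(-p (sigma/sigma_m)^m)."""
    return 1.0 - math.exp(-p * (sigma_mpa / sigma_m) ** m)

p_before = cratering_probability(227.4)  # before ultrasonic power
p_after = cratering_probability(327.7)   # after ultrasonic power
print(p_before, p_after, p_after / p_before)
```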
4

Conclusions
Through the above analysis of the derived results, it can be concluded that an FE methodology for the overall analysis of the wire bonding package has been successfully established. The derived stress distributions in the pad, the dielectric and the substrate are comparable with the results reported by other researchers. The role of the ultrasonic power in improving the ball-pad intermetallic bonding while reducing the reliability of the packages has been well predicted with the present methodology.

References
1. T. Ikeda, N. Miyazaki, K. Kudo, K. Arita, H. Yakiyama, "Failure Estimation of Semiconductor Chip during Wire Bonding Process", Journal of Electronic Packages, Vol. 121, pp. 85-91, June 1999.
2. B. Chylak, S. Kumar and G. Perlberg, "Optimizing Wire Bonding Process for 35 μm Ultra-Fine-Pitch Packages", SEMICON, Singapore, 2001.
3. Y. Takahashi, M. Inoue, "Numerical Study of Wire Bonding - Analysis of Interfacial Deformation Between Wire and Pad", Journal of Electronic Packages, Vol. 124, No. 1, pp. 27-36, March 2002.
SHAPE CONTROL OF SMART COMPOSITE PLATE STRUCTURES BASED ON ACTUATOR SHAPE OPTIMISATION QUAN NGUYEN AND LIYONG TONG School of Aerospace, Mechanical & Mechatronic Engineering, Bldg. J07, The University of Sydney, NSW, 2006, Australia. E-mail: quan@aeromech.usyd.edu.au
This paper presents a coupled alternating loop optimization system (CALOS) for shape control of smart plate structures, in which the loci and sizes of the piezoelectric actuators as well as the applied voltages are treated as design variables. CALOS is a two-stage process that consists of the linear least square (LLS) method, employed to search for the voltage distribution in a given actuator configuration, and the sequential linear programming (SLP) method, used to optimize the shapes and loci of the actuator patches for given voltages. An illustrative example is given to validate CALOS.
1
INTRODUCTION
Shape control of a smart structure can be obtained by optimising the applied electric fields, loci and sizes of the piezoelectric actuators attached to the host structure. In this field, many researchers have focused on shape control via finding the optimal actuator electric field for matching the desired shape. In this instance, the electric field in an actuator is the design variable, and the optimal value is that which minimises the difference between the actuated and desired shapes. Koconis et al. [1] developed analytical methods for determining the optimal values of the voltages applied to fixed-shape actuators for achieving specified shapes of sandwich plates and shells. Chee et al. [2] employed the 3rd order plate theory for the mechanical deformation and the layer-wise theory for modelling the electric field in the finite element formulation; there, the actuator shapes are fixed and the optimal electric fields are determined using LLS. In the majority of published research on shape control, the shape and location of an actuator are not treated as design variables. This might lead to high energy consumption, because high voltages may be required. To achieve improved shape control with minimum energy consumption, we propose to investigate piezoelectric actuator design optimisation (PADO). In PADO, the location, shape and size of an actuator are optimised in addition to the value of the applied electric field. In this paper, a coupled alternating loop optimisation system is developed to implement PADO for shape control of smart structures.
2
FINITE ELEMENT FORMULATION
In the finite element formulation, the mechanical deformations and the electric fields are modeled using the 3rd order plate theory and the layer-wise theory. For the quasi-static shape control of a smart structure, Hamilton's variational principle can be employed to develop the final system of equations, in terms of the nodal variables including both mechanical and electric quantities, as given by

[K_uu  K_uφ; K_φu  K_φφ] {u_s; φ} = {F; -Q}

where u_s and φ are the nodal mechanical displacements and electric potentials, respectively.

3
OPTIMISATION METHODS
Shape control can be achieved by minimising the least squared error function

Lnm = Σ_{i=1}^{N} (w_d^i - w_c^i)²

where w_d and w_c are the desired and calculated displacements
respectively and N is the total number of selected matching nodes. CALOS is a two-stage process that uses the LLS and SLP methods in an alternating and coupled fashion. 3.1
Linear Least Square (LLS) Technique [3]
This method minimises the least square error function between the actuated and the desired shapes. It relies on the system being linear between the voltage and displacement variables. The solution of the LLS method can be written as
{φ} = ([C]^T [C])^(-1) [C]^T {w_d}

where {φ}, {w_d} and [C] are the voltage vector, the desired displacement vector and the influence coefficient matrix, respectively.
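In practice this normal-equation solution is usually computed with a least-squares routine rather than an explicit inverse. A minimal numpy sketch follows; the 3-node, 2-actuator influence matrix and displacement values are invented for illustration.

```python
import numpy as np

# Influence coefficient matrix C: displacement at 3 matching nodes per
# unit voltage on each of 2 actuators (illustrative values only).
C = np.array([[1.0e-8, 2.0e-9],
              [5.0e-9, 4.0e-9],
              [1.0e-9, 6.0e-9]])
w_d = np.array([1.2e-5, 0.9e-5, 0.4e-5])  # desired displacements

# Solve min ||C @ phi - w_d||^2; equivalent to phi = (C^T C)^-1 C^T w_d.
phi, *_ = np.linalg.lstsq(C, w_d, rcond=None)
print("optimal voltages:", phi)
```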
3.2

Sequential Linear Programming (SLP) Technique [4]
The shape control problem can be expressed in a standard form by linearising the objective function and the constraints as follows:

Minimise  Lnm(x_{k-1}) + Σ_{j=1}^{M} (∂Lnm/∂x_j)(x_k - x_{k-1})_j        (k = iteration number)

Subject to  (1 - factor) w_d^i ≤ w_c^i + Σ_{j=1}^{M} (∂w_c^i/∂x_j)(x_k - x_{k-1})_j ≤ (1 + factor) w_d^i,   i = 1, ..., N
            x_{k-1,j} - x_ML ≤ x_{k,j} ≤ x_{k-1,j} + x_ML,   j = 1, ..., M

where x are the design variables, x_ML is the move limit that ensures the validity of the linear approximation, and M is the number of design variables. The convergence of the optimisation process is checked by the criterion |(Lnm_k - Lnm_{k-1})/Lnm_{k-1}| < ε₁, or x_ML < ε₂, where ε₁ and ε₂ are convergence tolerances.

3.3
Coupled Alternating Loop Optimization System (CALOS)
In CALOS, the first stage uses the LLS method to determine the optimal voltage distribution that produces the desired shape with a fixed actuator geometry. The second stage seeks to optimise the geometrical configuration by minimising Lnm with the applied voltages fixed, using SLP. Stages one and two are repeated until the convergence condition is achieved. The flowchart of this process is shown in Figure 1.
Figure 1. Flowchart of the CALOS process: the LLS solver and the SLP solver are called alternately, returning the voltages V and the new geometry in turn, until the convergence check on Lnm or x_ML is passed.
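A compact sketch of this alternating loop follows. The LLS stage is a real least-squares solve; the SLP stage is replaced here by a crude finite-difference, move-limited descent step on the geometry variables as a stand-in for a proper linear-programming subproblem, and all problem data (the model function and numbers) are illustrative.

```python
import numpy as np

def model(x):
    """Illustrative influence matrix depending on geometry variables x."""
    return np.array([[1.0 + x[0], 0.2],
                     [0.5, 1.0 + x[1]],
                     [0.1, 0.6]]) * 1e-8

def lls_voltages(C, w_d):
    """Stage 1: optimal voltages for a fixed geometry (linear least squares)."""
    phi, *_ = np.linalg.lstsq(C, w_d, rcond=None)
    return phi

def lnm(x, phi, w_d):
    """Least squared error between calculated and desired displacements."""
    return float(np.sum((model(x) @ phi - w_d) ** 2))

def geometry_step(x, phi, w_d, move_limit=0.01, h=1e-4):
    """Stage 2 stand-in: move each geometry variable downhill within a move limit."""
    base = lnm(x, phi, w_d)
    grad = np.array([(lnm(x + h * e, phi, w_d) - base) / h for e in np.eye(len(x))])
    return x - move_limit * np.sign(grad)

w_d = np.array([1.2e-5, 0.9e-5, 0.4e-5])
x = np.zeros(2)
prev = None
for k in range(50):                      # coupled alternating loop
    phi = lls_voltages(model(x), w_d)    # stage 1: LLS voltages
    x = geometry_step(x, phi, w_d)       # stage 2: geometry update
    cur = lnm(x, phi, w_d)
    if prev is not None and abs(prev - cur) / prev < 1e-6:
        break                            # convergence criterion
    prev = cur
print("voltages:", phi, "geometry:", x, "Lnm:", cur)
```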
4
NUMERICAL EXAMPLE
Consider a cantilever plate with three piezoelectric patches, clamped at its left edge. The plate has a length of 0.15 m and a width of 0.06 m, and consists of 2 layers of thickness 0.01 m. The desired shape is defined as w_d(x,y) = 10^-5 cos(10x - 1). The plate material has the following stiffness constants: c11 = c22 = c33 = 82.68 GPa, c12 = c13 = c23 = 27.56 GPa, c44 = c55 = c66 = 26.5 GPa. The FE model of this plate is composed of 10 elements. The three active patches bonded to the plate are modeled using six elements: patches #1 and #3 are bonded at the two ends and patch #2 is bonded in the middle of the plate, as shown in Figure 2. The patch properties are: c11 = c33 = 84.8 GPa, c22 = 29.68 GPa, c12 = c13 = c23 = 36.35 GPa, c44 = c55 = c66 = 24.2 GPa; χ11 = χ22 = 15.3×10^-9 F/m, χ33 = 15.0×10^-9 F/m; and d31 = 254 pm/V, d32 = -204 pm/V, d33 = 374 pm/V, d24 = 484 pm/V, d15 = 584 pm/V. In this example, the movement of the selected design points is assumed to be only in the x direction, as shown in Figure 3.
Figure 2

Figure 3
The optimisation process converged at the 4th iteration. Table 1 gives the voltages and Lnm at each iteration. In the 1st iteration of the 1st stage, the fixed geometry given in Figure 2 was used to calculate the voltage distribution V_i; the results are shown in column 1. The 2nd stage takes V_i as known and optimises the actuator shape. After the solution for the 1st iteration has converged, the process returns to the 1st stage of the 2nd iteration to re-calculate V_i, as given in column 2. A converged solution is reached once the difference between Lnm_new and Lnm_old is within a relative error (< 10^-6). It is seen that initially the total applied voltage and the shape error function are 1006.563 V and 2.12×10^-5 respectively, and the final optimum values are 892.768 V and 8.25×10^-6. Thus CALOS
can reduce the total applied voltage by approximately 11.31% and the error function Lnm by 61.08%. The final geometrical shape of the structure is shown in Figure 4.

Table 1. Iteration history of CALOS

Iteration   V1        V2        V3        Total      Lnm
1           393.22    477.663   135.68    1006.563   2.12E-05
2           377.641   445.996   121.464   945.101    1.38E-05
3           363.607   433.006   138.257   934.87     1.07E-05
4           355.259   394.238   143.271   892.768    8.25E-06
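The headline reductions follow directly from the first and last rows of Table 1, as this quick check shows:

```python
v0, v4 = 1006.563, 892.768   # total voltage, iterations 1 and 4
l0, l4 = 2.12e-5, 8.25e-6    # error function Lnm, iterations 1 and 4
print(f"voltage reduction: {100 * (v0 - v4) / v0:.2f}%")  # ~11.31%
print(f"Lnm reduction:     {100 * (l0 - l4) / l0:.2f}%")  # ~61.08%
```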
5

CONCLUSIONS
This paper considers a broader formulation of the quasi-static shape control problem for smart plate structures, in which both the actuator configurations and the applied voltages are optimised for shape matching. As an implementation of piezoelectric actuator design optimisation (PADO), a coupled alternating loop system is proposed and then validated using an illustrative example.

6
ACKNOWLEDGEMENTS
The authors are grateful for the support of the Australian Research Council under the Discovery Projects grant scheme (DP0210716).

REFERENCES
1. Koconis, D.B., Kollar, L.P. and Springer, G.S., "Shape Control of Composite Plates and Shells with Embedded Actuators. II: Desired Shape Specified", J. Composite Materials, 28(3) (1994) pp. 262-285.
2. Chee, C., Tong, L. and Steven, G., "A Buildup Voltage Distribution (BVD) algorithm for shape control of smart plate structures", Computational Mechanics, Vol. 26, 2000, pp 115-128.
3. Rorres, C. and Anton, H., Applications of Linear Algebra, New York, Wiley, 1979.
4. Gallagher, R.H. and Zienkiewicz, O.C., Optimum Structural Design: Theory and Applications, Chapter 7, London/New York, Wiley, 1973.
5. Nguyen, Q. and Tong, L., "Shape Control of Smart Composite Plate Structures With Non-Rectangular Shape PZT Actuators", Proceedings of the Third Australasian Congress on Applied Mechanics, 2002, pp 421-426.
NUMERICAL INVESTIGATION OF MICRO-SCALE SHEET METAL BENDING USING LASER BEAM SCANNING Z.Q. ZHANG AND G. R. LIU Department of Mechanical Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260 E-mail: engp0581@nus.edu.sg X. M. TAN Institute of High Performance Computing, 1 Science Park Road #01-01 The Capricorn, Singapore Science Park II, Singapore 117528 E-mail: [email protected]
Numerical techniques such as the finite element method (FEM) are revolutionizing the conventional trial-and-error methods in industry today. These methods have proved to be very useful in product development and process design. A parametric numerical simulation of a laser beam scanning on a micro-scale suspension flexure has been carried out, comprising a non-linear transient indirectly coupled thermal-mechanical analysis that accounts for the temperature dependency of the thermal and mechanical properties of the materials. From the 3D FE simulation, relationships between the permanent bending angle of the flexure and the control parameters of the scanning are established, which will help to predict the bending angle of the micro sheet metal under any other scanning condition.
1
Introduction
Laser techniques have been used widely in a range of engineering processes, such as the cutting of complex shapes, drilling on curved surfaces, surface treatment, and the welding of dissimilar metals. The rapid, flexible, low-cost and precise laser forming technology has also attracted considerable attention in sheet metal forming in recent years [1, 2, 3, 4]. The laser forming process is a thermal-mechanical coupling process. It utilizes the thermal stress induced by laser irradiation to form structural elements into different shapes. High-powered laser irradiation yields high temperature gradients between the irradiated surface and the neighboring material. Non-uniform thermal stresses occur due to the temperature distribution. The material deforms plastically once the thermal stresses exceed the yield point of the material. In a precision device, even a small angular distortion of the micro-scale sheet metal inside is unacceptable. For this kind of problem, the laser bending technology has more advantages than traditional mechanical adjustment methods. It can achieve the desired accuracy within a short
time, which leads to low cost. Also, laser bending is a non-contact technology, which satisfies some special technological requirements. Instead of trial-and-error experiments, the numerical simulation method using FEM is applied to find solutions such as heating patterns and control parameters for the process. The three-dimensional FE model built for the micro-scale stainless steel plate is illustrated in Figure 1. The width of the model is 2.5 mm, its length is 11 mm, and its thickness is 0.025 mm. The objective of the laser heating operation is to adjust the bending angle within 0°~1°. With enough simulation cases performed, the relationships between the bending angle θ and the heating parameters are established, which will be very helpful for practical operations.
Figure 1. Illustration of the micro device for adjustment: (a) isometric view; (b) top view (the scanning area is indicated)
2. Solution equations Laser bending is a thermal-mechanical coupling process. To find the time-dependent temperature distribution, three assumptions have been used to simplify the general equation of the three-dimensional transient temperature: the laser bending is performed below the melting temperature of the material; the convection term is much smaller than the heat-transfer terms, so it can be neglected; and the material is isotropic. Using the Galerkin method, the controlling equation for the calculation of the temperatures and their dependency on time can be described as follows:

[C(T)]{Ṫ(t)} + [K(T)]{T(t)} + {Q(t)} = 0
where C(T) denotes the temperature-dependent specific-heat matrix, K(T) denotes the temperature-dependent conductivity matrix, Q(t) is the heat flux vector, and T(t) and Ṫ(t) are the time-dependent nodal temperature vector and its time derivative, respectively. This
equation can be solved using the Newton-Raphson procedure and the Newmark integration method. On the other hand, the equations for the thermal stresses and strains can be described by

[M(T)]{ü(t)} + [C(T)]{u̇(t)} + [K(T)]{u(t)} + {F(t)} + {F^th(t)} = 0
where M(T), C(T) and K(T) are the temperature-dependent mass, damping and stiffness matrices, respectively, F(t) is the external load vector, F^th(t) is the temperature load vector, and {u(t)}, {u̇(t)} and {ü(t)} are the displacement, velocity and acceleration vectors, respectively. 3. Finite-element Simulation of the Laser-bending Process The laser beam heat input is modeled as a moving heat source over the surface of the micro-scale sheet metal. The distribution of the heat input is given in the form of a thermal flux density that obeys a normal distribution, as follows [4]:

I = (a P / (π r_b²)) exp(-r² / r_b²)
where I is the thermal flux density of the laser beam, a is the absorption coefficient of the sheet metal surface, P is the laser beam power, r_b is the laser beam radius and r is the distance from the center of the laser beam. So the mean thermal flux density within the area scanned by the laser beam on the sheet metal surface is

I_m = (1/(π r_b²)) ∫₀^{r_b} I (2πr) dr ≈ a P / (π r_b²)
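The flux distribution is easy to evaluate numerically. The sketch below computes the local flux for a few radii; the laser parameters used are placeholders, and the Gaussian form follows the reconstruction above.

```python
import math

def laser_flux(r_mm: float, power_w: float, r_b_mm: float, absorp: float) -> float:
    """Thermal flux density I(r) of the Gaussian beam model above, in W/mm^2."""
    peak = absorp * power_w / (math.pi * r_b_mm ** 2)
    return peak * math.exp(-(r_mm / r_b_mm) ** 2)

# Placeholder parameters: 0.5 W beam, 0.1 mm radius, absorption coefficient 0.3.
for r in (0.0, 0.05, 0.1):
    print(f"I({r:.2f} mm) = {laser_flux(r, 0.5, 0.1, 0.3):.2f} W/mm^2")
```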
AISI type 304 stainless steel is used for the micro-scale suspension flexure. Since temperature-dependent material properties are important for the accurate calculation of a temperature distribution, the material properties of the specified stainless steel, with their temperature dependence, are taken from references [5, 6, 7]. When the temperature varies from 20°C to 1000°C, the density of the stainless steel changes from 7.9 g/cm³ to 7.49 g/cm³, while the thermal conductivity increases from 14.6 W/(m·K) to 25.2 W/(m·K) and the specific heat shifts from 470 J/(kg·K) to 675 J/(kg·K). On the other hand, the temperature-dependent mechanical properties seriously affect the mechanical simulation results. The temperature effects on the thermal expansion coefficient, Young's modulus, yield stress, etc. of the stainless steel are also taken into consideration. When the temperature increases from 20°C to 1000°C, Young's modulus decreases from 193 GPa to
128 GPa, the thermal expansion coefficient changes from 17 μm/(m·°C) to 20 μm/(m·°C) and the yield strength decreases from 410 MPa to 66 MPa. In this simulation, the Newton-Raphson method is used to avoid possible divergence. 4. Parametric Study of the Laser Bending Process by FEM The effects of parameters including the laser input power (P) and the laser scanning time (t) on the angular deformation caused by a laser heating process are investigated in this FEM simulation. When a laser beam irradiates the model surface, the temperature of the irradiated area increases rapidly within a very short time. A sharp temperature gradient is established through the thickness of the sheet metal. The thermal-mechanical response of the sheet metal is depicted qualitatively in Figure 2 and is explained as follows.
Figure 2. Simulation results from both thermal and structural analysis: (a) temperature distribution; (b) displacement distribution
During the heating period, compressive stresses arise because of the thermal expansion of the heated zone and the bulk constraint of the
Figure 3. Bending angles due to different scanning conditions: (a) bending angle changing with power (W); (b) bending angle changing with power and time
materials surrounding the heated area. Plastic deformation occurs in some high-temperature zones as the yield strength decreases and the thermal expansion coefficient increases with increasing temperature. As a result, the thermal expansion of the materials causes the target to bend away from the laser beam after cooling down. With many simulation results collected, the relationship between the heating parameters and the bending angle is established. Figure 3 shows the effect of the laser power and scanning time on the final bending angle, in both curve and surface formats. 5. Conclusions This study presents numerical simulation methods for micro-scale sheet metal bending using a scanning laser beam. With consideration of the temperature-dependent material properties, simulations of a specified stainless steel suspension flexure inside a micro device have been performed to find the bending angle due to various scanning conditions of the laser beam. With the relationships between bending angle and scanning conditions, the angle due to any scanning condition can easily be predicted from the curves or surfaces.

References
1. Vollertsen, F., Geiger, M. and Li, W.M., FDM- and FEM-simulation of laser forming: a comparative study. In: Advanced Technology of Plasticity III, ed. by Z.R. Wang and Y. He (1993) pp. 1793-1798.
2. Kermanidis, Th. B., Kyrsanidi, An. K. and Pantelakis, Sp. G., Numerical simulation of the laser forming process in metallic plates. Proc. 3rd Int. Conf. on Surface Treatment '97 (Oxford, UK, 1997) pp. 307-316.
3. Li, W., Geiger, M. and Vollertsen, F., Study on laser bending of metal sheets, Journal of Lasers 25(9) (1998) pp. 859-864.
4. Hu, Z., Labudovic, M., Wang, H. and Kovacevic, R., Computer Simulation and Experimental Investigation of Sheet Metal Bending Using Laser Beam Scanning. Int. J. Machine Tools & Manufacture 41 (2001) pp. 589-607.
5. Harvey, P.D. (Ed.), Engineering Properties of Steel, American Society for Metals (Metals Park, OHIO 44073, 1982).
6. Brandes, E.A. and Brook, G.B. (Eds.), Smithells Metals Reference Book, seventh ed. (Reed Educational and Professional Publishing Ltd, 1998).
7. Davis, J.R. (Ed.), Stainless Steels, ASM Specialty Handbook (Ohio: ASM International, 1994).
THREE-DIMENSIONAL FINITE ELEMENT STUDY OF THE ELASTIC FIELDS IN QUANTUM DOT STRUCTURES Q.X. PEI AND C. LU Institute of High Performance Computing, 1 Science Park Road, Singapore 117528 E-mail: [email protected]
The elastic fields in self-organized quantum dot (QD) structures are investigated in detail by three-dimensional finite element analysis for an array of lens-shaped QDs. Emphasis is placed on the effect of the elastic anisotropy of the materials. It is found that the elastic anisotropy strongly influences the distributions of strain, stress, and strain energy density in the QD structures. By changing the elastic anisotropy ratio and the cap layer thickness, substantially different distributions of the strain energy minima on the cap layer surface are obtained, which may result in various QD ordering phenomena such as vertical alignment, partial alignment or complete misalignment.
1 Introduction
Quantum dots (QDs) have drawn great attention due to their potential applications in the fabrication of a wide variety of novel optoelectronic and microelectronic devices, such as light emitting diodes, photovoltaic cells, and quantum semiconductor lasers [1]. Self-assembled QDs can be grown layer by layer to form ordered nanostructures via the Stranski-Krastanow growth mode, which consists of three-dimensional (3D) island growth on a two-dimensional (2D) wetting layer. It is well understood that such self-alignment is due to the long-range elastic fields induced by the misfit strain between the QDs and the substrate [1,2]. It is also well known that the elastic fields produced by the QDs substantially modify the electronic band structure and thus strongly affect the performance of the electronic devices [3]. Hence, the elastic fields in and around the QDs have to be studied in order to obtain a well-ordered QD structure and improve the performance of the electronic devices. The elastic fields in and around the QDs can be analyzed with the atomistic approach, the analytical continuum approach, and the finite element (FE) approach [4]. Compared with the other two approaches, the FE technique is more powerful and can be used for structures of any geometric shape. In this paper, we report a three-dimensional FE calculation of the elastic fields induced by an array of lens-shaped QDs with a wetting layer, submerged in a semi-infinite half space. We present a detailed study of the effects of material anisotropy on the elastic fields in the multiple-QD system and on the vertical ordering of the QDs.

2 Analysis Method and Conditions
A schematic of the lens-shaped QD array is shown in Fig. 1, assuming the QDs are distributed uniformly. Due to the symmetry of the structure, we only analyze the central square area surrounded by the heavy dotted lines, which covers one complete QD and four quarter QDs. In the model, the distance (D) between the two side QDs is taken to be 45 nm, while the thickness of the wetting layer (WL) is taken to be 1 nm. The base diameter (d) of the lens-shaped QD is taken to be 24 nm, with its height (h) being 6 nm. The 3D finite element model of the QD structure is constructed and analyzed with MSC/MARC,
Figure 1. Schematic of the array of lens-shaped quantum dots (Cap: cap layer; WL: wetting layer; Sub: substrate).
a commercial FE code. The QD lattice constant ad is taken to be bigger than the matrix lattice constant a, with the lattice misfit ε0 = (ad - a)/a = 0.04. This lattice mismatch is modeled by employing a pseudo thermal expansion of the QDs. For materials of cubic crystals, the degree of elastic anisotropy is usually characterized by the anisotropy ratio A = 2C44/(C11 - C12), with A = 1.0 corresponding to isotropic elasticity. It is roughly equal to the ratio between the values of Young's modulus along the <111> and <100> directions [5]. The A values for some semiconductor materials are: PbTe 0.27, PbSe 0.29, PbS 0.51, Si 1.56, Ge 1.64, GaAs 1.83, InAs 2.08, ZnTe 2.04, and ZnS 2.53. To cover most semiconductor materials, A values in the range of 0.25 to 4.0 are taken in our investigation. The elastic constants C11 = 150, C12 = 50, C44 = 50 GPa are used for the elastic isotropy case A = 1.0, while for each of the four elastic anisotropy cases A = 0.25, 0.5, 2.0, and 4.0, C11 and C12 are kept unchanged with C44 adjusted to make A equal to the corresponding value. Calculations are also carried out for different cap layer thicknesses, with the ratio of cap layer thickness to dot height, H/h, being 2.0, 3.0, 4.0, 5.0, and 6.0 respectively.
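Since C11 and C12 are held fixed, the adjusted C44 follows directly from the definition of the anisotropy ratio. A minimal sketch in Python (our own illustration; the function name is hypothetical):

C11, C12 = 150.0, 50.0  # GPa, the values used above

def c44_for(A):
    # A = 2*C44/(C11 - C12)  =>  C44 = A*(C11 - C12)/2
    return 0.5 * A * (C11 - C12)

for A in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(f"A = {A:4.2f} -> C44 = {c44_for(A):5.1f} GPa")

For A = 1.0 this reproduces the isotropic set C44 = 50 GPa quoted above.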
3 Results and Discussion
We first analyze the calculation results with the cap layer thickness H/h = 2.0. Fig. 2(c) shows the contour plot of the strain εxx distribution in the Y = 0 plane for the case of elastic isotropy A = 1. Because the lattice parameter of the QD and the wetting layer is larger than that of the matrix, negative εxx, i.e. compressive strain, occurs in the QD and the wetting layer. In the matrix, positive εxx, i.e. tensile strain, occurs in the regions directly below and above the QD, while small compressive strain exists in the regions near the QD corners. The influence of elastic anisotropy on the εxx distributions in the Y = 0 plane is shown in Figs. 2(a), 2(b), 2(d), and 2(e) for the cases with elastic anisotropy ratio A = 0.25, 0.5, 2.0, and 4.0 respectively. Significant differences in the εxx distributions can be observed in these maps.

Figure 2. The strain εxx distributions in the Y = 0 plane for different values of anisotropy ratio A.

In the QD, the magnitude of the εxx contours increases with A changing from 1.0 to 4.0, while it decreases with A changing from 1.0 to 0.25. In the matrix around the QD corners, the εxx contour shapes become more horizontally narrowed with A increasing from 1.0 to 4.0, while they become more horizontally elongated with A decreasing from 1.0 to 0.25. This clearly shows that when A > 1, the [100] and [-100] directions are the elastically soft directions and thus the strain εxx decays rapidly in these directions. When A < 1, the [100] and [-100] directions are elastically hard directions and thus the strained region extends further away along these directions.

Figure 3(c) shows the distribution of strain energy density on the cap layer surface for the case of elastic isotropy A = 1 and the cap layer thickness H/h = 2.0. It can be seen that there are pronounced energy minima at positions directly above the buried QDs. These energy minima may cause the QDs of the next layer to nucleate there preferentially, which results in vertical alignment of newly formed QDs with the buried QDs. It can also be seen in Fig. 3(c) that small satellite energy minima exist midway between the buried QDs. Figs. 3(a), 3(b), 3(d), and 3(e) show the distributions of strain energy density on the cap layer surface for the anisotropy cases A = 0.25, 0.5, 2.0, and 4.0 respectively. As A increases from 1.0 to 2.0 and further to 4.0, these satellite minima develop into local minima, as seen in Figs. 3(d) and 3(e), which may lead to additional QD formation in the next layer, and thus some newly formed QDs may be misaligned vertically with the buried QDs. However, as A reduces from 1.0 to 0.5 and further to 0.25, the satellite minima gradually disappear and only the pronounced local minima at the top of the QDs remain, as seen in Figs. 3(b) and 3(a), which may result in a fully vertically aligned QD structure.

When the cap layer thickness is increased to H/h = 3.0, the strain energy distributions on the cap layer surface are as shown in Figs. 4(a)-4(e). For the elastic isotropy case A = 1.0, it can be seen in Fig. 4(c) that besides the pronounced local energy minima at positions directly above the QDs, there are some satellite minima at positions between the QDs. As A increases from 1.0 to 2.0, these satellite minima develop into pronounced local minima, as seen in Fig. 4(d), which may result in a partially misaligned QD structure. As A increases further to 4.0, the original pronounced local minima above the QDs disappear and only the local minima between the QDs remain, as seen in Fig. 4(e). In this situation, a totally misaligned structure may be formed. As A reduces from 1.0 to 0.5 and further to 0.25, it can be seen from Figs. 4(b) and 4(a) that the satellite minima disappear and only the pronounced local minima at the top of the QDs remain, which may result in a vertically aligned QD structure. The strain energy distributions on the cap layer surface for the other cap layer thicknesses are also obtained in this study. The calculation results show that the elastic anisotropy and the cap layer thickness greatly influence the energy distribution on the cap layer surface, which may result in different QD ordering structures.
Figure 3. The influence of elastic anisotropy on the distribution of strain energy density (in GPa) on the cap layer surface for the cap layer thickness H/h = 2.0.
Figure 4. The influence of elastic anisotropy on the distribution of strain energy density (in GPa) on the cap layer surface for the cap layer thickness H/h = 3.0.
4 Conclusions
The three-dimensional finite element approach is used to calculate the elastic fields induced by an array of lens-shaped QDs. The effects of the elastic anisotropy of the materials on the elastic fields are investigated in detail. It is found that the elastic anisotropy has a significant influence on the elastic fields. Therefore, in calculating the elastic fields of QD structures, the isotropy approximation should not be used, especially when the material exhibits strong anisotropy. It is also found that the elastic anisotropy and the cap layer thickness have a strong influence on the distribution of the energy minima on the cap layer surface. Thus, various QD ordering structures such as vertical alignment, partial alignment or complete misalignment may be obtained by changing the material anisotropy and the cap layer thickness.
References
1. Shchukin V.A. and Bimberg D., Spontaneous ordering of nanostructures on crystal surfaces, Review of Modern Phys. 71 (1999) pp. 1125-1171.
2. Liu P., Zhang Y.W., and Lu C., Self-organized growth of three-dimensional quantum-dot superlattices, Appl. Phys. Lett. 80 (2002) pp. 3910-3912.
3. Schmidt O.G., Eberl K., and Rau Y., Strain and band-edge alignment in single and multiple layers of self-assembled Ge/Si and GeSi/Si islands, Phys. Rev. B 62 (2000) pp. 16715-16720.
4. Liu G.R. and Jerry Q.S., A finite element study of the stress and strain fields of InAs quantum dots embedded in GaAs, Semicond. Sci. Technol. 17 (2002) pp. 630-642.
5. Holy V., Springholz G., Pinczolits M., and Bauer G., Strain Induced Vertical and Lateral Correlations in Quantum Dot Superlattices, Phys. Rev. Lett. 83 (1999) pp. 356-359.
DIRECTIONAL DEPENDENCE OF SURFACE MORPHOLOGICAL EVOLUTION OF HETEROEPITAXIAL FILMS

P. LIU 1, Y.W. ZHANG 2, C. LU 1

1 Institute of High Performance Computing, Singapore
E-mail: liuping@ihpc.a-star.edu.sg, luchun@ihpc.a-star.edu.sg

2 Department of Materials Science and Institute of Materials Research and Engineering, National University of Singapore, Singapore
E-mail: zhangyw@nus.edu.sg
A three-dimensional continuum method is used to simulate the surface morphological evolution of a heteroepitaxially strained film. In the formulation, the film surface evolves through surface diffusion driven by the gradient of the surface chemical potential, which includes the elastic strain energy, elastic anisotropy and surface energy. Our simulations reveal that the elastic anisotropy strength markedly affects the self-assembly of quantum dots. In addition, it is shown that the island alignment, the island spacing and the island size are related to the elastic anisotropy strength.
1 Introduction
During heteroepitaxial growth, a film may undergo a growth mode transition, that is, from a layer-by-layer growth mode to a three-dimensional growth mode [1]. Through such a transition, the film forms a rippled structure, which eventually breaks up into islands. Since these islands are normally dislocation-free, the self-assembly process may be used to fabricate quantum dot arrays, which have many potential applications in microelectronic and optoelectronic devices. The performance of these devices requires a uniform and regular arrangement of the quantum dot arrays. Although many attempts have been made to grow a uniform and regular array of quantum dots through self-assembly [2-6], so far there are no reliable procedures to do so. The surface roughening and subsequent island formation are caused by the competition between the strain energy and the surface energy of the system. During the surface evolution, the total strain energy decreases while the total surface energy increases. First-order perturbation analyses have been carried out to obtain the critical condition of strain-induced surface roughening for both elastically isotropic films [7-10] and elastically anisotropic films [11,12]. These analyses have shown that for a perturbation wavelength λ, if λ > λc, where λc is the critical wavelength, the strain energy will dominate the process, and therefore island formation becomes
energetically favourable; while if λ < λc, the surface energy will dominate the process, and therefore the surface will remain flat. Of particular interest is a film having elastic anisotropy. In this scenario, the critical wavelength depends not only on the surface orientation [12], but also on the elastic anisotropy strength. Therefore, tuning the elastic anisotropy of the film may change the dot spacing and alignment, providing another degree of freedom to manipulate the self-assembly of quantum dot growth. In this paper, the effect of the elastic anisotropy on the surface evolution is examined. Our attention is focused on the dependence of island self-assembly, i.e., the island alignment and island spacing, on the elastic anisotropy strength.

2 Formulation
Consider an elastically anisotropic thin film with initial thickness hf and lattice spacing af heteroepitaxially grown on a thick elastically anisotropic substrate with lattice spacing as. The mismatch strain is defined as ε0 = (af - as)/as. Furthermore, it is assumed that the film and substrate have the same elastic properties. The surface chemical potential can be written as

χ = χ0 + Ω(ω - κγ)    (1)

where χ0 is the chemical potential of the bulk material, Ω is the atomic volume of the diffusing atom, ω = σij εij / 2 is the strain energy density, κ is the mean curvature, and γ is the film surface energy, which is assumed to be isotropic. A linear elastic relation between stress and strain is assumed, i.e., σij = Cijkl εkl, where Cijkl is the component of the elastic modulus tensor, σij is the component of the stress tensor, and εij is the component of the strain tensor. In the present simulation, an annealing process is assumed. Based on the conservation of mass, the surface evolution equation can be written as

vn = D ∇s²χ    (2)

where vn is the normal velocity, D = Ds δs / (kB T), Ds is the surface diffusion coefficient, δs is the thickness of the diffusive layer, kB is the Boltzmann constant, and T is the absolute temperature. Eq. (2) can be written in the following weak form

∫S vn δvn dA = ∫S D ∇s²χ δvn dA    (3)

where the integration is over the film surface. By assuming a symmetry condition and applying the surface divergence theorem, Eq. (3) can be rewritten as

∫S vn δvn dA = ∫S D χ ∇s²(δvn) dA    (4)

Since the above weak form is very stiff due to the curvature term κ in the chemical potential χ, a semi-implicit Euler scheme is introduced to integrate the above equation [13]. For an elastically isotropic crystal, i.e., A = 1, there are only two independent elastic constants: the elastic modulus E and the Poisson's ratio ν. For FCC or BCC crystals, there are three independent elastic constants in the reference coordinate system, namely C11, C12 and C44. The elastic properties can also be expressed through the elastic modulus E, the Poisson's ratio ν and the elastic anisotropy strength A. The two sets of elastic constants have the following relationships: E = (C11² + C11C12 - 2C12²)/(C11 + C12), ν = C12/(C11 + C12), and A = 2C44/(C11 - C12). The strain energy density in the initially flat film is ω0 = (C11² + C11C12 - 2C12²) ε0² / C11.

The calculation procedures are as follows: suppose the shape of the stress-free reference at time t is known; the deformation and diffusion occurring over a subsequent infinitesimal time interval Δt are to be determined. Firstly, a finite element method is used to calculate the strain and stress along the film surface, from which the strain energy density is obtained. Secondly, according to the surface geometry, the surface curvatures are obtained. Thirdly, a finite element method is used to determine the velocity of the surface in the reference configuration at time t. Finally, the change in the shape of the surface during the time interval Δt is deduced; the procedure is repeated and the shape of the surface is calculated as a function of time. In the calculations, the following normalization scheme is used: ω* = ω/ω0, l* = lω0/γ, and t* = tγD(ω0/γ)⁴, where l is the length scale and t is the time scale.
3 Results and Discussion
We examine the effect of elastic anisotropy on surface roughening and island formation by varying the elastic anisotropy strength A while keeping the elastic modulus E and the Poisson's ratio ν fixed. All the simulations start from the same random surface. The unperturbed film surface is the {100} surface.

3.1 A = 1

The simulation results for A = 1.0, i.e., the isotropic case, have been extensively reported [14]. It was shown that at the initial stage, the surface evolves into random ripples. Subsequently the ripples break up into islands, which are randomly distributed. Due to the elastic interaction, the islands are able to self-organize to a certain extent. Thereafter, the islands undergo ripening. As the ripening process proceeds, the larger islands grow at the expense of the smaller ones.

3.2 A > 1

For A = 2.0, our simulation showed that the ripples form markedly along the <100> directions. Subsequently the ripples break up into islands, which are also aligned along the <100> directions. The island array is more uniform and regular than in the isotropic case. The island spacing and island size increase with increasing A. Similarly, the island array also undergoes ripening. For the case with a stronger elastic anisotropy strength, A = 4.0, the island formation is shown in Figs. 1(a)-(d). At the initial stage, the ripples are predominantly along the <100> directions, as shown in Fig. 1(a). The ripples break up into islands, as shown in Figs. 1(b) and (c), similarly to the previous cases; in this case, however, the island array is remarkably uniform and regular, as shown in Fig. 1(d). The island spacing and size are further increased. Unfortunately, the island array also undergoes ripening.
Figure 1 The surface evolution with A = 4: (a) the surface develops into ripples, which are predominantly along the <100> directions; (b) and (c) the ripples break up into islands, which are predominantly aligned along the <100> directions; and (d) the islands self-organize into a fairly uniform and regular array.
Figure 2 The surface evolution with A = 0.5: (a) the surface develops into ripples, which are predominantly along the <110> directions; (b) the ripples break up into islands, which are predominantly aligned along the <110> directions; (c) the islands self-organize into a fairly uniform and regular array, in which a dislocation-like defect can be seen; and (d) the islands undergo ripening.
3.3 A < 1

For A = 0.75, the simulation results showed that the ripples are strongly aligned along the <110> directions. Subsequently, the ripples break up into islands, which adopt a fairly uniform and regular array. The island array is also strongly aligned along the <110> directions. The island spacing and size are decreased compared to the isotropic case. This island array still undergoes ripening. A surface evolution process for A = 0.5 is shown in Figs. 2(a)-(d). These results show a similarity with A = 0.75, but the alignment along the <110> directions is stronger, as shown in Figs. 2(a) and (b). The islands adopt an almost uniform and regular array except for some regions which exhibit dislocation-like defects, as indicated in Fig. 2(c). The island spacing and size become even smaller. The island array still undergoes ripening, as shown in Fig. 2(d).

In summary, it is clearly shown that the elastic anisotropy strength markedly affects the surface roughening and island morphology. When A > 1, with increasing elastic anisotropy, the surface ripples and the formed islands become increasingly aligned along the <100> directions. The island arrays become increasingly uniform and regular. The island size and the averaged island spacing increase. For the cases of A < 1, with decreasing elastic anisotropy strength, the surface ripples and the formed islands become increasingly aligned along the <110> directions. The island arrays become increasingly uniform and regular. The island size and the averaged island spacing, however, gradually decrease. In all of these cases, the island arrays undergo ripening.

4 Conclusions
We have used a three-dimensional finite element method to investigate the effect of elastic anisotropy on island formation. It is shown that when the elastic anisotropy strength is greater than one, with increasing elastic anisotropy the ripples and islands become increasingly aligned along the <100> directions, the ability of these islands to self-assemble increases, and the island size and island spacing increase; whereas when the elastic anisotropy strength is smaller than one, with decreasing anisotropy strength the ripples and islands become increasingly aligned along the <110> directions, the ability of these islands to self-assemble also increases, but the island size and island spacing decrease.
References
1. J.Y. Tsao, Materials Fundamentals of Molecular Beam Epitaxy, Academic Press, Boston, MA (1993).
2. J.A. Floro, G.A. Lucadamo, E. Chason, L.B. Freund, M. Sinclair, R.D. Twesten, and R.Q. Hwang, Phys. Rev. Lett. 80, 4717 (1998).
3. P. Sutter and M.G. Lagally, Phys. Rev. Lett. 84, 4637 (2000).
4. R.M. Tromp, F.M. Ross and M.C. Reuter, Phys. Rev. Lett. 84, 4641 (2000).
5. G. Springholz, V. Holy, M. Pinczolits, and G. Bauer, Science 282, 734 (1998).
6. X. Deng, J.D. Weil, and M. Krishnamurthy, Phys. Rev. Lett. 80, 4721 (1998).
7. R.J. Asaro and W.A. Tiller, Metall. Trans. 3, 1789 (1972).
8. M.A. Grinfeld, Sov. Phys. Dokl. 31, 831 (1986).
9. D.J. Srolovitz, Acta Metall. 37, 621 (1989).
10. L.B. Freund and F. Jonsdottir, J. Mech. Phys. Solids 41, 1245 (1993).
11. H. Gao, Modern theory of anisotropic elasticity and applications, edited by J.J. Wu, T.C.T. Ting and D.M. Barnett, SIAM, Philadelphia, 139 (1991).
12. Y. Obayashi and K. Shintani, J. Appl. Phys. 84, 3141 (1998).
13. Y.W. Zhang, A.F. Bower, L. Xia, and C.F. Shih, J. Mech. Phys. Solids 47, 173 (1999).
14. A.G. Cullis, D.J. Robbins, A.J. Pidduck, and P.W. Smith, J. Cryst. Growth 123, 333 (1992).
15. C.S. Ozkan, W.D. Nix, and H. Gao, Appl. Phys. Lett. 70, 2247 (1997).
16. Y.-W. Mo, D.E. Savage, B.S. Swartzentruber, and M.G. Lagally, Phys. Rev. Lett. 65, 1020 (1990).
FORMING OF NANOSTRUCTURED MATERIALS: NUMERICAL ANALYSIS IN EQUAL CHANNEL ANGULAR EXTRUSION OF MAGNESIUM, ALUMINIUM AND TITANIUM ALLOYS
B.H. HU AND J.V. KREIJ

Singapore Institute of Manufacturing Technology (SIMTech), 71 Nanyang Drive, Singapore 638075
E-mail: bhhu@SIMTech.a-star.edu.sg

Equal channel angular extrusion (ECAE) is a promising technique for producing ultra-fine grained (UFG) or nanostructured materials based on the principle of simple shearing. Through analysis, it is shown that only the geometrical factor Φ, namely the half-angle of the two intersecting channels, and the number of ECAE passes, N, affect the effective strain. The equivalent linear reduction ratio, r0/r1, is derived to describe the size reduction effect of an object such as a grain. The most effective intersecting angle (2Φ) is 90°. Compared to traditional area-reduction extrusion, the deformation effect after 12 passes of ECAE is equivalent to an area reduction ratio of about one million or a linear reduction ratio of about 1022. Magnesium AZ31B, aluminium 6061 and pure titanium were used for the study. Three types of die designs for ECAE of each alloy were proposed and numerically analysed. The effective strain, von Mises stress, equivalent area reduction ratio and equivalent linear reduction ratio were compared for the three die designs based on simulation results obtained with ANSYS/LS-DYNA. The parameter Nμ→nm, namely the number of ECAE passes required to reduce a 100 μm structure to a 100 nm structure, was calculated for each design. A grain size of 100 μm can be deformed into a nanostructure through as few as 12-17 passes of ECAE.
1 Introduction
Research and development in nanotechnology has attracted tremendous attention worldwide [1]. One of the important aspects of nanotechnology is the development and application of nanostructured bulk materials, as this type of material is visible, practically usable and engineerable. Three-dimensional nanostructures are important to manufacturing industries when high or superior mechanical properties are required. This is especially so where light-weight structural metals such as magnesium, aluminium and titanium are used. Equal channel angular extrusion (ECAE), also known as equal channel angular pressing (ECAP), is one of the most promising techniques for producing ultra-fine grained (UFG) or nanostructured materials. It is claimed that grain sizes of 100 nm or even smaller can be produced. Some achievements have been reported by researchers from the USA, Korea, Russia, etc. [2-6]. However, information or in-depth analysis on die design and the related specific flow and deformation behaviour has not been reported. As part of an on-going project in casting and forming nanostructured light alloys, ECAE is being evaluated and used at SIMTech for forming nanostructured light alloys such as magnesium, aluminium and titanium.
2 The effective strain in an ECAE process

An ECAE die consists of two channels of equal square cross-section meeting at a sharp bend, intersecting at an angle of 2Φ, as shown in Fig. 1. A billet, for example a metal ingot, is placed in the top channel or the bottom channel. The billet is then forced by a press into the other channel and undergoes a simple shear process in a thin layer at the
cross plane of the channels. Heavy uniform deformation can be imposed throughout massive billets without a change in cross section [3-5]. Let p be the punch pressure, σ0 the yield stress, γxy the shear strain and e the effective strain; to conduct a complete simple shearing process, p should be at least the value given by Eq. (1) [3].
Fig. 1 Sketch of ECAE [3]
The simple shear process can be repeated many times by putting the previously extruded material back into the equal channels for the next extrusion pass. The total shear strain (γxy) after N passes is given by Eq. (2) [4-6]. Based on the von Mises yield criterion [6-7], the effective strain, e, after N passes of ECAE is given by Eq. (3). When Φ is 45°, the largest e is achieved.

p = (2/√3) σ0 ctgΦ    (1)

γxy = 2N ctgΦ    (2)

e = (2N/√3) ctgΦ    (3)
Comparing the effectiveness in mechanical deformation of ECAE to conventional forward extrusion, the equivalent area reduction ratio (A0/A1) deduced from the total effective strain is given by Eq. (4), where A0 and A1 represent the inlet and outlet cross-section areas in conventional forward extrusion. If r0 and r1 are the respective edge lengths (for a square cross section) or diameters (for a circular cross section) of the inlet and outlet of the conventional extrusion die channels, Eq. (4) can be converted into an "equivalent linear reduction ratio", namely r0/r1, shown in Eq. (5). It can be used to describe the linear reduction effect of an object, such as the size of a grain.

A0/A1 = e^((2N/√3) ctgΦ)    (4)

r0/r1 = √(A0/A1) = e^((N/√3) ctgΦ)    (5)
It can be seen that only the geometrical factor Φ, namely the half-angle of the two intersecting channels, and the number of ECAE passes, N, affect the effective strain. The punch pressure is determined only by Φ and the yield stress of the material (σ0). The most effective intersecting angle (2Φ) is 90°, namely Φ = 45°.
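As a quick numerical check of Eqs. (1)-(5), the sketch below (our own illustration; the names are not from the paper) evaluates the punch pressure and the accumulated reduction ratios for an ideal die:

import math

def ecae(phi_deg, N, sigma0):
    cot = 1.0 / math.tan(math.radians(phi_deg))
    p = 2.0 * sigma0 * cot / math.sqrt(3.0)  # Eq (1): minimum punch pressure
    e = 2.0 * N * cot / math.sqrt(3.0)       # Eq (3): effective strain, N passes
    A_ratio = math.exp(e)                    # Eq (4): equivalent area reduction
    r_ratio = math.sqrt(A_ratio)             # Eq (5): equivalent linear reduction
    return p, e, A_ratio, r_ratio

p, e, A, r = ecae(45.0, 12, 229e6)  # 2*Phi = 90 deg, 12 passes, AZ31B yield stress

For 12 ideal passes this gives A of about 1e6 and r of about 1.02e3, matching the area reduction ratio of one million and the linear reduction ratio of about 1022 quoted in the abstract.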
3 Conceptual design of ECAE dies and numerical analysis
The design of ECAE dies has so far been treated as classified information and is not well known or understood by the public. This suggests that the technology, and the mechanisms of die design and plastic deformation in ECAE, have not yet been fully developed. In order to understand the behaviour of the plastic deformation
before cutting a physical metal die, some conceptual die designs were proposed and numerically analysed. This minimises development time and cost. The commercial FEM software ANSYS/LS-DYNA was used for the numerical simulation. The alloys used for the calculations were magnesium AZ31B, aluminium 6061 and commercially pure (CP) titanium. The yield stress used for the numerical simulation is 229 MPa for AZ31B, 55 MPa for 6061 and 280 MPa for CP-Ti. The elastic modulus used is 45 GPa for AZ31B, 63 GPa for 6061 and 118 GPa for CP-Ti. The Poisson's ratio is 0.30 for AZ31B, 0.35 for 6061 and 0.36 for CP-Ti. Three types of ECAE die designs based on the most effective intersecting angle (2Φ = 90°) were numerically analysed.

Design 1 has 90° sharp corners at the intersection of the two channels. The inner width of the channels for this study is 15 mm. The blank billet used for the numerical analysis is 15 mm × 15 mm in cross section and 50 mm in length. Fig. 2 shows the simulated effective strain of the AZ31B specimen during the first pass of ECAE at a stage of 60% extruded. It indicates that the largest strain exists in the corner region, where the simple shear occurs. The effective strain along the shear plane is about 1.10, which converts to an area reduction ratio of 3 or a linear reduction ratio of 1.73. This value is very close to the theoretical value for the simple shear process. The simulation also shows that the stress at the sharp corner is as high as 390 MPa, which may lead to potential problems such as die wear, cracks and/or entrapment of "dead" material at the intersecting corners.

Fig. 2 Effective strain of ECAE of AZ31B (Design 1)

To overcome these potential problems, fillets were added to the inner and outer corners of the ECAE die to form a new die design, namely Design 2. The blank billet dimensions are unchanged. The radii of the fillets for the inner and outer corners are 7.5 mm and 22.5 mm respectively. Fig. 3 shows the simulated effective strain of the AZ31B specimen during the first pass of ECAE at a stage of 60% extruded based on Design 2. The largest strain still occurs in the corner region, where the simple shear occurs. The effective strain along the shear plane is about 0.36, which converts to an area reduction ratio of 1.43 or a linear reduction ratio of 1.2. The stress at the intersecting corner is reduced to about 290 MPa. This indicates that Design 2 will reduce the tendency toward die wear, cracks and/or entrapment of "dead" material at the intersecting corners faced by Design 1. However, compared to Design 1, the effective strain is reduced from 1.10 to 0.36, a large reduction in the mechanical deformation effect.

Fig. 3 Effective strain of ECAE of AZ31B (Design 2)

To balance the advantages and side effects of Design 1 and Design 2, another design, namely Design 3, was proposed. It has no fillet at the inner corner but a fillet with a radius of 15 mm at the outer corner. The blank billet dimensions are again unchanged. Fig. 4 shows the simulated effective strain of the AZ31B specimen during the first pass of ECAE at a stage of 60% extruded based on Design 3. It indicates that the
largest strain still exists in the corner region, where the simple shear occurs. The effective strain along the shear plane is about 0.85, which converts to an area reduction ratio of 2.34 or a linear reduction ratio of 1.53. The stress at the intersecting corner is about 310 MPa. The advantages of Design 2 are retained, while the effectiveness of mechanical deformation in terms of effective strain is greatly improved from 0.36 to 0.85, i.e., the area reduction ratio from 1.43 to 2.34, or the linear reduction ratio from 1.20 to 1.53.
Fig. 4 Effective strain of ECAE of AZ31B (Design 3)
The simulation analysis results are summarised in Table 1, where e1 represents the effective strain after the first pass of ECAE. The total effective strain e after N passes will be N·e1. The parameter Nμ→nm represents the number of ECAE passes (rounded to the nearest integer) required to reduce a 100 μm structure to a 100 nm structure, based on Eqs. (4)-(5). Namely, Nμ→nm is given by Eq. (6), where the ratio r0/r1 = 100,000/100 = 1000.

Nμ→nm = [ln(A0/A1)]/e1 = [ln((r0/r1)²)]/e1 = 13.8/e1    (6)
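Checking Eq. (6) against the per-pass strains of the three designs (the helper name is ours):

import math

def passes_needed(e1, ratio=1000.0):
    # Eq (6): N = ln((r0/r1)^2)/e1 = 13.8/e1, rounded to the nearest integer
    return round(math.log(ratio ** 2) / e1)

print([passes_needed(e1) for e1 in (1.10, 0.36, 0.85)])  # -> [13, 38, 16]

These are exactly the N values listed for Designs 1, 2 and 3 in Table 1 below.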
It can be seen that, to deform a microstructure (100 μm) into a nanostructure (100 nm), 13 passes are needed for Design 1, 16 for Design 3 and 38 for Design 2. Considering that Design 1 may cause many potential problems due to its high localised stress concentration, and considering the Nμ→nm values, Design 3 is recommended for the next stage of practical ECAE experiments.

Table 1 Comparison of three designs for ECAE

Design | e1 | A0/A1 (times) | r0/r1 (times) | Nμ→nm | σmax (MPa)
1 | 1.10 | 3.00 | 1.73 | 13 | 390
2 | 0.36 | 1.43 | 1.20 | 38 | 290
3 | 0.85 | 2.34 | 1.53 | 16 | 310
Similar numerical analyses were also conducted on ECAE of aluminium and titanium for the three die designs. Fig. 5 shows the simulation results for 6061 (Fig. 5(a)) and CP titanium (Fig. 5(b)) during the first pass of ECAE at a stage of 60% extruded based on Design 3. The largest strain again occurs in the corner region, where the simple shear occurs. The average effective strains along the shear plane are about 0.90 for 6061 and 0.81 for CP titanium, which convert to area reduction ratios of 2.46 and 2.25, or linear reduction ratios of 1.57 and 1.50, respectively. According to Eq. (6), to deform a microstructure (100 μm) into a nanostructure (100 nm), 15 and 17 passes of ECAE are needed for 6061 and CP titanium respectively.
Fig. 5 Effective strain of ECAE of (a) Al-6061 and (b) CP-Ti
4 Conclusions
Only the geometrical factor Φ, namely the half-angle of the two intersecting channels, and the number of ECAE passes, N, affect the effective strain.
Acknowledgements

This study is part of SIMTech's NanoAlloys project (C02-P-008AR) in casting and forming nanostructured light alloys, funded by the Agency for Science, Technology and Research (A*STAR) of Singapore.

References
1. M.C. Roco, J. of Nanoparticle Research, Kluwer Academic Publ., Vol. 3, No. 5-6, pp. 353-360, 2001.
2. I. Kim, W.S. Jeong, J. Kim, K.T. Park and D.H. Shin, Scripta Materialia 45 (2001), pp. 575-581.
3. L.R. Corwell, K.T. Hartwig, R.E. Goforth and S.L. Semiatin, Materials Characterisation 37 (1996), pp. 295-300.
4. H.G. Selam, Proceedings of ICCE/9, San Diego, July 1-6 (2002), pp. 689-690.
5. V.M. Segal, Mat. Sci. & Eng. A197 (1995), pp. 157-164.
6. S.C. Chen, Z.Q. Wu, B.H. Hu, J. Liang and B.J. Wu, Hot-working Technology, Tsinghua University (1992), pp. 150-199.
7. B.H. Hu and J.v. Kreij, Forming of Nanostructured Materials (I) - Numerical Analysis of Plastic Deformation in Equal Channel Angular Extrusion (ECAE) of Magnesium AZ31B Alloy, submitted to Journal of Materials Processing Technology.
THE DEVELOPMENT OF STANDARD PART DATABASE FOR PROGRESSIVE DIE DESIGN

ZHONGHUI WANG

Institute of High Performance Computing, 1 Science Park Road, Singapore 117528

Progressive dies are widely used to mass-produce metal stampings for electrical, electronic and mechanical applications. Of all the components within a progressive die set, standard parts account for a large portion of the design work, requiring many interactions from the designer for part selection and parameter specification. Thus, the performance of data retrieval from the standard part database is an important factor in shortening the tooling design lifecycle. Excel-based worksheets are now very popular for storing standard part parameters. However, the existing tool for retrieving Excel-based data is insufficient, due to its low speed for interactive operation in a die design system. Besides, the content of a table cell still needs to be evaluated before it can be used for design. This paper reports a method to improve data retrieval efficiency through a set of predefined keyword-format files that can be easily accessed. A converting tool has been developed to convert the original Excel files into keyword-format files. Retrieval functions such as searching and matching are available for these files. The proposed database is made up of the original Excel files, the intermediate keyword files, and tools for file conversion and data searching. Practical examples have proven the efficiency of the proposed database in our knowledge-based die design system.

Keywords: Database, Standard Part, Progressive Die Design, Computer Aided Design, Data Retrieval
1. Introduction

Progressive die design is one of the important design activities in the tooling industry. However, progressive die design is still a complex, skill-intensive and experience-driven process [1][2]. A typical progressive die set includes metal plates such as die shoes, punch plate, backing plate, stripper plate and die plate, and various inserts such as punches. During the design of each plate, standard parts like fasteners, dowel pins, spacers, springs, guiding pins and bushings are used for locating the plate or for transmitting force or mechanical movement. These components can sum up to hundreds or even thousands in quantity. Thus, standard part design takes a large portion of the overall design task. Designers need to visit the standard part database for part specifications, parameter selections or material properties. This imposes a requirement for efficient data retrieval from the database, especially for today's progressive die design systems. Due to its flexible format, the Excel worksheet has become popular in industry for storing engineering data. For example, the Japanese standard part provider MISUMI [3] already distributes its catalog of standard components in Excel worksheets. The "catalog of standard components for press dies" is one of the most frequently used standards for tooling design. A common way to interact with Excel data runs through the Open Database Connectivity protocol (ODBC). ODBC provides a call-level API that different database vendors implement via ODBC drivers specific to a particular database management system (DBMS). Applications can use this API to call the ODBC Driver Manager, which passes the calls to the appropriate driver. The driver, in turn, interacts with the DBMS using Structured Query Language (SQL). This process hierarchy is quite heavy. An ODBC driver for Excel worksheets is provided by Microsoft; nevertheless, in our experience the efficiency of the driver is inadequate for design interaction, as detailed in the comparison section. Another aspect is that the industrial data stored in an Excel sheet is only "raw" data, which requires a proper parsing tool to analyze and identify before use. Take the Block Lifter Set as an example: its block thickness "T" has a series of values like "16, 20, 25, 30, 35, 40" within one cell,
which is referred to as the enumeration format. During data retrieval, the string for "T" must be obtained first and then parsed into an array of real values before it can be used for matching or selection. In other tables, for example the Lifter Pin Set, the spring length "FL" takes the form "20-5-70" in one cell, which is called the incremental format; it means that FL may take values from 20 to 70 with an increment of 5. For this format, the start value, increment and end value have to be identified before data searching. In order to address these two drawbacks of Excel worksheets, we propose a method to improve data retrieval by reorganizing file storage and access. A set of keyword-formatted files is used as intermediate files to store the Excel data, on which data fetching is implemented. The proposed database is made up of the original Excel files, the intermediate keyword files, and tools for file conversion and data searching. Practical examples have demonstrated the improved efficiency of the proposed database in our knowledge-based die design system.

2. Proposed method

The use of a database is a necessity for a progressive die design CAD software system. However, a major factor in a user's satisfaction or lack thereof with a database system is its performance. If the response time for a request is too long, the value of the system is diminished. The performance of a system depends on the efficiency of the data structures used to represent the data in the database and on how efficiently the system can operate on these data structures [4,5]. Although ODBC provides a common way to interact with Excel worksheets, the process hierarchy is quite heavy, as explained earlier. Such speed may not be acceptable for the interface of a die design system, where data retrieval and parameter selection are enormous. As text-format files are easy to access through common stream libraries, we use predefined keywords to define the format of the tables originating from Excel files, so that each table, its field titles and its data columns can be easily identified. Our proposed database model uses the Excel data as the "raw" source, and uses the keyword-formatted files as the source for data retrieval and matching. A set of tools is available, including a tool to convert the Excel data source into the keyword-formatted source, and tools for data retrieval and matching. The overall structure of the database model is illustrated in Fig. 1.
Fig 1. The overall structure of the database (StdBase), serving the domain applications
2.1 Data storage
To represent a specific table in the data source, a set of keywords is used to define the table format. Each table is delimited by the "TABLE_BEGIN" and "TABLE_END" pair, in which field titles are preceded by "COLUMN_HEAD", and column data are delimited by the "COLUMN_BODY" and "COLUMN_END" pair. TABLENAME indicates the name of the corresponding table. All the keywords, data titles and data values are prefixed by the symbol '#', so that each data segment can be separated and identified during data querying. An example for a title block table named Molex (Table 1) is illustrated below:
Table 1 A table named Molex in the Excel worksheet

TBNo* | TBName | Width | Height | OriginX | OriginY | TextHeight
1 | MOLEX_A4P | 178 | 200 | 0 | 0 | 2.5
2 | MOLEX_A3L | 377 | 205 | 0 | 0 | 2.5
3 | MOLEX_A2L | 553 | 314 | 0 | 0 | 2.5

The corresponding keyword-formatted table is as follows:

#TABLE_BEGIN
#TABLENAME=MOLEX
#COLUMN_HEAD #TBNo* #TBName #Width #Height #OriginX #OriginY #TextHeight
#COLUMN_BODY
#1 #MOLEX_A4P #178.0 #200.0 #0.0 #0.0 #2.5
#2 #MOLEX_A3L #377.0 #205.0 #0.0 #0.0 #2.5
#3 #MOLEX_A2L #553.0 #314.0 #0.0 #0.0 #2.5
#COLUMN_END
#TABLE_END
Based on the keyword-formatted text, a set of tools has been developed to parse the text blocks and retrieve data items. It will be shown in the comparison section that data retrieval from the keyword-formatted file is much faster than from the Excel file.
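As an illustration of how such a file can be consumed (this is our own sketch, not the authors' tool; it assumes the exact keyword layout shown above):

def parse_tables(text):
    # Parse keyword-formatted text into {table_name: {'head': [...], 'rows': [...]}}.
    tables, name, head, rows, in_body = {}, None, [], [], False
    for line in text.splitlines():
        toks = [t.strip() for t in line.split('#') if t.strip()]
        if not toks:
            continue
        tag = toks[0]
        if tag == 'TABLE_BEGIN':
            name, head, rows, in_body = None, [], [], False
        elif tag.startswith('TABLENAME='):
            name = tag.split('=', 1)[1]
        elif tag == 'COLUMN_HEAD':
            head = toks[1:]
        elif tag == 'COLUMN_BODY':
            in_body = True
        elif tag == 'COLUMN_END':
            in_body = False
        elif tag == 'TABLE_END':
            tables[name] = {'head': head, 'rows': rows}
        elif in_body:
            rows.append(toks)
    return tables

Because every token is '#'-prefixed, a plain string split suffices; no driver stack or SQL layer is involved, which is where the speedup reported in the comparison section comes from.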
2.2 Data Matching
In order to meet the requirements of progressive die design, a standard part database should provide functions for the designer to match against an initial value. For example, a typical matching interface should provide the following functions: 1) get the maximum (or minimum) value of a parameter in the table; 2) get the allowable value of a parameter in the table closest to an initial search value; 3) get the allowable value that is greater than but closest to a search value; and so on. Implementation of these functions is simple if each cell in a table holds a single value, but is complicated if multiple values are held in a cell, as in the enumeration and incremental formats mentioned in the introduction. Here we address the general case of a mixed combination of enumeration and incremental formats, such as "20, 30, 40, 45-5-60, 70, 80, 90". We must identify how the cell data is organized so as to separate each value. As the incremental format is characterised by three values connected by two '-'s, the whole string in the cell can be parsed into two groups of fragments corresponding to the enumeration format and the incremental format. The enumeration group can be stored in a data array, say Enum_Array, while each incremental group is stored in a list structure with three double members, which we name Inc_List. By sorting the Enum_Array and the Inc_List, we are able to implement the basic functions mentioned above. The matching algorithms for these functions have been realised both for the Excel-format-based matching method and for the keyword-format-based matching method; the details will be described in a later paper.
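A compact sketch of the expansion and two of the matching functions (our own illustration; it assumes non-negative catalogue values, since a leading '-' would be ambiguous):

def expand(cell):
    # "20,30,40,45-5-60,70" -> sorted list of all admissible values
    vals = []
    for frag in cell.split(','):
        parts = frag.split('-')
        if len(parts) == 3:                       # incremental: start-step-end
            start, step, end = (float(p) for p in parts)
            v = start
            while v <= end + 1e-9:
                vals.append(v)
                v += step
        else:                                     # enumeration: single value
            vals.append(float(frag))
    return sorted(vals)

def closest(cell, x):
    return min(expand(cell), key=lambda v: abs(v - x))

def at_least(cell, x):
    return min((v for v in expand(cell) if v >= x), default=None)

closest("20,30,40,45-5-60,70,80,90", 52)  # -> 50.0
at_least("20-5-70", 52)                   # -> 55.0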
3. Comparison

In this section, we compare the previous ODBC-based Excel data retrieval with the proposed keyword-format-based retrieval method. Two testing programs, DBFetch and DBContain, are employed for this purpose. DBFetch uses the Excel driver provided by Microsoft to retrieve Excel-format data, while DBContain is our program to retrieve keyword-format data. As the data source, a standard part catalogue, the Guide Post Set, is chosen for the comparison. The Guide Post Set has 10 diameter profile tables (each with 9 rows and 23 columns), one equivalent table, and 81 length profile tables (each with 6 rows and 6 columns). DBFetch and DBContain each perform two operations: (1) select all the tables from the Guide Post Set catalogue; (2) fetch all the column data from the table named "RS" in the Guide Post Set. The result of the comparison is shown in Table 2.

Table 2. The efficiency comparison between DBFetch and DBContain
Operation | Time used by DBFetch, T1 (s) | Time used by DBContain, T2 (s) | Speedup of DBContain over DBFetch (T1/T2)
Operation 1 | 36.12 | 0.08 | 451.5
Operation 2 | 2.285 | 0.01 | 228.5
From Table 2, it is clear that keyword-format data retrieval is far faster than ODBC-based Excel data retrieval.

4. Conclusion

The Excel worksheet makes it easy for users to input and store design parameters and standard libraries. However, it is not convenient to use with the traditional ODBC method in a real CAD system, especially for progressive die design. By introducing the intermediate keyword-formatted data storage and a set of tools, our proposed database retains the ease of use of Excel worksheets while improving the efficiency of data retrieval. This database has been incorporated into our knowledge-based die design system.

References
1. Cheok, B.T. and Nee, A.Y.C., "Configuration of progressive dies", Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 1998, Vol. 12, pp. 405-418.
2. Cheok, B.T., Zhang, Y.F. and Leow, L.F., "A skeleton-retrieving approach for the recognition of punch shapes", Computers in Industry, 1997, Vol. 32, pp. 249-259.
3. MISUMI Corporation, "Face, Standard Components for Press Dies", April, 1998, http://www.musumi.co.jp
4. Korth, H.F., "Database System Concepts", Second edition, McGraw-Hill, Inc, 1991.
5. Karpovich, J.F. and French, J.C., "High Performance Access to Radio Astronomy Data: A Case Study", Proceedings of the Seventh International Working Conference on Scientific and Statistical Database Management, Charlottesville, Virginia, Sept. 1994, pp. 240-249.
THE APPLICATION OF SENSITIVITY ANALYSIS TO MODIFYING CAR BODY CONFIGURATION

Xuerong Zhang, Maotao Zhu

School of Automobiles and Transportation, Jiangsu University, Dantu Road 301, Zhenjiang City, Jiangsu Province, China
E-mail: zhmt@ujs.edu.cn

In this paper, a three-dimensional CAD model of a car body is built with the UG software. The model is imported into the ANSYS program through a data port. After the model is suitably modified, it can be meshed. For modal analysis it is necessary to define the material, real constants and constraints, etc. The ANSYS solver is then launched to analyse the model, and the modal parameters (modal frequencies and mode shapes) are obtained. Their validity is verified by a modal test. On this basis, structural modification and optimization are carried out. According to the theory of inner-product correlation, the mode shape and frequency correlations are calculated. The degrees of correlation determine whether the finite element model can substitute for the prototype. According to the requirements of the operating environment, the first-order frequency must be less than 23 Hz, but the actual value is 24 Hz, so an adjustment must be made. A sensitivity method is used to modify the body structure. The sensitivities to the sheet metal thicknesses are calculated, and the thickness parameters with larger sensitivity values are taken as design variables, so that the anticipated target is achieved at minimum expense.
1 Introduction
There are two cases in structural dynamic modification. First, when the structure is modified slightly in detail due to design or manufacturing considerations, we can work out the resulting change in the dynamic properties. Second, we can change structural parameters so that the structural dynamic properties (for example, natural frequencies and mode shapes) meet the requirements we expect. For a complex structure, there are many modification schemes and design variables, so it is necessary to decide which is most effective. We can find the parameters or variables that are highly sensitive to the dynamic properties and use them as design variables during optimization. The method based on sensitivity analysis can avoid blind trial-and-error, improve efficiency and cut design cost [1].

2 Building the Body Finite Element Model
In this paper, the car body surface model is built with the UG software. During the process, the model is simplified mainly by omitting unloaded and non-structural components, simplifying section shapes appropriately, and omitting features that influence the performance only slightly, such as small holes, bosses, grooves and flanges [2]. Finally, the CAD model of the car body is built. It is imported into the ANSYS program through an IGES port. After the model is suitably modified and the element type, real constants (sheet metal thicknesses) and material are set up, it can be meshed. The whole finite element model is composed of 14,848 nodes and 15,544 shell elements.

3 Modal Analysis with the Finite Element Method
No constraints or forces are applied to the FE model; that is, the model is free of restraints. A modal analysis and a mode extraction method are then specified. Four methods (Block Lanczos, subspace, PowerDynamics, and reduced) are the most commonly used in the
ANSYS program. The Block Lanczos method is suited to finding many modes (about 40) of large models and is recommended when the model consists of poorly shaped solid and shell elements. This solver performs well when the model consists of shells or a combination of shells and solids; it works faster but requires about 50% more memory than the subspace method. The first six modes (mode shapes and frequencies) are shown in Table 1 and Figs. 1 to 4.
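In essence the solver extracts the lowest eigenpairs of the generalized problem K·φ = λ·M·φ. A minimal dense-matrix sketch of this step (our own illustration, not the ANSYS implementation):

import numpy as np
from scipy.linalg import eigh

def modal_frequencies(K, M, n=6):
    # K, M: assembled stiffness and mass matrices of the shell model
    lam, phi = eigh(K, M)                    # eigenvalues in ascending order
    lam = np.clip(lam, 0.0, None)            # free-free rigid-body modes: lam ~ 0
    freqs = np.sqrt(lam) / (2.0 * np.pi)     # natural frequencies in Hz
    return freqs[6:6 + n], phi[:, 6:6 + n]   # skip the six rigid-body modes

For the unrestrained body the first six eigenvalues are numerically zero rigid-body modes, which is why they are skipped before reporting the elastic modes.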
Fig. 1 First mode shape of whole torsion
Fig. 2 Second mode shape of whole torsion
Fig. 3 First mode shape of whole bend
Fig. 4 Local vibration of coping

4 Validate Finite Element Model by Experiment
According to the structural characteristics, 161 testing positions forming a grid are laid out over the whole car body. The experimental system is composed of a charge amplifier, a three-dimensional acceleration sensor, a force hammer (including a force sensor), modal analysis software and an SD380 dynamic analyzer. During the test, a curve of the transfer function and the coherence function is obtained, as shown in Fig. 5. According to the experimental results, the coherence function is close to unity near the frequencies of interest; in Fig. 5 the coherence function is 0.998 at 61.125 Hz, meaning that only 0.2% of the response signal is produced by noise, the rest being produced by the excitation signal. The results are therefore accurate. The modes (mode shapes and frequencies) from the experiment are shown in Table 1 and Figs. 7 to 10. In Table 1, the number of frequencies below 50 Hz is 6 in the calculation and 4 in the test. The degrees of correlation determine whether the finite element model can substitute for the prototype. According to the theory of inner-product correlation, the mode shape and frequency correlations are calculated.
Fig. 5 Curve of the transfer and coherence functions
Fig. 6 Sensitivity of the first-order frequency to component thickness (x-axis: component number; y-axis: sensitivity)

Table 1 Comparison of frequencies between calculation and experiment

Modal Order | Calculated (Hz) | Tested (Hz)
1 | 24.04 | 23.90
2 | 28.03 | 30.87
3 | 28.14 | 39.20
4 | 35.70 | 44.74
5 | 38.75 | 51.04
6 | 45.73 | 54.24
Fig. 7 First mode shape of whole torsion (testing)
Fig. 8 Second mode shape of whole torsion (testing)
Fig. 9 First mode shape of whole bend (testing)
Fig. 10 Local vibration of coping (testing)
Assume X ∈ Rⁿ is a real mode shape from the modal test and Y ∈ Rⁿ is a mode shape from the calculated modal analysis. The inner-product correlation of X and Y in Hilbert space is defined as [3]

ρc(X, Y) = (X, Y) / √((X, X)(Y, Y))    (1.1)
We programmed this calculation in MATLAB according to Eq. (1.1). The result is shown in Table 2.
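For reference, an equivalent computation in Python (a sketch of Eq. (1.1); the absolute value makes the measure independent of the arbitrary sign of a mode shape):

import numpy as np

def mode_correlation(X, Y):
    # X: tested mode shape, Y: calculated mode shape, sampled at the same points
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    return abs(X @ Y) / np.sqrt((X @ X) * (Y @ Y))

def pair_modes(tested, calculated):
    # index of the best-correlated calculated mode for each tested mode
    return [max(range(len(calculated)),
                key=lambda j: mode_correlation(x, calculated[j]))
            for x in tested]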
Table 2 The inner-product correlations of the first six mode shapes (rows: calculated modal order; columns: tested modal order)

Calc\Test | 1 | 2 | 3 | 4 | 5 | 6
1 | 0.9242 | 0.3541 | 0.1021 | 0.2593 | 0.3856 | 0.2235
2 | 0.2432 | 0.1847 | 0.1645 | 0.1811 | 0.2403 | 0.2920
3 | 0.0084 | 0.3035 | 0.8794 | 0.3864 | 0.4299 | 0.3106
4 | 0.3124 | 0.1252 | 0.2123 | 0.2163 | 0.1475 | 0.4011
5 | 0.1256 | 0.3127 | 0.2384 | 0.2134 | 0.1643 | 0.8142
6 | 0.2736 | 0.1679 | 0.1377 | 0.2070 | 0.3293 | 0.8627

The correlation ranges from 0 to 1; the larger the value, the better the correlation. From the correlation analysis, the corresponding modes are found, and modes possibly missed in the test are also identified (single-point excitation was used in the test; to avoid missing modes, multi-point excitation can be applied to distinguish dense and repeated-frequency modes). The relation of corresponding modes between calculation and experiment is shown in Table 3. The finite element model can therefore substitute for the prototype, and can be used as the basis for body modification and optimization design.
Table 3 The relation of corresponding modes

Calculating Order | Calculating Frequency (Hz) | Testing Frequency (Hz) | Correlation
1 | 24.039 | 23.90 | 0.9242
2 | 28.027 | / | /
3 | 28.135 | 30.87 | 0.8794
4 | 35.702 | / | /
5 | 38.749 | 39.20 | 0.8142
6 | 45.726 | 44.74 | 0.8627

Note: the symbol '/' denotes that this mode was missed in the test.

5 The Application of Sensitivity Analysis to Modifying the Car-body Configuration
According to the requirements of the operating environment, the first-order frequency must be less than 23 Hz, but the actual value is about 24 Hz, so it needs to be adjusted. This paper applies the sensitivity method to modifying the body structure. The sensitivities to the sheet metal thicknesses are calculated, and the thickness parameters with larger sensitivity values are taken as design variables, so that the anticipated target is achieved at minimum expense. Here we assume that the eigenvalue λr is a function of mij, kij and cij (mass, stiffness, damping):

λr = f(mij, kij, cij)    (1.2)

We can change the structural parameters so that the structural dynamic properties (natural frequencies and mode shapes) meet the expected requirements. When the scope of the adjustment is narrow, the second-order correction term can be omitted. According to literature [4], the eigenvalue change is approximately
"- = X X < ^ 7 » - • X X <$£*«. * X X <£j««. <'-3> 1 = 1 j' = l
v
;=i
j= \
'J
,=i
y=i
u
467
On this basis, we have

Δfr = Σi (∂fr/∂hi) Δhi    (1.4)

where Δhi stands for the change of the sheet metal thickness of component i and ∂fr/∂hi stands for the sensitivity of the natural frequency to that thickness. Once the sensitivities are known, we can find the change of thickness required for a given change of frequency; it is therefore essential to be able to compute the sensitivity values. Sensitivity is a broad concept; mathematically, it is classified as differential or difference sensitivity [4]. According to the literature, the calculation of differential sensitivities is highly complicated and the data size is enormous, so we compute difference sensitivities. The whole body is composed of 17 components joined by welds or bolts, and the thickness of every component is regarded as an independent variable. The thickness of each component, from 1 to 17, is increased by 10% in turn, and the resulting change of natural frequency is computed. The change of frequency divided by the change of the component thickness is the sensitivity to the thickness of that component. Through repeated calculation, the sensitivities of the first-order natural frequency shown in Fig. 6 are obtained. It is found that the components with larger sensitivities are 15 and 17, corresponding to the front standing pillar and the doorsill. The target value of the first-order frequency is less than 23 Hz, so

Δf1 = f1* - f1 = 23.0 - 24.013 = -1.013 (Hz)

According to Eq. (1.4), it is assumed that Δhi is proportional to its sensitivity, with scale coefficient k:

Δf1 = (∂f1/∂h15) Δh15 + (∂f1/∂h17) Δh17 = 3787 × 0.38k + (-2675) × (-0.27k) = 2161.3k = -1.013

so the scale coefficient is -4.69e-4. Therefore the thicknesses of the above two components are reduced by 0.2 mm and increased by 0.1 mm respectively. In the ANSYS program, we modify the corresponding real constants and solve again. The first-order frequency is then found to be 22.92 Hz, which satisfies the target. The sensitivity analysis is therefore an effective method for modifying the body structure, achieving the anticipated target at minimum expense.

References
1. Zhifang Fu and Hongxin Hua, Theory and Application of Modal Analysis (Shanghai Jiao Tong University Press, 2000).
2. Tianming He, Xiangnong Xu et al., Intensity Analysis of Unitized Body With Stub Front Frame, Wuhan Automobile University Transaction, Vol. 18, No. 1, 1996.
3. Qiyin Cao, Relativity Analysis of Modal Based on Interior Cumulation, Application Mechanics Transaction (15)4, 1996.
4. Vanhonacker P., Differential and Difference Sensitivities of Natural Frequencies Via Sensitivity Analysis, Proc. of the 3rd IMAC, 1985.
THE SIMULATION OF THE Π-TYPE CONSTRAINT BENDING PROCESS

Xu Hongzhi
Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528
Email: [email protected]

In forming a complex bent sheet part, the calculation of the unfolded length is crucial to the process design; it is one of the numerous factors affecting the final shape and size of a product. This paper presents an analysis of the Π-type constraint bending deformation process by means of FEM. A mechanical model is set up according to the deformation characteristics, and gap elements are employed to handle the contact between the punch, die and blank. To study the deformation regularities and the calculation of the unfolded length, the effects of die parameters and material properties on the deformation are discussed in detail, and the distributions of stress, strain and displacement in the blank are examined under Π-type constraint bending for different conditions. The deformation features and regularities of Π-type constraint bending are identified, and a formula for calculating the unfolded length of Π-type constraint bent parts is proposed.
1. Introduction
The bending process is very important in the stamping industry. For bent parts, the calculation of the unfolded length and the forecast of springback are crucial for the stamping die designer. However, both are related to many factors, such as tooling geometry, material properties and friction, so it is hard to calculate the unfolded length of bent parts accurately. Usually, empirical formulas are used to calculate the unfolded length of bent parts [1]. Hill's mathematical theory of plasticity [2] provides theoretical help for calculations of the bending process. In recent years, many researchers have studied the bending process using FEM simulation [3, 4] and achieved good results that help engineers design stamping dies; they have studied the V-bend and free bending processes by FEM simulation and obtained results that compare well with practical tests. For complicated parts such as the Π-type constraint bent part, many factors affect the simulation of the bending process, and it is also difficult to calculate the unfolded length accurately with an empirical formula. The Π-type part is a blank for the strengthening frame, as shown in Fig. 1, and its calculated unfolded length is very important for the precision of the strengthening frame.
Fig 1. The Π-type part and the strengthening frame
This paper presents an analysis of the Π-type constrained bending deformation process by means of FEM. A mechanical model is set up according to the deformation characteristics, and gap elements are employed to handle the contact between the punch, die and blank. A formula for calculating the unfolded length of Π-type constraint bent parts is given.
2. Simulation of the Π-type constraint bending process
To analyse the bending of the Π-type constraint bent part, the constrained bending process is simulated with the ADINA software using non-linear gap elements. The discretization of the mechanical model is shown in Fig. 2. In the simulation, very fine meshes are required on the sheet between the bending punch and the die, in order to analyse the deformation features and regularities and to calculate the length change of the part. The updated formulation adopted in this simulation is as follows:
{o}w = {of-" + [D] {Ae}(i)1 where the {ay''' is the stress in the gap elements. [Lp-20 j Punch
tens
ii Die
Fig 2. Discretization of the mechanical model in Π-type constrained bending

The sheet materials used for the simulation are steel 20, brass H62 and LY12 Al, with a sheet thickness of 1.0 mm and a length of 60.0 mm. The material property parameters are listed in Table 1. Due to symmetry, only half of the blank is considered; it is divided into 750 quadratic elements with six layers through the thickness.

Table 1. Material property parameters

Material   Yield stress σs (MPa)   Young's modulus E0 (MPa)   Poisson's ratio ν   Tangent modulus after yielding ET (MPa)
20         240                     21000                      0.37                1000
H62        188.72                  10000                      0.34                1364
LY12       277.03                  7100                       0.31                1558
The simulation results are shown in Fig. 3 for the same forming conditions, with the radius of the bending punch and die equal to 1.0 mm. It is seen that the material properties affect the tangential strain, but not by much. The strain between the bending punch and die is negative,
Fig 3. Variation of the tangential strain εt when Rp = Rd = 1.0 mm and the depth of bending h = 4.0 mm for different materials: (a) 20, (b) LY12, (c) H62
which means that the blank thins during the constrained bending process; the thinning differs according to the material and the tooling radius. A smaller radius increases the bending deformation, and the thinning is greater.
3. The formula for calculating the unfolded length in the Π-type constrained bending process
To calculate the unfolded length of a bent part, the part is usually divided into deformation areas and non-deformation areas, with a different calculation method used in each, as shown in Fig. 4. The calculation formula is as follows:

L = l1 + l2 + hab + l3 + l4
Normally the empirical formula applies to a single-bend part. The Π-type constraint bent part is a multi-bend part and is formed differently from a single-bend part; it can be divided into three areas: (1) the non-deformation area (l1 and l2), (2) the deformation area (l3 and l4) and (3) the thinning area (hab). For this constraint bent part a new calculation
Fig 4. Division of the blank for the unfolding calculation
formula is needed. For the non-deformation area, the unfolded length is just the length of that area. In the deformation area, the curvature radius of the strain neutral layer is the key factor in calculating the unfolded length. From the simulation of the Π-type constraint bent part we know that the deformation at the bending punch and die radii, and between them, is greater than in the other parts. From the simulation results, the relations between the curvature radius of the strain neutral layer and the relative bending radius were found; they are as follows. LY12 Al:
ρEp = 1.001840 r/t + 0.151910   (at the punch radius)
ρEd = 0.993623 r/t + 0.343599   (at the die radius)

Brass H62:

ρEp = 0.985617 r/t + 0.236465   (at the punch radius)
ρEd = 1.002220 r/t + 0.300318   (at the die radius)

Steel 20:

ρEp = 0.996763 r/t + 0.143295   (at the punch radius)
ρEd = 0.979220 r/t + 0.309408   (at the die radius)

where ρE is the curvature radius of the strain neutral layer, r is the radius of the tooling and t is the blank thickness. For the thinning area, the simulation results give the thinning percentage as a function of r/t.
From these analyses, the formula for calculating the unfolded length of the Π-type constraint bent part is as follows:

L = l1 + l2 + ρEp·α + ρEd·α + hab(1 − η)

where L is the unfolded length of the blank, l1 and l2 are the lengths of the non-deformation areas, ρEp and ρEd are the curvature radii of the strain neutral layer at the punch and die radii, α is the bending angle, hab is the length shown in Fig. 4, and η is the thinning percentage of the blank over hab. To verify this formula, the three materials used in the simulation were tested; the measured and simulated results are shown in Table 2. It can be seen from Table 2 that the simulated unfolded length is slightly bigger than the experimental one; the difference comes from the curvature radii of the strain neutral layer at the punch and die radii, because these radii are obtained by linear regression of the simulation data.

Table 2. Experimentally measured results of Π-type constrained bending (Rp = Rd = 2.0 mm) (mm)

Material   l1      l2      Length of bottom   h1(1−η)   h2(1−η)   πρEp   πρEd   Measured thinning percentage   Simulated length of blank   Measured length   Difference
LY12       33.34   31.84   19.18              6.32      6.29      6.77   7.32   5.26%                          111.06                      111.00            +0.06
H62        44.48   45.10   19.16              8.42      7.76      6.94   7.24   5.37%                          139.10                      139.00            +0.10
Steel 20   49.94   42.10   19.18              7.36      7.57      6.71   7.12   5.41%                          139.98                      139.94            +0.04
Although there are differences between simulation and experiment, the simulated and experimental results for the Π-type constraint bent part coincide well and the differences are very small, so this formula can be used in manufacture.
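As a worked check of the formula against Table 2, here is a short Python sketch. The helper names are ours; the total swept angle of π per tooling radius is inferred from the πρEp and πρEd columns, and the regressions are read as giving ρE in multiples of the thickness t (t = 1 mm in the tests, so the distinction is immaterial there).

```python
import math

# Regression coefficients (slope a, intercept b) for the strain-neutral-layer
# radius rho_E/t = a*(r/t) + b, from the fits above: (punch), (die).
COEFFS = {
    "LY12":    ((1.001840, 0.151910), (0.993623, 0.343599)),
    "H62":     ((0.985617, 0.236465), (1.002220, 0.300318)),
    "steel20": ((0.996763, 0.143295), (0.979220, 0.309408)),
}

def neutral_radii(material, r, t):
    """Strain-neutral-layer radii at the punch and die, in mm."""
    (ap, bp), (ad, bd) = COEFFS[material]
    return (ap * r / t + bp) * t, (ad * r / t + bd) * t

def unfold_length(material, l1, l2, l_bottom, h1, h2, r, t, eta):
    """Unfolded length of the Pi-type part: straight areas, thinning areas
    h1 and h2 reduced by (1 - eta), and the arcs at the punch and die radii,
    each radius being swept through a total angle of pi (two 90-degree
    bends), as in Table 2."""
    rho_p, rho_d = neutral_radii(material, r, t)
    arcs = math.pi * (rho_p + rho_d)
    return l1 + l2 + l_bottom + (h1 + h2) * (1.0 - eta) + arcs

# Steel 20 row of Table 2 (r = 2.0 mm, t = 1.0 mm); h1 and h2 are quoted
# there already reduced by thinning, so eta = 0 reproduces the tabulated
# 139.98 mm to within rounding.
print(round(unfold_length("steel20", 49.94, 42.10, 19.18,
                          7.36, 7.57, r=2.0, t=1.0, eta=0.0), 2))
```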
4. Conclusions
1. The deformation features and regularities of the constraint bent part have been found by FEM simulation of the Π-type bent part; this is useful for understanding the multi-bend process. 2. According to the simulation results for the Π-type bent part, a formula for calculating its unfolded length is proposed for the first time:
L = l1 + l2 + ρEp·α + ρEd·α + hab(1 − η)
5. References
1. Li Shuoben, The Technology of Stamping, 13-17, Jixie Gongye Press, Beijing, 1982.
2. R. Hill, The Mathematical Theory of Plasticity, 142-177, Oxford, London, 1962.
3. M. Kawka and A. Makinouchi, "Shell-Element Formulation in the Static Explicit FEM Code for the Simulation of Sheet Stamping", Journal of Materials Processing Technology, 50 (1-4), 105-115, 1995.
4. Eiji Nakamachi, "Sheet-forming Process Characterization by Static-Explicit Anisotropic Elastic-Plastic Finite-Element Simulation", Journal of Materials Processing Technology, 50 (1-4), 116-132, 1995.
OPTIMISING THE DIMENSIONS OF A CYLINDRICAL ULTRASONIC MOTOR

YANG QUANGANG
Data Storage Institute, DSI Building, 5 Engineering Drive 1, NUS. E-mail: YANG [email protected]

LIM SIAK PIANG
Department of Mechanical Engineering, National University of Singapore. E-mail: [email protected]
Despite the structural simplicity of ultrasonic motors, the current difficulty in designing high-performance motors lies in the lack of complete and accurate models and well-understood design rules, because the motor's parameters are time-varying and load-dependent. In this paper, an expression describing the relation between the length and radius of the ceramic transducer of a cylindrical ultrasonic motor is presented, based on a theoretical analysis of its longitudinal and flexural vibrations. It can be used as a basis for choosing the dimensions of the ceramic transducer at the design stage.
Introduction

Recently, much R&D effort has been directed to the understanding and optimisation of ultrasonic motors because of their advantages over conventional electromagnetic motors. However, despite the simplicity of the structures, the current difficulty in designing high-performance ultrasonic motors lies in the lack of accurate models and well-understood design rules, because the motor's parameters are time-varying and load-dependent. A special feature of ultrasonic motors is their two-stage energy-conversion mechanism: electrical energy is first converted into high-frequency mechanical oscillations, which in turn are rectified into macroscopic unidirectional motion of the rotor at the second stage. The dimensional optimisation of the piezo-ceramic transducer plays an important role in increasing the conversion efficiency of the first stage. Cylindrical ultrasonic motors have been increasingly studied recently [1-3], but these studies did not deal with the choice of design parameters. In this paper, the vibrations of the ceramic tube are investigated. By equating the flexural frequency to the longitudinal frequency, so as to obtain the maximum vibration amplitude during excitation, the optimal dimensional relationship between the radius and length is obtained. It can be used as a basis for choosing the dimensions of the transducer at the design stage.

Simplified Piezoelectric Equations

A piezo-ceramic tube is shown in Figure 1. The poling direction is aligned along the r-direction. Cylindrical coordinates are used, with the origin chosen at the centre of the tube and z coinciding with the tube axis.
Utilizing the hypotheses described in [4], the d-type piezoelectric strain equations can be simplified and reduced to

Sr = s13(Tθ + Tz) + d33Er   (1)
Sθ = s11Tθ + s12Tz + d31Er   (2)
Sz = s12Tθ + s11Tz + d31Er   (3)

From equations (2) and (3), we have

Tz = Y0ᴱ[Sz + μSθ − d31(1 + μ)Er]/(1 − μ²)   (4)
Tθ = Y0ᴱ[Sθ + μSz − d31(1 + μ)Er]/(1 − μ²)   (5)

where Y0ᴱ = 1/s11ᴱ is the elastic modulus and μ = −s12ᴱ/s11ᴱ is the Poisson's ratio.
Figure 1. Piezo-ceramic tube.
Figure 2. The bending of piezo-ceramic tube.
Longitudinal and Extensional Vibration of the Piezo-Ceramic Tube

If we isolate an infinitesimal arc element of the piezo-ceramic cylinder with angle dθ and height dz, the differential equations of motion in the radial and axial directions are

ρ(∂²ξr/∂t²) = −Tθ/R   (6)
ρ(∂²ξz/∂t²) = ∂Tz/∂z   (7)

where ρ is the mass density and ξ is the displacement component. Under harmonic vibration, equations (6) and (7) simplify to

ρω²ξr = Tθ/R   (8)
−ρω²ξz = ∂Tz/∂z   (9)

in which ω is the angular frequency. Differentiating equation (8) with respect to z, and considering Sθ = ξr/R and Sz = ∂ξz/∂z, we obtain

d²ξz/dz² + k²ξz = 0   (10)

where

k = (ω/c)·√{[(1 − μ²)(ω/ω1)² − 1]/[(ω/ω1)² − 1]},   c² = Y0ᴱ/ρ

c is the wave propagation velocity, and ω1 stands for the angular frequency of the radial vibration without considering the piezoelectric effects.
As μ is about 0.3, (1 − μ²)^(−1/2) ≈ 1.05. Therefore, under the conditions ω/ω1 < 1 or ω/ω1 > (1 − μ²)^(−1/2), the solution of (10) can be written in harmonic form, and applying the boundary conditions of the tube leads to a coupled frequency equation of the longitudinal and radial vibrations, in which ω2 = πc/l is the angular frequency of the longitudinal vibration without considering the piezoelectric effects. In our discussion, only the fundamental vibration (n = 1) is considered. Moreover, as described in [5], if the length of the cylindrical tube is greater than half of its perimeter, the lower eigenfrequency corresponds to the longitudinal vibration. Hence the fundamental longitudinal frequency is obtained:

ωl1² = [(ω1² + ω2²) − √((ω1² − ω2²)² + 4μ²ω1²ω2²)]/[2(1 − μ²)]   (14)
Flexural Vibration Analysis of the Ceramic Tube

For the cylindrical ultrasonic motor, the ceramic tube is uniformly segmented into four quadrants. With two opposite electrical sources applied to one pair of facing quadrants, one quadrant contracts while the other expands, which bends the tube, as shown in Figure 2. If the applied voltages are RF signals, a bending vibration is excited. For simplicity, the bending vibration can be described by the Euler beam model, and its fundamental flexural frequency can be written as

ωf1² = 500.5·Y0ᴱ(Ro² + Ri²)/(4ρl⁴)   (15)

in which Ro and Ri are the outer and inner radii of the tube.

Dimensional Relation of the Ceramic Tube

To obtain the maximum vibration amplitude during excitation, it is important to make the longitudinal and flexural frequencies equal. Let ωl1 = ωf1, and note that c² = Y0ᴱ/ρ and R = (Ro + Ri)/2; then
250.3(Ro² + Ri²)(1 − μ²)c²/l⁴ = (ω1² + ω2²) − √((ω1² − ω2²)² + 4μ²ω1²ω2²)   (16)

where ω1 = 2c/(Ro + Ri) and ω2 = πc/l. This expression can be used to choose suitable tube dimensions during the design of a new cylindrical USM: we can fix either the radius or the length and then determine the other. As an example, the transducer in [1] has a 2.4 mm outer diameter and a 1.4 mm inner diameter. Using equation (16) with a Poisson's ratio of 0.36 for the PZT, the calculated tube length for the same diameters is 5.1 mm, about half of their chosen length of 10 mm.

Summary

In this paper, the longitudinal and extensional vibrations of a piezo-ceramic cylindrical tube have been analysed based on the simplified piezoelectric equations and the differential equations of equilibrium. Their frequencies are obtained after decoupling the frequency equation initially involving both longitudinal and extensional vibration. The flexural vibration of the tube is described by the Euler beam bending model. By equating the flexural frequency to the longitudinal frequency, the optimal dimensional relation between the radius and length of the tube has been obtained. This provides a basis for choosing the dimensions of the ceramic transducer at the design stage.

References

1. T. Morita, M. Kurosawa and T. Higuchi, Cylindrical Micro Ultrasonic Motor Utilizing Bulk Lead Zirconate Titanate (PZT), Jpn. J. Appl. Phys., 38 (1999), Part 1, No. 5B, pp. 3347-3350.
2. T. Morita, M. Kurosawa and T. Higuchi, A Cylindrical Microultrasonic Motor, Ultrasonics, 38 (2000), 33-36.
3. Lu Pin, Kwok Hong Lee, Siak Piang Lim and Wu Zhong Lin, A Kinematic Analysis of Cylindrical Ultrasonic Micromotors, Sensors and Actuators, A87 (2001), 194-197.
4. J. F. Haskin and J. L. Walsh, Vibrations of Ferroelectric Cylindrical Shells with Transverse Isotropy. I. Radially Polarized Case, The Journal of the Acoustical Society of America, 29 (1957), 729-734.
5. S. Markus, The Mechanics of Vibrations of Cylindrical Shells, Elsevier, 1988.
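At the design stage, equation (16) can be solved numerically for the length once the radii are fixed. A minimal Python sketch, assuming the reconstruction of (16) given above (the wave speed c cancels out); it reproduces the 5.1 mm length quoted for the transducer of [1]:

```python
import math

def freq_balance(l, Ro, Ri, mu):
    """Residual of equation (16); c cancels, so it is set to 1."""
    w1_sq = (2.0 / (Ro + Ri)) ** 2       # radial frequency squared / c^2
    w2_sq = (math.pi / l) ** 2           # longitudinal frequency squared / c^2
    lhs = 250.3 * (Ro**2 + Ri**2) * (1 - mu**2) / l**4
    rhs = (w1_sq + w2_sq) - math.sqrt((w1_sq - w2_sq) ** 2
                                      + 4 * mu**2 * w1_sq * w2_sq)
    return lhs - rhs

def solve_length(Ro, Ri, mu, lo=1e-3, hi=20e-3, tol=1e-9):
    """Bisection for the tube length satisfying equation (16)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if freq_balance(lo, Ro, Ri, mu) * freq_balance(mid, Ro, Ri, mu) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Transducer of [1]: 2.4 mm outer and 1.4 mm inner diameter, mu = 0.36
print(solve_length(Ro=1.2e-3, Ri=0.7e-3, mu=0.36))   # about 5.1e-3 m
```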
DESIGN AND ANALYSIS OF A HIGH-EFFICIENCY MR VALVE

W. H. LI, H. DU AND N. Q. GUO
School of Mechanical & Production Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
E-mail: [email protected]
This paper presents an optimized design of a high-efficiency magnetorheological (MR) valve using finite element analysis. The MR valve comprises a core, a wound coil and a cylindrical flux return. The core and the flux return form the annulus through which the MR fluid flows. The effects of the magnetic-field formation mechanism and the MR-effect formation mechanism on the valve performance are investigated. The analysis of the magnetic flux density in the valve indicates that magnetic saturation may occur in the core, in the flux return, or along the valve length. To prevent saturation and to minimise the valve weight, the dimensions of the valve are determined optimally with the finite element analysis. In addition, this analysis is coupled with the typical Bingham plastic analysis to predict the MR valve performance.
1 Introduction
Recently, a very attractive and effective approach to developing controllable hydraulic devices makes use of MR fluids. Devices using these materials have many advantages: the valves have no moving parts, which eliminates the complexity and durability issues of conventional mechanical valves and provides a direct transduction from an electrical control signal to a change in mechanical properties [1]. Among semi-active MR devices, the MR valve is a key component that plays a significant role in the device performance. Despite many investigations of semi-active MR devices, work on the systematic or optimal design of MR valves is relatively rare; in particular, the coupling between the magnetic-field formation mechanism and the MR-effect formation mechanism has received little attention. Conventionally, the design and manufacture of MR valves is based on trial and error, and the performance of such an empirical approach depends strongly on the designer's practical experience; it is time-consuming and costly for industry to implement in mass production. It is therefore crucial to develop systematic design algorithms for manufacturing high-efficiency, low-cost MR devices. The objective of this work is to design a small, high-efficiency MR valve. For this purpose, an efficient magnetic circuit was designed using ANSYS for the magnetic field analysis [2]. Once the MR fluid characteristics and the core material properties are incorporated into the analysis, and the response of the magnetic circuit as a function of applied current has been determined, a highly efficient design for the magnetic
circuit can be achieved. In addition, this finite element analysis is coupled with the typical Bingham plastic model to predict the valve performance.

2 FEM Modeling and Analysis of the MR Valve
2.1 New Modeling

The schematic of the proposed MR valve is shown in Figure 1. The valve consists of a core, a flux return and an annulus through which the MR fluid flows. The bobbin shaft is wound with insulated wire. A current applied through the coil around the bobbin creates a magnetic field in the gap between the core and the flux return; this magnetic field increases the yield stress of the MR fluid in the gap. At the design stage, many parameters should be considered, including the fluid gap, the bobbin shaft diameter, the flux length, the thickness of the flux return and the number of wire turns. To achieve an efficient MR valve, the flux density in the fluid gap should be kept constant. The relative permeability of the MR fluid is far smaller than that of the low-carbon-steel bobbin and flux return; consequently, a smaller fluid gap is better. Practical gaps typically range from 0.25 to 2 mm for ease of manufacture and assembly; in this study the gap is set to 0.5 mm. The other optimum parameters are determined using the finite element method with the help of the ANSYS package. Due to structural symmetry, the MR valve is analysed as a 2-D axisymmetric model, as shown in Figure 2. The main dimensions of the valve comprise Dcore, Lcore, Din, Dout, Lactive, Lreturn and g.
Figure 1. Schematic of the proposed MR valve
Figure 2. Axisymmetrical model of the MR valve
2.2 Analytical Results

By means of ANSYS simulation, the effects of the saturation phenomenon both in the steel path and in the fluid gap are studied, and the effects of the design parameters are evaluated; through this analysis, an optimal model of the MR valve is obtained. One basic requirement for the optimal valve is that the flux density across the fluid gap can reach its maximum value at the operating point. The bobbin shaft diameter is the design parameter that most limits the magnetic performance. In Figure 3, the maximum magnetic flux density, Bmax, in the gap is plotted against the core diameter, Dcore. It can be seen that when the core diameter is smaller than 14 mm it is impossible for the MR fluid to reach its operating point; in other words, the maximum flux density cannot reach 0.8 Tesla no matter how large the coil current is, because of saturation of the core. Figure 4 shows the trend of the maximum magnetic flux density, Bmax, in the gap as a function of the active core length, Lactive. At small active core lengths the flux density in the gap reaches and maintains its maximum value of 0.8 Tesla up to 3 mm, before decreasing steadily with increasing active length. Decreasing the core length results in a higher magnetic flux density in the fluid gap by reducing magnetic saturation in
Figure 3. Maximum magnetic flux density, Bmax, in the gap versus the bobbin shaft diameter, Dcore
Figure 4. The maximum flux density, Bmax, as a function of the active core length, Lactive
the bobbin shaft. Considering all the influencing factors, the optimum MR valve is proposed; its dimensions are listed in Table 1.

Table 1. Optimal dimensions of the MR valve

Item        Dcore   Lcore   Din     Dout    Lactive   Lreturn   g
Dimension   14 mm   16 mm   22 mm   28 mm   3 mm      2.5 mm    0.5 mm
3 Valve Evaluation

It is assumed that the MR fluid is incompressible and that fluid inertia is negligible; the governing equation for the laminar flow in the valve is then given by

ΔP = 24ηQLactive/(g³w) + cτyLactive/g   (1)

where η is the plastic viscosity of the MR fluid, Q is the flow rate, w is the mean circumference of the annular flow path, τy is the field-dependent yield stress and c is a coefficient determined by the flow profile. The computed pressure difference versus flow rate for the MRF-132LD fluid at different coil currents is shown in Figure 5.
Figure 5. Flow characteristics of the MR valve: pressure difference, AP, versus flow rate, Q
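To indicate how equation (1) generates curves like those of Figure 5, here is a minimal sketch assuming the Bingham-plastic form reconstructed above and the Table 1 dimensions; the viscosity and yield-stress values are placeholders, not the MRF-132LD data sheet.

```python
import math

# Optimal valve dimensions from Table 1 (metres)
g        = 0.5e-3                 # fluid gap
L_active = 3.0e-3                 # active pole length
D_mean   = 0.5 * (22e-3 + 28e-3)  # mean of Din and Dout
w        = math.pi * D_mean       # mean circumference of the annulus

def pressure_drop(Q, tau_y, eta=0.09, c=3.0):
    """Bingham-plastic valve pressure drop, equation (1): a viscous term
    plus a field-dependent yield term."""
    dP_viscous = 24.0 * eta * Q * L_active / (g**3 * w)
    dP_yield   = c * tau_y * L_active / g
    return dP_viscous + dP_yield

# Pressure difference vs flow rate for two field levels (yield stresses)
for tau_y in (10e3, 30e3):                 # Pa, placeholder yield stresses
    for Q in (0.5e-5, 1.0e-5, 2.0e-5):     # m^3/s
        print(tau_y, Q, round(pressure_drop(Q, tau_y)))
```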
4 Conclusions
Considering the MR fluid properties and the magnetisation curve of the steel path, a maximum magnetic flux density in the fluid gap was achieved with the optimised design and verified by simulation.

References

1. Gavin, H. P., Annular Poiseuille Flow of Electrorheological and Magnetorheological Materials, Journal of Rheology, 45 (2001), pp. 983-994.
2. http://www.ansys.com/webdocs/University/gettingstarted/tutorials/print_emag.htm
COMPUTER DESIGN AND VISUALIZATION OF THE NEW LOOP WORM TRANSMISSION

QINGSHENG LUO & BAOLING HAN
Mechanical & Electrical Department, Shantou University, Shantou City, Guangdong Province, P.R. China
E-mail: [email protected]
This paper presents a methodology for the parametric design of the New Loop Worm Transmission (NLWT) and for the visualization of the design result, based on computer virtual reality technology. Considering the NLWT's complicated structure and precise surfaces, the authors analyse the NLWT by numerical methods and carry out a parametric design for it. The NLWT design result is modelled and visualized for the designer's inspection and modification. The parameters for analysing the NLWT and a design procedure are also proposed for further research.
1 Introduction
In contrast to the Common Column Worm Transmission (CCWT), the Loop Worm Transmission (LWT) possesses some virtues: its transmission capability is good and it is fast, and it is therefore here called the New Loop Worm Transmission (NLWT). In engineering, the NLWT has many strong points. For instance, in the Single Bundle Loop Worm Transmission (SBLWT), the tool used to machine the worm can be greatly simplified and its interchangeability is very good, so it is applicable to single-piece and small-batch production. In the Double Bundle Loop Worm Transmission (DBLWT), with proper design and manufacture, multi-tooth meshing and double-contact-line contact can be realised, and the angle between the contact line and the relative sliding velocity direction can reach 60° - 90°. Because the induced curvature radius between the tooth surfaces is large, the NLWT can greatly improve the carrying capacity of the worm transmission (WT). Practice proves that DBLWTs with various kinds of tooth forms possess very impressive carrying capacity, about 2 - 3 times that of the CCWT; it is one of the WTs with excellent capability. The NLWT is therefore finding more and more extensive application in the related engineering fields [1].

2 Parameter design & entity modeling

From the knowledge of descriptive geometry, it is known that the outline of a loop worm is formed by a concave arc generating line revolving around the worm axis, and its tooth surface is a swept surface formed by taking a straight line or a curve as the generating line, or an enveloped (bundle) surface formed by taking a plane or a curved surface as the generating element. Fig. 1 is the principle sketch in which a straight line is taken as the generating line to form a loop worm. In Fig. 1, the plane P passes through the axis O1O1 and revolves about it at a constant angular velocity. At the same time, in plane P there is a straight generating line u-u, tangent to the circle of radius rb, and the generating line u-u revolves about the point O2 at the same angular velocity. Therefore, the straight generating line u-u
Fig.1 a Straight Loop Worm forming sketch map
Fig.2 a Plane Bundle Worm forming sketch map
forms a traced curved surface in space; this is the screw tooth surface of the Straight Loop Worm, a ruled surface which cannot be developed [1]. If the straight generating line is replaced by the plane Σ, as shown in Fig. 2, the plane Σ sweeps out a tooth surface of the loop worm when it turns by the above rule. Such a worm is called the Plane Bundle Worm; its tooth surface is the envelope of a family of planes, and it is a ruled surface which can be developed. If the straight generating line is replaced by an involute helicoid, the Involute Bundle Worm is obtained, and its tooth surface is the envelope of a family of involute helicoids. Therefore, the LWT can be classified into the Straight Loop Worm Transmission (SLWT), the Plane Bundle Loop Worm Transmission (PBLWT) and the Involute Bundle Loop Worm Transmission (IBLWT), according to the generating line or surface which forms the worm. The LWT can also be classified into SBLWT and DBLWT according to the forming principle of the tooth surfaces of the worm and worm wheel. In SBLWT the worm wheel is a column gear, and the worm tooth surface is the surface enveloped by the column gear. When the worm wheel tooth surface is a plane, the Plane Single Bundle Loop Worm Transmission is obtained, and when it is an involute helicoid, the Involute Single Bundle Loop Worm Transmission is obtained. In DBLWT, the worm wheel tooth surface is in turn formed as the envelope of the worm tooth surface; hence the name double bundle. In practice, PBLWT generally refers to the Plane Once Bundle Loop Worm Transmission (POBLWT) and the Plane Bis Bundle Loop Worm Transmission (PBBLWT); within POBLWT there are straight-tooth and tilted-tooth variants. The Straight Loop Worm Transmission and PBBLWT give multi-tooth contact and double-contact-line contact, which enlarges the contact area of the tooth surfaces, improves the forming conditions of the oil film and increases the relative curvature radius between the tooth surfaces; this is why the drive efficiency of the NLWT is rather high and its carrying capacity very strong. Although POBLWT has single-contact-line contact, it retains the virtue of multi-tooth contact, so its drive efficiency and carrying capacity greatly exceed those of the CCWT. At the same time, PBLWT is easier to machine accurately in full accordance with its meshing principle, and computer entity-modeling techniques can be used to carry out a virtual design for it; these create the conditions for the wider adoption of the NLWT.
From the analysis of the forming principle and the structural specialty of the New Loop Worm, it is found that the tooth shape and the lead angle of the loop worm are the key factors in the computer entity modeling, so they deserve particular attention, and the analysis should start from the particular parameters. From the transmission conditions it is known that the worm transmission power is P = 7.2 kW, the worm rotation speed is n = 1452 r/min, the transmission ratio is i = 37, and the gearing works and starts very frequently. According to the handbook [2], the related design parameters are confirmed as follows: 1. The worm material is 40Cr with quenching-and-tempering treatment and a surface hardness of HB = 250-300; the worm wheel material is ZQSn10-1, sand cast, with precision class 8. 2. From Pc = P/(K1 x K2 x K3 x K4) <= [P], checking the tables for the various coefficients gives Pc = 7.2/(1 x 1.06 x 0.8 x 1) = 8.5 kW; checking the tables again gives the centre distance a = 150 mm. 3. The worm thread number is Z1 = 1; according to the required transmission ratio, the worm wheel tooth number is Z2 = 37. To accomplish the three-dimensional entity modeling of the NLWT, we used the aided-design and aided-modeling software packages 3DS MAX 4.0 and AutoCAD 2000. Relatively speaking, the solidity, texture and realism of entities modeled with 3DS MAX 4.0 are very good, while the accuracy, precision and coordination of entities modeled with AutoCAD 2000 are very good; we therefore use 3DS MAX 4.0 and AutoCAD 2000 alternately, bringing the functions of both packages fully into play. The entity modeling of the NLWT may be divided into two parts: the modeling of the loop worm, and the modeling of the loop worm wheel matched with it. In the modeling of the loop worm, the key step is the precise modeling of the loop worm axial section; we therefore adopt AutoCAD 2000 to draw the loop worm axial section and to construct the loop worm screw lines. The particular process is as follows: 1. Choose the screw-line function of the creation panel and draw a screw line with the following parameters: height 85 mm, number of turns 4.58, radius 39.89 mm, clockwise direction. 2. Adjust the configuration shape of the screw as shown in Fig. 3; in particular, the lengths of the upper and lower lines are 89.6 mm and the radii of the left and right arcs are 121.47 mm. 3. Using the free-deformation function for the cylinder in the modify panel, set the number of reference points to 6 x 12 x 7 and then carry out the screw modeling according to the thread frame shown in Fig. 3; the tool used in this process is geometric-proportion scaling. 4. The shape-sampling method must be adopted for modeling the loop worm tooth, and this operation exerts a serious influence on the modeling result. Because the screw created in the above process is constructed along the pitch-circle track of the loop worm tooth, some preparation is needed before the tooth shape is sampled. One of these tasks is to put the centre of the
Fig.3 creating process sketch of loop worm axes section
Fig.4 sampling sketch map of loop worm screw
Fig.5 tooth shape sampling sketch map of loop worm
tooth cross-section on the tooth pitch circle, to locate it on the centre axis of the tooth, and to ensure that one coordinate of the tooth coincides with the centre axis. The other task is to use the get-shape function to sample the tooth shape along the screw that has been modeled (shown in Fig. 4). 5. Here we may find that the actual entity-modeling effect obtained by the above steps differs from the expected effect, so the shape-sampled entity must be modified: the entity is rotated and the control line gradually adjusted until the needed effect is obtained. The essential point of the operation is to make the adjusted angle between two successive control points an integral multiple of 45°; the adjusted result is shown in Fig. 5. 6. After the modeling of the loop worm tooth, the other parts of the loop worm can be modeled according to the practical needs of the gearing; since their modeling is very easy, the particular steps are not narrated further. In 3DS MAX 4.0 the Boolean operations are used to compose the various parts of the loop worm into a whole; the result is shown in Fig. 6. 7. Finally, the material editor is applied to the modeled loop worm in 3DS MAX 4.0, giving the modeling effect shown in Fig. 7, which can be used for the later transformation of the numerical-control tool track and for testing the entity-modeling effect. Generally speaking, only when the loop worm and the worm wheel cooperate exactly can the transmission be ensured to run smoothly. As with the loop worm, the modeling of the worm wheel has its own particularities: the tooth of the worm wheel is arc-shaped (see Fig. 8), so that the worm wheel can realise an exact meshing transmission with the loop worm. The modeling process of the worm wheel is as follows:
Fig.6 loop worm entity modeling sketch map
Fig.7 loop worm entity modeling effect map
Fig.8 worm wheel tooth sketch map
1. We use AutoCAD 2000 to draw accurately the radial section plane of the worm wheel tooth according to the design data. 2. We draw the sampling path of the worm wheel tooth: first a circle of radius 22.45 mm is drawn; then a rectangle of width 40 mm is used to cut an arc from it as the sampling path; then the tooth of the worm wheel radial section plane is sampled once again, and so we obtain a tooth of the worm
Fig.9 the worm wheel tooth sampling sketch map
Fig. 10 the worm wheel tooth ring sampling sketch map
Fig. 11 the worm wheel entity whole sampling sketch map
wheel in this way. But it is not yet the final tooth, because the worm wheel has a helix angle (6°51'54"). We therefore use the rotation knob to rotate the tooth about its axis by 6°51'54"; the effect is like the white tooth shown in Fig. 9. 3. We move the local plotting axis of the white tooth to the centre position of Fig. 9: using the axis function in the hierarchy panel, the adjustment is made effective only on the axis, and the axis is moved to the appropriate position; at this time, holding the Shift key, the tooth is rotated 9.729° about its axis. 4. We copy the modeled worm wheel tooth 36 times and arrange the copies by the rule shown in Fig. 9. In particular, the copy function in the menu is chosen with a copy number of 36 (together with the original tooth, the worm wheel has 37 teeth in all). Since each tooth of the worm wheel is independent, they can be grouped together for convenient copying. 5. With the above steps, the modeling of the worm wheel tooth ring is fulfilled; the subsequent entity modeling and material editing are relatively simple and are not expatiated further. The modeling sketches of the worm wheel tooth ring and the worm wheel entity are shown in Fig. 10 and Fig. 11 respectively.

3 Epilogue

The NLWT has great value for wider use in the modern industrial field, but its part structure is complex and its machining technology troublesome, which restricts its popular application to a certain extent. Through this computer-graphics research on the NLWT, we complete its parameter design and entity modeling, lay a foundation for the subsequent numerical-control machining and tool-track translation, and explore some means and skills for using computer virtual reality technology in the entity modeling of complex transmission parts.

REFERENCES

1. Fu Shaoze, New Worm Transmission, Shanxi Science and Technology Publishing Company, 1990.
2. Mechanism Design Handbook, Chemistry Industry Publishing Company, 1992.
A COMPUTER-AIDED OPTIMISATION APPROACH FOR THE DESIGN OF COOLING CHANNELS AND SELECTION OF PROCESS PARAMETERS IN PLASTIC INJECTION MOULDING

L.Y. ZHAI, Y.C. LAM, K. TAI AND S.C. FOK
School of Mechanical and Production Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
E-mail: [email protected]
The cooling process in plastic injection moulding has a direct impact on both productivity and product quality. This paper integrates an evolutionary algorithm with CAE (Computer-Aided Engineering) technology to build a computerised system that can guide the design of cooling channels and the selection of process parameters. A genetic algorithm (GA) based optimisation system has been developed as an external routine to the MOLDFLOW® suite, a commercial CAE software package for injection moulding simulation. The objective of the optimisation is to achieve a uniform cavity surface temperature so as to ensure product quality.
1 Introduction
The cooling process in plastic injection moulding plays a crucial role in determining both the productivity and the quality of an injection-moulded part. Productivity prefers fast cooling, whereas product quality requires a uniform temperature distribution; hence the overall cooling requirements are always a compromise between uniform cooling, to assure part quality, and fast cooling, to minimise part cost. In cooling system design, the design variables typically include the size and location of the cooling channels, the temperature and flow rate of the coolant, and the packing time, clamp-open time and cooling time. With so many parameters involved, determining the optimum cooling system design is difficult, and a tool integrating cooling simulation and optimisation techniques into the design process is helpful. Many optimisation techniques have been used successfully in engineering applications, including conventional techniques such as classic gradient-based methods [1] and recent stochastic techniques such as the simulated annealing algorithm. However, most of these methods show drawbacks when applied to cooling design optimisation [2]. In contrast, genetic algorithms (GAs) [3] show great advantages in handling the cooling optimisation problem because of their inherently parallel, population-based search. GAs have been widely used in various engineering optimisations, including injection moulding [2, 4]. However, there is hitherto no literature on integrating GAs with a mould-cooling simulation software package to optimise cooling system design.
2 Optimisation system
Unlike most existing investigations, this study focuses on the development of the optimisation system instead of on the algorithm for the calculation of heat transfer by finite (or boundary) element analysis [5]. This has the advantage of easy integration with prevailing commercial software with well-developed cooling simulation technology. The framework for mould cooling optimisation is constructed as shown in Figure 1. As revealed by some researchers [6], genetic algorithms in general locate a near-global optimum (or a group of near-global optima) rather than the exact unique optimal solution, especially in problems with multiple variables and a large search space. The well-known Hooke and Jeeves [7] search method is therefore adopted for a refined search to zero in on the exact optimal solution following the GA search.

Figure 1. Framework of the prototype system (start → initial generation of chromosomes → MOLDFLOW® cooling analysis for each chromosome → constraints evaluation → standard deviation of cavity surface temperature for each chromosome → selection of good chromosomes → crossover, mutation → repeat → refined search → end)
3 A case study
Figure 2 shows the part and mould investigated. The design variables are shown in Table 1. For ease and simplicity, these variables take only integer values. Each chromosome is represented as an integer string, and single-site crossover is adopted. Mutation is implemented by replacing the selected gene with a randomly generated integer. This real-coded chromosome representation is faster, more consistent from run to run, more precise and more intuitive [8].
Figure 2. Plastic part and mould with 3 cooling channels
As mentioned earlier, the standard deviation of the cavity surface temperature is chosen as the objective function. In the finite element model, the cavity surface temperature is represented by element temperatures, and the MOLDFLOW® 3D cooling analysis divides the cavity surface into top and bottom surfaces. The objective function can therefore be defined mathematically as

f(x) = √[ Σ_{i=1}^{2N} A_i (T_i(x) − T̄(x))² / (A_Top + A_Bottom) ]   (1)

where x is the vector of design variables, N is the total number of elements, T_i (i = 1, 2, ..., 2N) is the element temperature, A_Top and A_Bottom are the areas of the top and bottom cavity surfaces respectively, and A_i is the area of element i. The average cavity surface temperature is defined as

T̄(x) = Σ_{i=1}^{2N} A_i T_i(x) / (A_Top + A_Bottom)   (2)
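A minimal sketch of equations (1) and (2); in practice the element temperatures and areas would come from the MOLDFLOW® analysis, and the arrays below are placeholders:

```python
import numpy as np

def cavity_temperature_stats(T, A):
    """Area-weighted average (eq. (2)) and standard deviation (eq. (1))
    of the element temperatures over the 2N cavity-surface elements."""
    total_area = A.sum()                     # A_Top + A_Bottom
    T_mean = (A * T).sum() / total_area      # eq. (2)
    f = np.sqrt((A * (T - T_mean) ** 2).sum() / total_area)   # eq. (1)
    return T_mean, f

# Placeholder element data standing in for a MOLDFLOW cooling result
T = np.array([51.1, 49.6, 56.6, 70.5, 70.6, 50.4])   # deg C
A = np.array([1.0, 1.2, 0.8, 1.1, 0.9, 1.0])         # element areas
print(cavity_temperature_stats(T, A))
```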
The constraints are limited to geometric constraints and to the value ranges of the process parameters. The geometric constraints are the minimum spacing between the cooling channels, and between each channel and the mould boundary (including the boundary of the cavity surface). If a chromosome represents a design that does not satisfy the constraints, it is 'repaired' by replacing the offending gene(s) with the nearest allowable value(s). The value ranges of the design variables are summarised in Table 1.
Table 1. Design variables and their value ranges

Design variable                       Value range (integer)
X co-ordinate (mm)                    [-180, 180] subject to constraint
Y co-ordinate (mm)                    [-130, 130] subject to constraint
Z co-ordinate (mm)                    [-200, -20] subject to constraint
Cooling channel diameter (mm)         [4, 10]
Packing time (s)                      [5, 15]
Clamp open time (s)                   [4, 8]
Cooling time (s)                      [10, 20]
Circuit inlet temperature (°C)        [20, 40]
Circuit flow rate (L/min)             [3, 8]
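The integer-coded GA with constraint repair can be sketched as follows; the settings follow the values reported in the next paragraph, the fitness call is a placeholder for the MOLDFLOW® cooling analysis, and the truncation-style selection is a simplification of our own (the geometric spacing checks are omitted):

```python
import random

BOUNDS = [(-180, 180), (-130, 130), (-200, -20), (4, 10),
          (5, 15), (4, 8), (10, 20), (20, 40), (3, 8)]

def repair(c):
    """Clamp each gene to the nearest allowable value (the repair step)."""
    return [min(max(g, lo), hi) for g, (lo, hi) in zip(c, BOUNDS)]

def crossover(a, b, rate=0.85):
    """Single-site crossover on the integer string."""
    if random.random() < rate:
        cut = random.randrange(1, len(a))
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]

def mutate(c, rate=0.25):
    """Replace selected genes with random integers in their ranges."""
    for i, (lo, hi) in enumerate(BOUNDS):
        if random.random() < rate:
            c[i] = random.randint(lo, hi)
    return c

def evaluate(c):
    """Placeholder for the MOLDFLOW analysis returning the standard
    deviation of the cavity surface temperature (eq. (1))."""
    return sum(abs(g) for g in c)   # dummy fitness for illustration

pop = [repair([random.randint(lo, hi) for lo, hi in BOUNDS])
       for _ in range(60)]
for gen in range(110):
    pop.sort(key=evaluate)                  # select good chromosomes
    parents = pop[:30]
    children = []
    while len(children) < 30:
        a, b = random.sample(parents, 2)
        x, y = crossover(a, b)
        children += [repair(mutate(x)), repair(mutate(y))]
    pop = parents + children[:30]
print(evaluate(pop[0]))
```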
Based on some preliminary trials, the GA population size was set at 60, and the crossover and mutation rates were set at 0.85 and 0.25 respectively. The genetic algorithm converged well within 110 generations. With 4 subsequent refined-search iterations, the final optimal solution was found (Table 2), with a standard deviation of the cavity surface temperature of 4.33 °C.

Table 2. Optimal design of cooling channels and process parameters
Design variable                                 Optimal design
Location of channel 1 (mm)                      x1 = -115, y1 = -90, z1 = -85
Location of channel 2 (mm)                      x2 = -33, y2 = -25, z2 = -47
Location of channel 3 (mm)                      x3 = 55, y3 = -29, z3 = -90, x4 = -55, z4 = -200
Inlet temperatures (°C)                         T1 = 40, T2 = 39, T3 = 40
Flow rates (L/min)                              R1 = 5, R2 = 7, R3 = 8
Channel diameters (mm)                          d1 = 6, d2 = 10, d3 = 6
Clamp open time (COT), cooling time (CT)
and packing time (PKT) (s)                      COT = 8, CT = 20, PKT = 15
Figure 3 shows the temperature distribution contours of the top and bottom cavity surfaces. The optimisation procedure was repeated 10 times with different initial populations, and the optimal solutions found were identical, which verifies that the optimal cooling system design obtained is reliable.
Figure 3. Temperature contours of the top and bottom cavity surfaces
4 Discussion and conclusion
Cooling system optimisation is basically a non-convex optimisation problem, and traditional optimisation methods are likely to be trapped in a local optimum. The approach proposed in this study is reliable in locating the global optimal solution; it successfully combines the advantages of the GA for optimisation and of CAE for cooling simulation.
5 Acknowledgement
This project is supported financially by Moldflow Corporation and the Academic Research Fund, Ministry of Education, Singapore. The authors are grateful for stimulating discussions with Mr. Peter Kennedy and Mr. David Astbury of Moldflow Corporation.

References

1. Lam Y.C. and Seow L.W., Cavity balance for plastic injection moulding, Polymer Engineering and Science, 40(6) (2000), pp. 1273-1280.
2. Ye H. and Wang K.K., Optimization of injection-molding process with genetic algorithms, in Proceedings of the Annual Technical Conference, Society of Plastics Engineers (1999), pp. 594-599.
3. Goldberg D.E., Genetic Algorithms in Search, Optimisation and Machine Learning (Reading, Mass.: Addison-Wesley, 1989).
4. Kim S.J., Lee K. and Kim Y.I., Optimisation of injection moulding conditions using genetic algorithm, in Proceedings of SPIE - The International Society for Optical Engineering, Vol. 2644 (1996), pp. 173-180.
5. Park S.J. and Kwon T.H., Optimal cooling system design for the injection moulding process, Polymer Engineering and Science, 38(9) (1998), pp. 1450-1462.
6. Young W.B., Gate location optimization in liquid composite molding using genetic algorithms, Journal of Composite Materials, 28(12) (1994), pp. 1098-1113.
7. Hooke R. and Jeeves T.A., Direct search solution of numerical and statistical problems, Journal of the Association for Computing Machinery, 8 (1961), pp. 212-229.
8. Janikow C.Z. and Michalewicz Z., An experimental comparison of binary and floating point representations in genetic algorithms, in Proceedings of the 4th International Conference on Genetic Algorithms (1991), pp. 31-36.
WAVELETS-BASED MULTIRESOLUTION REPRESENTATION AND MANIPULATION OF CLOSED B-SPLINE CURVES

GANG ZHAO, SHUHONG XU, WEISHI LI
Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528
[zhaog, xush, liws]@ihpc.a-star.edu.sg

XINXIONG ZHU
Beijing University of Aeronautics & Astronautics, 37 Xue Yuan Road, Hai Dian District, Beijing, P. R. China 100083
[email protected] Multiresolution curve representation, based on wavelets, provides more flexible methods for curve editing at different resolution levels, curve smoothing and curve data compressing. It requires no extra storage beyond that of the original control points. The closed B-Spline curve is a special type of B-Spline curves. A cubic closed curve with C2 continuity needs special processing at its boundaries when wavelets are applied to decompose or reconstruct it. This paper introduces, from the point of geometry view, the principles and methods of wavelets-based multiresolution representation of C2 cubic closed B-Spline curves. An extended method of multiresolution manipulation for closed BSpline curves is also presented.
1 Introduction
Multiresolution curve representation based on wavelets provides flexible methods for curve editing at different resolution levels, curve smoothing and curve data compression [1]. The conventional construction of wavelets takes place on the whole real line, but applications in computer graphics require wavelets on a bounded interval. This problem can be solved by: (1) setting the data to zero outside the interval; (2) making the given data periodic; (3) using reflection at the boundaries; (4) constructing special boundary elements [2]. The closed B-spline curve is a special type of B-spline curve. A k-degree closed B-spline curve with C^(k−1) continuity has a different representation from the unclosed curve, and needs special processing at its boundaries when wavelets are applied to decompose or reconstruct it. In this paper, we adopt the repetition approach for the multiresolution analysis (MRA) of C² cubic closed B-spline curves. An extended method of multiresolution editing for closed B-spline curves is also presented.
2 Multiresolution representation of closed B-spline curves
2.1 B-spline scaling functions and wavelet functions of closed B-spline curves

A k-degree unclosed B-spline curve γ(u) with 2^L + k control points C_i^L (i = 0, 1, ..., 2^L + k − 1) is described as follows:

γ(u) = Σ_{i=0}^{2^L+k−1} C_i^L B_{i,k}(u),   u ∈ [0, 1]   (1)
where B_{i,k}(u) (i = 0, 1, ..., 2^L + k − 1) are the B-spline basis functions defined on the knot vector [u_0, u_1, ..., u_{2^L+2k}]. The functions B_{0,k}(u), ..., B_{2^L+k−1,k}(u) form the basis for the space V^L. The level L represents how many times the vector space (the knot vector, in this case) can be subdivided. To use the same formula as for the unclosed B-spline curve, Shi [5] presented a simple method of processing the control points of a k-degree closed B-spline curve with C^(k−1) continuity at its boundaries:

C_{2^L+i}^L = C_i^L,   i = 0, 1, ..., k − 1   (2)

In general wavelet-transform theory, we denote θ as the scaling function and ψ as the wavelet function. Then, with (1) and (2), the cubic closed uniform B-spline curve with C² continuity at its boundaries can be described as

γ(u) = Σ_{i=0}^{2^L−1} C_i^L θ_i^L(u),   u ∈ [0, 1]   (3)

where θ_i^L(u) = B_{i,3}(u) + B_{2^L+i,3}(u) for i = 0, 1, 2, and θ_i^L(u) = B_{i,3}(u) for i = 3, 4, ..., 2^L − 1.
Wavelets offer an L-level hierarchical basis for the space V^L. According to [2], [3], [4], the B-spline basis functions θ^L at level L can be transferred to the two-part basis functions [θ^(L−1), ψ^(L−1)] at level L − 1; the relationships between them are

θ_j^(L−1)(u) = Σ_{k=2j−2}^{2j+2} h_{k−2j} θ_k^L(u)   (4)

ψ_j^(L−1)(u) = Σ_{k=2j−4}^{2j+6} g_{k−2j} θ_k^L(u)   (5)

θ_k^L(u) = Σ_i h̃_{k−2i} θ_i^(L−1)(u) + Σ_i g̃_{k−2i} ψ_i^(L−1)(u)   (6)
where the sequences h, g, h̃ and g̃ are given in the appendix. Now the C² cubic closed B-spline curve can be represented with respect to the two-part basis functions [θ^(L−1), ψ^(L−1)] as

γ(u) = Σ_{i=0}^{2^(L−1)−1} C_i^(L−1) θ_i^(L−1)(u) + Σ_{i=0}^{2^(L−1)−1} D_i^(L−1) ψ_i^(L−1)(u)   (7)
2.2 Wavelet on a bounded interval

The construction of B-spline wavelets described by (4), (5) and (6) takes place on the whole real line: the index j goes from −∞ to +∞. For computer graphics, however, we are interested in problems confined to an interval. In general, there are five different approaches to an MRA on a bounded interval: spatial windowing, reflection, repetition, multiple knots and Gram-Schmidt [4]. Because the C² cubic closed B-spline curve is periodic, and in order to get an efficient and elegant computation scheme, we adopt the repetition method for its
MRA on an interval. The extension pattern of the B-spline and wavelet control points is shown in Fig. 1.

Fig.1 Repetition method for closed B-spline curves

2.3 Multiresolution representation of closed B-spline curves

With (3), (6) and (7), the control points needed to express the curve in the two-part basis [θ^(L−1), ψ^(L−1)] can be found by using the following formulas:

C_i^(L−1) = Σ_{k=2i−5}^{2i+5} h̃_{k−2i} C_k^L   (8)

D_i^(L−1) = Σ_{k=2i−1}^{2i+3} g̃_{k−2i} C_k^L   (9)
The process of splitting the control points C_i^L into a low-resolution version C_i^(L−1) and details D_i^(L−1) is called decomposition. Alternatively, with (3), (4), (5) and (7), the control points with respect to the original B-spline basis θ^L can be recovered with
C_k^L = Σ_i h_{k−2i} C_i^(L−1) + Σ_i g_{k−2i} D_i^(L−1)   (10)
Recovering C_k^L from C_i^(L−1) and D_i^(L−1) is called reconstruction. If the decomposition procedure is applied recursively to C_i^(L−1), the original curve γ(u) can be expressed as a hierarchy of lower-resolution curves γ^j(u) and details β^j(u), i.e.
r(«) = y"(«)+/J"(«) =rL-2M+pL-2M+pL-\u)=-- = y0M+p0(u)+--+iiL-l(u) wherey0(u),p°(u) , ..., /JL"'(«)is called multiresolution representation of y(«), y°(«), / ' ( M ) , . . . , / 4 - 1 ^ ) is the multiresolution approximation of y(«) at different resolution levels. The wavelet basis with the corresponding multiresolution representation y°(«),)30(j<) , ...,pL~l{u) can be described as follows \p\ Fig.2 described the multiresolution approximation of the C2 cubic closed uniform B-spline curve y(u) witn il control points. 3
0'], (0
(a)
(b)
(11)
(c)
(d)
Fig.2. Multiresolution representstion of closed B-spline: (a) Original curve with 32 control points; (b),(c),(d) Low-resolution curve at level 4,3,2.
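The decomposition and reconstruction formulas (8)-(10) translate directly into code once the filter sequences are available. A minimal Python sketch with the repetition (periodic) indexing of Fig. 1; the sequences h, g, h̃ and g̃ must be supplied from the appendix, and representing them as offset-to-value dictionaries is our own choice:

```python
import numpy as np

def decompose(C, h_t, g_t):
    """One level of decomposition, eqs. (8)-(9), with periodic indexing.
    C: the 2^L control points (array of shape (n, 2) for a planar curve);
    h_t, g_t: the analysis sequences h~ and g~ from the appendix, given
    as {offset: value} dictionaries keyed by k - 2i."""
    C = np.asarray(C, dtype=float)
    n = len(C) // 2
    C_low = np.zeros((n,) + C.shape[1:])
    D = np.zeros_like(C_low)
    for i in range(n):
        for off, v in h_t.items():
            C_low[i] += v * C[(2 * i + off) % len(C)]   # eq. (8)
        for off, v in g_t.items():
            D[i] += v * C[(2 * i + off) % len(C)]       # eq. (9)
    return C_low, D

def reconstruct(C_low, D, h, g):
    """Inverse step, eq. (10): C_k = sum_i (h_{k-2i} C_i + g_{k-2i} D_i),
    again wrapping the indices periodically."""
    n = 2 * len(C_low)
    C = np.zeros((n,) + np.asarray(C_low).shape[1:])
    for i in range(len(C_low)):
        for off, v in h.items():
            C[(2 * i + off) % n] += v * C_low[i]
        for off, v in g.items():
            C[(2 * i + off) % n] += v * D[i]
    return C
```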
3 Multiresolution manipulation of closed B-spline curves
The wavelet representation allows the user to specify whether local detail changes are desired, or whether broader changes to the entire sweep of the object are intended. Fig. 3 illustrates the effects on the original curve at the full
Fig.3 Editing control points at different levels
resolution when the B-spline control points are modified at different resolution levels. Fig. 4 shows an example of changing the overall form of the curve while preserving its details.
Fig.4 Changing the overall form of the curve while preserving its details

Fig. 5 shows an example of changing the curve's character without affecting its overall sweep.
Fig.5 Changing the curve's character without affecting its overall sweep

4 Conclusions
This paper describes a multiresolution representation for C² cubic closed B-spline curves and extends the multiresolution manipulation method to it. Although the method is illustrated with the special case of C² cubic closed B-spline curves, it is suitable for k-degree closed B-spline curves with C^(k−1) continuity.

References

1. Finkelstein, A. and Salesin, D. H., Multiresolution Curves, Computer Graphics (Proceedings of SIGGRAPH '94), Vol. 28, 1994, pp. 261-268.
2. Chui, C. K. and Quak, E., Wavelets on a bounded interval, in Numerical Methods of Approximation Theory, Vol. 9 (D. Braess and L. L. Schumaker, eds.), pp. 53-75, Birkhauser Verlag, Basel, 1992.
3. Cohen, A., Daubechies, I. and Vial, P., Wavelets on the Interval and Fast Wavelet Transforms, Applied and Computational Harmonic Analysis, Vol. 1, 1993, pp. 54-81.
4. Gortler, S. J. and Cohen, M. F., Hierarchical and Variational Geometric Modeling with Wavelets, in Proceedings of the 1995 Symposium on Interactive 3D Graphics, ACM, New York, May 1995, pp. 35-42.
5. Shi, F. Z., CAGD & NURBS, High Education Press, China, 2001.
Appendix
A PIPING MODELING AND CALCULATION SYSTEM

GAO HUIXING
Senior Lecturer, Ngee Ann Polytechnic, 535 Clementi Road, Singapore 599489
Email: [email protected]
Piping design is an important but tedious job in the engineering industries. Computer-aided design is an effective way to raise productivity and quality. The author has developed a PC/AutoCAD-based graphic piping CAD/CAM system called GPIPE, and it has been applied in Singapore industries for many years. The functions of this software cover 3D modeling of piping and equipment, clash checking, various pipe-production parameter calculations and documentation. The detailed functions of this system, and how they were realised, are described in this paper.
Keywords
3D Model: A description of a three-dimensional object created by computer software that can be studied as though it really existed in space.
GPIPE: A graphic piping CAD/CAM (Computer-Aided Design/Computer-Aided Manufacture) system developed by Ngee Ann Polytechnic.
AutoCAD: One of the most widely used CAD and drafting software products on the market, developed by Autodesk Inc., USA.
ARX: The AutoCAD Runtime Extension, a compiled-language programming environment for developing AutoCAD applications. ARX includes C++ libraries for developers to create AutoCAD external applications that operate just like native AutoCAD commands.
1. Introduction
Piping systems are an important integral part of any engineering field, such as marine, oil processing and civil engineering. All pipes must go through three stages: design, fabrication and installation. The main tasks of the design stage are arrangement, calculation and drafting. It is very complicated work to reasonably arrange all pipes in a limited space without any interference with structural members, equipment and other pipes. After arrangement, the layout drawings of the piping systems are the basis for further production design, in which fabrication parameters are calculated and the various kinds of drawings for fabrication and installation are drafted. In the past ten years AutoCAD has become widely used in industry, but most engineers still work in the traditional way, using one or several 2D views to represent 3D objects. Because the views are created independently, errors and ambiguity are inevitable. To improve the design and to obtain information for calculation, a true 3D model should be created instead of 2D drawings. Based on this requirement, a 3-dimensional
AutoCAD-based graphic piping CAD/CAM system named GPIPE has been developed and applied in Singapore industries.

2. Functions of GPIPE
GPIPE is an integrated graphic piping CAD/CAM software system built to meet the needs of piping design in various engineering fields. The system covers the entire process from pipe layout to fabrication and installation.

2.1 3D Modeling of Equipment and Piping
In the various engineering design areas there are many interactions between the different design disciplines, with these activities happening concurrently. It is essential to build a product model using a 3D CAD system for simulation, for virtual reality and for providing production information. By using the powerful AutoCAD graphic capability, the GPIPE system can perform many sophisticated interactive operations to create 3D models.
- Machinery, equipment, major structural members: A parametric definition is adopted. There is a standard equipment library defined in GPIPE; users need only key in some key values and a 3D model of the equipment is created. Users can also use AutoCAD commands to construct 3D objects of non-standard equipment; GPIPE can add pipe inlets and outlets to such an object and convert it into a GPIPE 3D equipment model.
- Pipe and components: Users can select any plan view or isometric view to build pipe and component 3D models. Many interactive operations facilitate pipe modeling, including add, delete, move, query, etc.

2.2 Calculations
The information of a 3D product model is stored in a database, from which GPIPE can pick the information for various calculations.
- Interference checking: This checks for interference among pipes and between pipes and equipment. A pipe that clashes with another pipe or with equipment is displayed in white so that users can modify it.
- Pipe piece calculation: This calculates each pipe piece and produces a pipe piece drawing for fabrication and installation. The calculations include the developed length, feeding length, bending angle and rotation angle of each segment.
- BOM: The bill of materials of the various pipe systems is automatically generated in GPIPE. It is provided for costing, procurement and production preparation.
- Pipe nesting: This indicates how to reasonably distribute a raw pipe for making several pieces in order to save pipe material.

3. DEVELOPMENT CONSIDERATIONS
When an application CAD/CAM system is to be developed, several factors should be considered. The following factors were considered in developing the GPIPE system.

3.1 Environment
Hardware and supporting software are the most important factors when application software is developed. Considering the fact that AutoCAD is widely used in almost all Singapore companies, AutoCAD was chosen as the basic graphic tool for piping design. AutoCAD provides a powerful C-language interface called the AutoCAD Runtime Extension (ARX) [1]; C++ combined with ARX is the programming language of GPIPE.

3.2 Sophisticated Data Structure
GPIPE has its own sophisticated data structure. There are two main databases in the system. One is the piping part library, in which various pipe fittings and valves are stored. The other is a pipe route database, in which pipe specifications and route data are stored. All pipes are organized into groups: the pipes within the same fitting zone, in the same system, and with the same material and water-testing pressure can be merged into a group called a batch, and a user can take one or several batches for processing in a design session (sketched below). To describe mathematically and store the position of a pipe, various pipe nodes are defined and stored.
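A minimal sketch of the batch grouping just described follows. The record fields (zone, system, material, test_pressure) are illustrative assumptions for the purpose of the example, not GPIPE's actual database schema.

```python
from collections import defaultdict

def group_into_batches(pipes):
    """Group pipe records that share fitting zone, system, material and
    water-testing pressure into batches, as described in Section 3.2."""
    batches = defaultdict(list)
    for pipe in pipes:
        key = (pipe["zone"], pipe["system"], pipe["material"], pipe["test_pressure"])
        batches[key].append(pipe)
    return batches

# hypothetical pipe records
pipes = [
    {"id": "P-001", "zone": "Z1", "system": "ballast", "material": "CS", "test_pressure": 10},
    {"id": "P-002", "zone": "Z1", "system": "ballast", "material": "CS", "test_pressure": 10},
    {"id": "P-003", "zone": "Z2", "system": "fuel", "material": "SS", "test_pressure": 16},
]
for key, members in group_into_batches(pipes).items():
    print(key, [p["id"] for p in members])
```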
3.3 User-friendly Interface
For application software, the user interface is a very important factor. All operations in GPIPE are menu-driven. A main menu controls the running of the whole system. AutoCAD has a menu file called acad.mnu; a sub-tree menu of GPIPE has been inserted into this file, forming a customized AutoCAD menu. Users can pick any AutoCAD menu item as usual for normal graphic operations. If users click the GPIPE item, a GPIPE pull-down menu and/or GPIPE menu tree is displayed. After clicking a pipe menu item, an ARX application program starts, and AutoCAD-like dialogue boxes are displayed for selecting operations and/or inputting data. The dialogue boxes can be grouped as Radio Button, Edit Box, Image Button and List Box. To define these dialogue boxes, the AutoCAD Dialogue Control Language (DCL) [2] is used.

3.4 Graphic Process
Although GPIPE runs in AutoCAD, most normal AutoCAD drawing commands such as LINE and ARC cannot be used directly for pipe modeling, because these commands can only generate a geometry object: when users click such a geometry, AutoCAD can only tell what and where it is. A special data structure has to be adopted to create the relationship between a geometry and the pipe data. Each pipe piece is drawn as an AutoCAD block which has a unique name. A pipe data file is created when a pipe drawing is drawn; all pipe information, including specification, position and the block name, is written in this file. During interactive operation, when a pipe (actually a block) is clicked, ARX returns the block name, and thus a relationship between the pipe and the data file is established.

4. CONCLUSION
Piping systems are an important integral part of any engineering project. Their design, construction and operation reflect on the quality of work done by a company. To improve the efficiency and productivity of pipe design, fabrication and installation, the graphic piping CAD/CAM system GPIPE has been developed. This system has been applied in many Singapore companies. Practice has clearly verified that GPIPE increases design efficiency, improves quality and reduces the work involved in piping design, fabrication and installation, thereby promoting production in the related industries. Future efforts should keep up with the latest developments in computer technology and expand the application of GPIPE to more industries.

REFERENCES
1. Charles McAuley, Programming AutoCAD 2000 Using ObjectARX, Thomson Learning, USA, 2000.
2. Sham Tickoo, Customizing AutoCAD R14, Chapter 14, Autodesk Press, USA, 1998.
OPTIMIZATION OF INJECTION MOLDED PART BASED ON THE CAE SIMULATION
LI YAN
Institute of High Performance Computing, 1 Science Park Road, #01-01, Singapore 117528
E-mail: [email protected]
CHUNRONG PAN
Mechanical & Electrical Engineering Department, Shantou University, Shantou City, Guangdong Province, P.R. China 515063
E-mail: [email protected]
This paper presents a systematic approach using injection molding simulation analysis to identify the root cause of an injection molding problem and optimise the plastic product. Advanced commercial Computer-Aided Engineering (CAE) molding simulation software (MOLDFLOW), which can be used for injection molding simulation, cooling simulation and warpage simulation, is a very powerful tool for engineers to evaluate a plastic part at an early design stage. In this paper, some factors which affect the quality of injection molded parts are described, and an optimisation methodology based on CAE simulation has been developed to improve injection molded part quality.
1. Introduction
Injection molding is the most widely used plastic process for making large quantities of parts with geometric versatility and cost effectiveness. However, many parts manufactured by injection molding suffer from a wide range of defects [1], which may include warpage, black streaks, blisters, blush, bubbles, burn marks, flash, poor weld lines, short shots, etc. In the past, injection molding problems were addressed through a conventional trial-and-error process which used different materials, part designs and modifications, and through Design of Experiments (DOE) run with different molding process parameters. This approach is very expensive and time-consuming, and the end solution may compromise part quality [2]; it is very hard to use this approach to obtain optimum part quality. With advanced CAE technology, such as injection molding simulation, cooling simulation and warpage analysis, it is now possible to evaluate the part design, mold design and injection molding machine process parameters long before a mold is manufactured [3]. By interpreting the injection simulation results, one can gain a much better understanding of the underlying causes of injection part defects and obtain an optimised design. Depending on the function of an injection molded part, the main criterion for its quality differs. Some parts, such as coverings and toys, only need a good-looking appearance, and even a wide tolerance can be acceptable. Other industrial parts, such as connectors, which need to assemble exactly with other parts, must hold a very tight tolerance. This paper presents a systematic approach using injection molding simulation analysis to identify the root cause of an injection molding problem and optimise the plastic product.
2. Approach
The basic model makes the following assumptions. Figure 1 shows the part geometry of the connector, and Figure 2 is its mid-surface model.
Fig 1: Connector Part
Fig 2: Mid-surface Model
Fig 3: Mesh Model
Material: the material used for the part was LCP Zenite 6130L Black.
Mold: the part was molded in a four-cavity mold with a sub-marine gate.
Normal processing conditions for the part: injection time = 0.2 s; hold time = 1.00 s; cool time = 4.00 s; melt temperature = 330°C; mold temperature = 60°C.
Simulation software: MoldFlow Insight 2.0.
Simulation models: a mid-surface finite element model generated by MoldFlow (see Figure 3). The cooling channels and runner system were then built in so as to run a complete shrinkage and warpage analysis.
The key quality factor of a connector is holding very tight tolerances. To meet the functional requirement, the overall warpage must be controlled within 0.5 mm along the length direction. The initial simulation under the above parameters showed a distortion of approximately 1.42 mm (see Fig 4), similar to that observed on the real part. In warpage simulation, the warpage is generally represented by nodal displacements from the cavity dimension upon ejection. It is formally represented as

F(Xi) = aX + bY + cZ    (1)

where Xi = [X1, ..., Xn]; n is the number of design variables to be considered; a, b and c are weighting factors for X, Y and Z respectively; X is an additive function of the maximum displacement, Y is an average of the top 10 percentile displacements, and Z is an overall average displacement. Warpage comes from differential shrinkage, which is a function of differential pressure, differential temperature, differential residual stress, molecular and plastic orientations in the filling and post-filling stages, as well as the inherent and geometric stiffness of the part. These parameters do not act independently; they affect each other. Changing any parameter almost always causes two opposite effects on the final result. For example, increasing packing pressure will decrease shrinkage and increase the pressure difference in the part. The former could decrease warp arising from differential geometric stiffness, while the latter may increase warp through the resulting higher density difference. So the final effect of increasing packing pressure on warpage depends on which of the two opposite effects is dominant. Because of the complexity of the causes of warpage, there is no rule of thumb that can simply be followed in part design or mold design to minimize warpage. With CAE simulation tools, we can find out to which parameters warpage is most sensitive. Quite often, restricted computing and analysis time allows only a certain number of iterations to reach the final answer, so one has to extract as much information as possible from each iteration to determine the causes of warpage, then change the most sensitive parameters to reduce it.
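A small sketch of the warpage measure of equation (1) follows. The weighting factors and the top-10-percentile term follow the definitions above, while the displacement array is illustrative input rather than actual MOLDFLOW output.

```python
import numpy as np

def warpage_objective(displacements, a=1.0, b=1.0, c=1.0):
    """F = a*X + b*Y + c*Z with
    X: maximum nodal displacement,
    Y: average of the top 10 percentile displacements,
    Z: overall average displacement."""
    d = np.abs(np.asarray(displacements))
    X = d.max()
    Y = d[d >= np.percentile(d, 90)].mean()
    Z = d.mean()
    return a * X + b * Y + c * Z

# hypothetical nodal displacements (mm) of the ejected part
d = np.array([0.12, 0.30, 1.42, 0.95, 0.88, 0.41, 1.10])
print(f"F = {warpage_objective(d):.3f}")
```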
As the plastic material and some process parameters are fixed, the simulation focuses on changing the gate location, cooling arrangement, pressure and wall thickness. From the simulation results, we try to identify the main factor affecting warpage and optimize it to obtain a part with minimized warpage.

3. Simulation and Results
Several different gate locations were evaluated. Different gatings caused quite different plastic orientations in the part; however, along the length direction the warpage was not affected by the changes in plastic orientation. Several cooling analyses were done by varying the coolant temperatures on both sides of the mold. Basically, mold temperature affects warpage in two ways: through the temperature difference across the part thickness, and through temperature differences at different locations along the part. The temperature difference through the thickness causes differential residual stress from each surface to the mid-plane, forming a moment that bends the part. Temperature differences at different locations cause uneven area shrinkage on different portions of the part, which also contributes to warpage. The simulation results in this case showed that no matter what the cooling conditions were (within the recommended processing range), the trend of the warpage was the same and the magnitude was almost unaffected. The difference in warpage magnitude between the isothermal mold condition and the cooling analysis, in which both temperature effects were considered, was less than 7% of the total warpage. This prediction indicated that in this case the warpage was not caused by mold cooling.
•i" .
Fig 4: Before Simulation
Fig 5: After Simulation
What was left was the non-uniform shrinkage, caused mainly by the pressure difference. Several variations were simulated, such as increasing packing pressure, longer cooling time and changing wall thickness. The most effective way, from simulation, to reduce the warpage was to change the part thickness in certain areas. The warpage was decreased to 0.414 mm (see Fig 5), within the functional requirement range. The methodology for optimizing part wall thickness is introduced below.

4. Optimization Methodology
To optimize part wall thicknesses, two characteristics have to be considered. One is that the warpage is used as a measure of the overall quality of the part, hence constituting an objective function value; the objective function of this problem is therefore numeric instead of analytic. The other is that a considerable amount of computing time is required to evaluate the objective function.
The optimization techniques are primarily classified into two search methods: the direct search method and the gradient-based method [4]. The former uses only function values to reach the minimum; the latter uses the gradients of the objective and constraint functions. The gradient-based method is generally considered superior to the direct search method in its efficiency and effectiveness for most functional optimization problems. However, since the objective function of the proposed problem is not in a functional form, difference approximation must be employed to obtain gradient information. The gradient-based methods employ the following iteration procedure:
t^(k+1) = t^(k) + α_k d^(k)    (2)

where α_k is the line search parameter and d^(k) is the search direction for the design variables. To calculate the search direction, the gradients of the objective function are approximated using a forward finite difference method as follows:

∂f(t)/∂t_i |_(t=t_0) = [f(t_0 + Δt_i) − f(t_0)] / Δt_i    (3)
where i = 1 to n, the number of design variables. In the above equation, to obtain a good estimate of the derivative it is important to properly choose the finite difference step size Δt_i for each design variable, which depends highly on f(t) and t_0. Also, determining the search parameter α_k from equation (2) requires the implementation of a line-search algorithm. The design variables t_i, i = 1 to n, are the different wall thicknesses. Accordingly, for this type of injection molding problem, the gradient-based method requires numerical experimentation with a large number of function evaluations, which results in increased computing time. After optimization, the warpage was reduced from 1.42 mm to 0.414 mm, less than the 0.5 mm connector functional requirement. The optimized part has a much more uniform frozen layer thickness and melt front advancement, and the bulk temperature distribution in the optimized part is more uniform. The final result therefore leads to a consequently lower warpage value.
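Equations (2) and (3) can be sketched as follows. The objective f here is a cheap quadratic surrogate (in the actual study each evaluation of f would be a full MOLDFLOW warpage run, which is why the number of function evaluations matters), and the backtracking line search is one simple way of determining α_k, not necessarily the one used in [4].

```python
import numpy as np

def fd_gradient(f, t0, dt):
    """Forward finite difference, eq. (3): df/dt_i ~ (f(t0 + dt_i e_i) - f(t0)) / dt_i."""
    f0 = f(t0)
    g = np.zeros_like(t0)
    for i in range(len(t0)):
        t = t0.copy()
        t[i] += dt[i]
        g[i] = (f(t) - f0) / dt[i]
    return g

def optimize(f, t, dt, iters=20):
    for _ in range(iters):
        d = -fd_gradient(f, t, dt)              # search direction d^(k)
        alpha = 1.0
        while f(t + alpha * d) >= f(t) and alpha > 1e-8:
            alpha *= 0.5                        # backtracking line search for alpha_k
        t = t + alpha * d                       # eq. (2): t^(k+1) = t^(k) + alpha_k d^(k)
    return t

# two wall thicknesses (mm); stand-in warpage surrogate with optimum at (1.2, 0.8)
f = lambda t: (t[0] - 1.2) ** 2 + 2.0 * (t[1] - 0.8) ** 2
print(optimize(f, np.array([2.0, 2.0]), dt=np.array([1e-4, 1e-4])))
```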
5. Summary
In this paper we have presented a method of using injection molding simulation analysis to identify the root cause of an injection molding problem, and an optimization methodology to improve injection molded part quality.

6. References
1. Edward A. Muccio, Plastic Part Technology, ASM International, Ohio, 1991.
2. John Moalli, Plastics Failure Analysis and Prevention, Plastics Design Library, NY, 2001.
3. B.H. Lee, Optimization of Part Wall Thicknesses to Reduce Warpage in Injection Molded Part Based on the Modified Complex Method, ANTEC, pp. 692-698, 1996.
4. Robert A. Malloy, Plastic Part Design for Injection Molding, Carl Hanser Verlag, NY, 1994.
Evaluating Plane-strain Forging of Magnesium Alloy AZ31 Using Finite Element Analysis
S.C.V. Lim, M.S. Yong and C.M. Choy
Singapore Institute of Manufacturing Technology, 71 Nanyang Drive, Singapore 638075
Finite element (FE) simulation was used to evaluate the plane-strain forging of magnesium alloy AZ31 at temperatures of 150°C, 250°C and 300°C. A commercially available finite element package, ANSYS/LS-DYNA, was used in the finite element analysis. From the simulation, forging loads and strain distributions were computed. A physical forging experiment was carried out on a rectangular billet of dimensions 100 mm in length by 20 mm in width by 5 mm in thickness. The feasibility of numerical prediction was evaluated by comparing the predicted forging loads with the empirical results, and the deformation profiles generated from the FE stress-strain distributions were compared to macrographs of the actual forged parts.
1. Introduction
There has been increasing use of magnesium alloys for lightweight structural and functional parts in the automotive and electronic industries. This is largely due to the fact that magnesium alloys have the lowest density among structural metals, high specific strengths, excellent machinability, good damping capacity and electromagnetic interference shielding. Most parts made of Mg alloys are produced by diecasting and thixoforming, but the products tend to have inferior mechanical properties compared to forged parts [1]. It is desirable for Mg alloys under solidus conditions to be formed using stamping, extrusion or forging processes; among the different forming methods, forging has a number of advantages, particularly in strengthening and increasing the reliability of the component. Many parameters, such as temperature, strain rate, friction and pre-form shape, can affect the forging process [2], and the effects of such parameters also vary with the material used. There are a few studies on the effect of temperature and strain rate on the forging of axisymmetric Mg alloy parts [3-4], but very little or none on the plane-strain forging of Mg alloys. Some work has also shown that the workability of magnesium alloys can be effectively improved by increasing the working temperature above 300°C [5], but little work has been done to evaluate the formability of Mg alloys below 300°C. Therefore, in this study we investigate the possibility of plane-strain forging an Mg alloy AZ31 billet of dimensions 20 mm in width, 5 mm in thickness and 100 mm in length through a backward extrusion process. The simulation study uses the ANSYS/LS-DYNA finite element analysis software to evaluate the effect of different forging temperatures (within the warm forging temperature range of 150°C to 300°C). Empirical experiments were conducted and the results were compared with the FEM predictions.

2. Methodology
2.1 FE modeling
The 2D plane-strain FE model of length 1 mm (in the z-axis direction) consists of a forming punch, the billet material and a die insert (Figure 1). The half 2D model was found to be sufficient for the simulation in our previous study [6] and was used to cut down computation time. The punch and die were simulated as rigid bodies (MAT_RIGID), while the AZ31 billet material subjected to the backward extrusion was modeled as a strain-hardening plastic body (MAT_POWER_LAW_PLASTICITY). The material constants of AZ31 used in the FE program, σ = Kε^n, for the respective forming temperatures selected are shown in Table 1. An auto-remeshing algorithm (CONTROL_ADAPTIVE) was used to avoid substantial distortion in the deformed material and to provide a more accurate simulation. As thin-wall magnesium parts are desirable, the simulation was carried out for the actual plane-strain forging of a billet of dimensions 100 mm in length by 20 mm in width, from a thickness of 5 mm to a thickness of 1 mm.
A DEC/ALPHA 600 series 64-bit workstation with an OSF1 version 4.0 operating system was used to perform the finite element modeling. ANSYS/LS-DYNA version 5.6 was used for processing of the model.

Table 1: Material constants of AZ31 (σ = Kε^n)
Temperature (°C) | K | n
150 | 178 | 0.1
250 | 85 | 0.024
300 | 59 | 0.021
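As a worked evaluation of the material model supplied to LS-DYNA, the flow stress σ = Kε^n can be computed directly from the Table 1 constants. The stress unit (MPa) is an assumption, made for consistency with the stress levels reported later in Table 3.

```python
import numpy as np

# Table 1 constants: temperature (deg C) -> (K [assumed MPa], n)
constants = {150: (178.0, 0.100),
             250: (85.0, 0.024),
             300: (59.0, 0.021)}

eps = np.linspace(0.05, 1.0, 5)      # effective plastic strain values
for T, (K, n) in constants.items():
    sigma = K * eps ** n             # power-law flow stress
    print(f"{T} C: sigma(eps) =", np.round(sigma, 1), "MPa")
```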
2.2 Experiments and simulation
This study was divided into three stages. In stage I, different coefficient of friction values (0.1, 0.12, 0.15, 0.2 and 0.3) at a fixed working temperature of 250°C were used as process parameters in FE simulations. In stage II, actual plane-strain backward extrusion experiments were carried out on Mg alloy AZ31B billets of dimensions 100 mm in length, 20 mm in width and 5 mm in thickness. The billets were coated with a fine layer of graphite to act as a lubricant prior to forging. The forging was done using a 50-ton hydraulic press. The initial forging was carried out at a temperature of 250°C with forging loads of 30, 40, 50, 53 and 56 tons; the forging load was limited to 56 tons as that is the maximum machine capacity. The thickness of the billet was measured after each forging. The results were plotted and used to match the FE simulation results obtained in stage I to approximate the coefficient of friction of the process. In stage III, with the approximated μ value, FE simulation was carried out for the plane-strain forging at 150°C and 300°C. The required forging load was evaluated from the simulations, and the stress and strain distributions were studied from the FEM analysis for the three different working temperatures. Forging at a temperature of 300°C with forging loads of 20, 30, 35, 37, 40, 45 and 50 tons was conducted.

2.3 Microstructural analysis
The actual parts forged at 250°C and 300°C were mounted, polished and etched to study the grain morphology. The metallurgical samples were examined using an inverted microscope at ×100 magnification.

3. Results and Discussions
From the FEM simulations, forging load vs. punch displacement graphs were plotted, as shown in Figure 2. From the graphs, it can be observed that there is a substantial increase in load over the initial punch displacement. This is due to the force needed to overcome friction and, more so, to reach the yield strength of the material. After the yield point is reached, a smaller increase in load is sufficient for the material to undergo plastic deformation. The build-up in load at the final stage of deformation (at about the 3 mm to 4 mm region) can be attributed to the hydrostatic pressure build-up caused by the greater difficulty of material flow at the final stages. The forging load was also observed to increase with an increase in the coefficient of friction, since more load is required to overcome a higher frictional force.
Table 2: Punch displacement obtained with different forging loads.

Forging temperature of 250°C
Forging load (tons):     30   | 40   | 50   | 53  | 56
Punch displacement (mm): 0.1  | 0.25 | 0.73 | 1M  | 3.13

Forging temperature of 300°C
Forging load (tons):     20   | 30   | 35   | 37  | 40   | 45   | 50
Punch displacement (mm): 0.08 | 0.25 | 0.5  | 1.0 | 2.25 | 3.65 | 4.0
The punch displacements for the different forging loads at forging temperatures of 250°C and 300°C are shown in Table 2, while the actual parts formed are shown in Figure 3. The experimental forging load vs. punch displacement results for the working temperature of 250°C were compared with those predicted by the simulations (see Figure 2). It was observed that the experimental results matched best with the simulation using a μ value of 0.12. This value is indicative of the coefficient of friction for the forging process carried out in this study; experimental tests will be done to verify the value obtained in subsequent studies. Using μ = 0.12, simulations of forging at 150°C and 300°C were carried out, and the forging load vs. punch displacement curves are shown in Figure 4 along with the empirical results for 250°C and 300°C. An actual experiment for forging at 150°C was not conducted, as the forging load evaluated from the simulation for substantial punch displacement (~100 tons, see Figure 4) is higher than the capacity of the hydraulic press. It can be observed from the graph that the experimental results for forging at 300°C are in good agreement with those predicted by the FEM simulation. The load capacity of the hydraulic press was sufficient to forge the original billet from 5 mm to 1 mm in thickness at a temperature of 300°C, but not at 250°C; a load of approximately 70 tons is predicted by the simulation for the original billet to be forged to 1 mm thickness at 250°C. From the graph, a trend can be observed where the forging load increases as the working temperature decreases. The increase in load from 300°C to 250°C is quite marginal compared to that when the working temperature decreases from 250°C to 150°C. This is expected, as the strength and strain-hardening coefficient at a working temperature of 150°C are much higher than those at 250°C and 300°C (see Table 1). These values coincide with the theory that magnesium, having an HCP structure, gains an additional slip system when formed at temperatures of 225°C and above [7]. The deformation profiles generated by the FEM were matched with the actual forged parts and found to be similar and generally accurate (see Figure 5), indicating that the model used can adequately predict the deformation profile of the forged part. Through the microstructural analysis, small refined grains were observed generally throughout the formed part, with some areas having coarse grains (see Figure 6). Such small refined grains can be attributed to dynamic recrystallization, and this can be substantiated by the studies carried out by Mwembela et al. [5]. Having small refined grains is desirable, as the mechanical properties of materials with such a microstructure are generally better in terms of strength and ductility. It was observed that the small refined grains were found in areas of high deformation and flow stress. Further studies are being done to correlate and investigate the effects of stress and strain on the development of microstructure in the deformed material. Stress and strain distributions for the different working temperatures were analyzed (see Figure 7). It was observed that the stress and strain distribution patterns are similar for the different forging temperatures, which implies that temperature has little effect on the stress and strain distribution pattern.
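The calibration step described above, choosing the μ whose simulated load-displacement curve best matches the experiment, can be sketched as a simple root-mean-square comparison. All curve values below are placeholders, not the measured or simulated data of this study.

```python
import numpy as np

loads = np.array([30.0, 40.0, 50.0, 53.0, 56.0])        # forging loads (tons)
exp_disp = np.array([0.10, 0.26, 0.75, 1.60, 3.20])      # illustrative measurements (mm)

sim_disp = {                                             # illustrative FE curves (mm)
    0.10: np.array([0.12, 0.31, 0.90, 1.95, 3.60]),
    0.12: np.array([0.11, 0.27, 0.76, 1.63, 3.22]),
    0.15: np.array([0.09, 0.21, 0.60, 1.30, 2.70]),
}

def rms(a, b):
    """Root-mean-square difference between two displacement curves."""
    return np.sqrt(np.mean((a - b) ** 2))

best_mu = min(sim_disp, key=lambda mu: rms(sim_disp[mu], exp_disp))
print("best-matching friction coefficient:", best_mu)
```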
The maximum equivalent stress (Von Mises), σ_y stress and pressure predicted by the simulation for the three different forging temperatures are shown in Table 3. Both the stress and pressure values show similar trends, increasing with a decrease in forging temperature. From the results, it is important to note the predicted maximum σ_y and pressure values for forging at 150°C, as these values are higher than the strength of the normal work-hardened steel used for die manufacturing, which is approximately 1300 MPa. The maximum σ_y and pressure values for forging at 250°C and 300°C are below 1300 MPa, which indicates that the tooling could be operated safely.
Table 3: FEM prediction of maximum Von Mises stress, σ_y stress and pressure values for the different forging temperatures

Temperature (°C) | Von Mises stress (MPa) | σ_y (MPa) | Pressure (MPa)
150 | 206 | 2225 | 2110
250 | 89 | 1076 | 1031
300 | 61 | 716 | 680
In the actual experiments carried out, the edges of some forged parts were found to have sheared off (see Figure 8). This can be attributed to the occurrence of a material dead zone (no material flow), as indicated by the simulation using nodal vector displacement and the plastic strain distribution, which reveals the shear zone Y (see Figures 9b and 9c). Comparing the nodal vector displacement for the different μ values used in simulating the forging at 250°C (see Figure 10), it was observed that the area of the material dead zone decreases with a decrease in μ. This phenomenon can be attributed to less friction between the die wall and billet, as well as between punch and billet, which leads to more homogeneous and easier material flow. Thus, decreasing the friction coefficient may not reduce the forging load as significantly as increasing the temperature (see Figure 3 and Figure 4), but it can help in reducing the area of the material dead zone, especially for the geometry used in this study.

4. Conclusions
The following conclusions are drawn from evaluating plane-strain forging of magnesium alloy AZ31 using finite element analysis:
1. By matching the actual experimental results with simulations using different μ values, we obtain a coefficient of friction value of 0.12 for the process.
2. The actual forging load and deformed shape are in good agreement with the load and deformation profile predicted by simulation.
3. Actual parts of 5 mm thickness could be formed to a thickness of 1 mm using a forging temperature of 300°C and a 50-ton load.
4. Small refined grains were found in the grain morphology of the forged parts.
5. Material dead zones and material flow could be identified from the FEM analysis.
6. Decreasing the friction coefficient does not reduce the forming resistance as significantly as increasing the forging temperature.
7. Reducing the friction coefficient is significant for the reduction of the material dead zone.
References:
1. H. Hoffmann and A. Toussaint, Strategies and future developments in the manufacturing process of lightweight car bodies, in: Proceedings of the 6th International Conference on Technology of Plasticity, Nuremberg, Germany, 1999, pp. 1129-1140.
2. Y.H. Kim, T.K. Ryou, H.J. Choi and B.B. Hwang, Journal of Materials Processing Technology, 123 (2002), pp. 270-276.
3. Yasumasa Chino, Mamoru Mabuchi, Koji Shimojima, Yasuo Yamada, Cui'e Wen, Kenji Miwa, Mamoru Nakamura, Tadashi Asahina, Kenji Higashi and Tatsuhiko Aizawa, Materials Transactions, 42 (3), 2001, pp. 414-417.
4. N. Ogawa, M. Shiomi and K. Osakada, International Journal of Machine Tools and Manufacture, 42, 2002, pp. 607-614.
5. A. Mwembela, E.B. Konopleva and H.J. McQueen, Scripta Materialia, 37 (11), 1997, pp. 1789-1795.
6. S.C.V. Lim, M.S. Yong and C.M. Choy, in: Proceedings of the 4th ASEAN ANSYS User Conference 2002, Singapore, in press.
7. E.F. Emley, "Principles of Magnesium Technology", Pergamon Press, Oxford/New York, 1966, pp. 483-488.
COMPARATIVE STRUCTURAL EVALUATION OF PROTECTIVE HELMETS USING THE FINITE ELEMENT METHOD
A. SUBIC AND M. TAKLA
Department of Mechanical & Manufacturing Engineering, RMIT University, PO Box 71, Bundoora, Vic 3083, Australia
C. MITROVIC
Faculty of Mechanical Engineering, University of Belgrade, Yugoslavia
Every protective helmet on the market must first undergo rigorous testing procedures that are typically very time-consuming and costly. For a protective helmet to be sold on the market, the design must be tested according to the appropriate international and national Standards, which require that the helmet provide a certain level of energy absorption. Rigorous testing is also required during the design and development process if the helmet is to meet such standards when manufactured. Clearly, it is of paramount importance from the design point of view to develop and implement equivalent testing and analysis procedures within a virtual design environment prior to prototyping and manufacturing, in order to reduce the time and cost associated with the development of new and improved designs. This paper presents computational design approaches and results obtained through modelling and analysis of the energy absorption and penetration effects of protective helmets during impact using the Finite Element Method (FEM). The models developed encompass the nonlinear behaviour of helmet designs, involving material, geometrical and contact nonlinearity, using the Arc-Length method in conjunction with the Newton-Raphson method. The relatively new Arc-Length method was used rather than the more traditional displacement control method, which cannot be applied successfully to structures showing snap-back effects. Analysis is done for both helmet and head-form. Computational results show close correlation with experimental tests. The developed methodology allows for more effective design optimisation of protective helmets compared to the traditional approaches currently applied in industry.
1 Introduction
One of the main safety problems in road transport is the head protection of motorcycle riders. Even in most Western countries, where helmets are compulsory by law, head injuries are a leading cause of fatalities. This problem has gained increased attention worldwide, and as a result of this concern a wide range of improved helmet designs has emerged in recent years. For a protective helmet to be sold in Australia, the design must be tested according to the Australian Standards (e.g. AS 1698:1988; AS/NZS 2512:1998; 2512.3.1:1999), which require that the helmet be subjected to three different types of energy absorption tests (using different shaped anvils) and a penetration test. The standard test used to determine a helmet's structural integrity is known as the Impact Energy Attenuation test, more commonly known as the drop test. In this test the helmet is secured to the standard headform and dropped in guided free-fall onto a flat steel anvil, and the acceleration imparted to the assembly is measured. The tests required for compliance in the case of frontal impact are even more complex [1]. The time and cost involved in such tests are considerable. Clearly, it is of paramount importance from the design point of view to develop and implement equivalent testing and analysis procedures within a virtual design environment prior to prototyping and manufacturing, in order to reduce the time and cost associated with the development of new and improved designs. This paper presents computational design approaches and results obtained through modelling and analysis of energy absorption and penetration
effects of protective helmets during impact using the Finite Element Method (FEM). A relatively new Arc-Length method has been used for this purpose in conjunction with the Newton-Raphson method [1], rather than the more traditional displacement control method. A case study involving a comparative analysis of two different helmet designs for frontal impact compliance is presented.

2 Modeling Approach
The helmet model considered here is made from multilayered laminar composite materials and takes into account fiber orientation, possible impact directions and the interlaminar-normalized value of dynamic strength. Finite elements of the thin laminar shell type are used in the helmet discretisation. The nonlinear finite element method is applied, taking into consideration particular nonlinearities in geometry. The simulations presented here involve dynamic tests whereby accurate identification of the force-time, displacement-time and force-displacement relations is essential. Simulation has been done for different initial conditions and composites of different characteristics for different helmet models. The complete analytical model is formed by using a particular analytical solution, the displacement control method, the Arc-Length method and an adaptive system stabilization method, while the Newton-Raphson procedure is used for the non-linear finite element analysis (coupled with the analytical model) [1-3]. In the same manner as in the case of the displacement control method, it is possible to express the displacement increment as (see Fig. 1)
Δλ^(i+1) = −(Δx_r^(i) · ΔX^(i)) / (Δx_t^(i) · ΔX^(i)) + Δλ^(i)    (1)

where Δx_r^(i) and Δx_t^(i) denote the residual and tangential displacement increments at iteration i.
Figure 2 shows the associated Newton-Raphson incremental solution for non-linear finite element analysis of a typical simply supported beam with a force acting mid-span.
Figure 1 Iteration Path
Figure 2 Newton-Raphson Incremental Solution
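Conceptually, the incremental Newton-Raphson scheme of Figure 2 proceeds as sketched below for a single degree of freedom with a nonlinear internal force. This is a textbook illustration of the iteration only, not the NISA II implementation; the internal-force function and load level are invented for the example.

```python
import numpy as np

def newton_raphson_increments(N, dN, P_max, n_inc=10, tol=1e-10):
    """Apply the load in increments; at each level iterate to equilibrium."""
    x, path = 0.0, [(0.0, 0.0)]
    for P in np.linspace(P_max / n_inc, P_max, n_inc):   # load increments
        for _ in range(50):                              # equilibrium iterations
            r = P - N(x)                                 # residual force
            if abs(r) < tol:
                break
            x += r / dN(x)                               # tangent-stiffness update
        path.append((x, P))
    return path

N = lambda x: 100.0 * x - 15.0 * x**2     # softening internal force (illustrative)
dN = lambda x: 100.0 - 30.0 * x           # its tangent stiffness
for x, P in newton_raphson_increments(N, dN, P_max=120.0):
    print(f"u = {x:.4f}, P = {P:.1f}")
```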
A comparative analysis of two types of helmets with different lower edge designs has been carried out in terms of their respective energy absorption capability and maximal impact force.
Figure 3 Helmet Type A
Figure 4 Helmet Type B
In this analysis, the problem of helmet deformation during impact with a rigid obstacle is treated. This is based on the solution of the load limitation problem by the adaptive stabilization method, with both geometric and material nonlinearities considered. Simulation has been done for different initial conditions and composites of different characteristics for different models. The computer software package NISA II (EMRC - Engineering Mechanics Research Corporation, Michigan, USA) is used for the simulation of helmet impact with a hard obstacle [4].

3 Discussions of Results
For a system exposed to a progressive impact force, it is very important to determine the load limits acting upon the helmet and its actual behavior under load over the time period in question (Fig. 5).
Figure 5 Helmet Deformation Under Progressive Impact Force
It can be seen from Figure 6 that during impact the material has remained in the elastic domain. Such helmet structure will recover its initial shape. Also, the comparative impact
force versus displacement curves for designs A and B confirm that design A exhibits significantly greater energy absorption capability and a lower maximal impact force value than type B, which is more suitable considering the passive safety criteria.

Figure 6 Impact Force Versus Displacement (impact force versus displacement [mm] for the Type A and Type B helmets)
4 Conclusions
The paper introduced a new computational approach for the structural evaluation of protective helmets. Numerical analysis based on nonlinear FEM was used to test static and dynamic models. The results presented in this paper indicate that, in line with the theoretical considerations, the developed approach can be applied successfully and with high accuracy to helmet design, taking into consideration real-life requirements. Based on the results obtained and the validations achieved, the presented method can be considered equivalent to the standard crash tests used for compliance testing of such equipment, allowing a significant reduction in the time and cost involved.

5 References
1. Mitrovic C. and Subic A., Simulation of energy absorption effects during collision between helmet and hard obstacle, in Subic, A. and Haake, S. (Eds.), Sports Engineering: Research, Development and Innovation, Blackwell, (2000) pp. 389-398.
2. Dunn S.A., Issues concerning the updating of finite element models from experimental data, NASA TM 109116 (1994).
3. Zienkiewicz O. and Zhu J., Adaptivity and mesh generation, International Journal for Numerical Methods in Engineering, 32 (1991) pp. 783-810.
4. ANSYS Theory Reference, Structural Fundamentals, SAS IP, Inc.
BUCKLING ANALYSIS OF COMPOSITE SPHERICAL PANELS WITH RANDOM MATERIAL PROPERTIES
B. N. SINGH
Department of Applied Mechanics, MNNIT (Deemed University), Allahabad 211004, India
E-mail: [email protected]
N.G.R. IYENGAR AND D. YADAV
Department of Aerospace Engineering, Indian Institute of Technology, Kanpur 208 016, India
E-mail: [email protected]; [email protected]
Composite spherical panels with random material properties, subjected to axial compressive load with all edges simply supported, have been investigated for buckling. The system model incorporates first order and higher order shear deformation theories. A probabilistic approach in conjunction with a first order perturbation technique is outlined. Results for the mean and standard deviation of the buckling response of cross-ply panels are presented.
1 Introduction
The increasing need for lightweight and optimised structures has led to the widespread adoption of thin-walled composite laminates. Buckling in any form either precipitates or hastens the collapse of such structures. Very limited literature is available on composite structures with random material properties. Free and forced vibration [1], the stability of columns [2], beams [3], and flat and cylindrical panels [4-6] have been investigated with uncertain parameters and material properties. This paper presents an analytical approach to the stability of spherical panels with random material properties. It outlines a stochastic approach using first order perturbation (FOPT) for the solution of the random characteristic equation of buckling arising from the random variation of material properties. Transverse shear effects have been incorporated in the formulation.
2 Basic Formulations
The governing equations with the higher order and first order shear deformation theories [HSDT and FSDT] proposed in [7], as applied to spherical panels under compression, can be taken from the said reference for the present study (not presented here to economise on space). For cross-ply laminates with all edges simply supported, an exact Navier-type solution is possible. The displacements along the axes and the rotations about x2 and x1 satisfying the boundary conditions can be seen in ref. [7].
Substitution of the above-mentioned equations [7] into the system equation turns it into a homogeneous set. A nontrivial solution for the deflection yields the eigenvalue formulation for the critical load Ncr. This depends on the stiffness matrix elements a_ij. These elements, dependent on the material properties, are random in nature, leading to the buckling load also being random.

3 Perturbation Approach for Buckling Load Statistics
Any random variable can be split into the sum of its mean and a zero-mean random part. For example, with an overbar denoting the mean value, superscript 'R' the random variable and 'r' the zero-mean random part,

Ncr^R = N̄cr + Ncr^r    (1)

Substituting Equation (1) into the characteristic equation, expanding, collecting terms of the same order of magnitude and retaining terms only up to first order yields, in symbolic form:

Zeroth order: N̄cr = F(āij)    (2)
First order: Ncr^r = F(aij^r, N̄cr)    (3)

Equation (2) is deterministic, relating only the mean quantities, and the mean critical load is obtained from it by any standard procedure. Using Taylor's rule and manipulating the expressions, we get the expansion

N̄cr,j = F(āik, āik,j, N̄cr)    (4)

where ,j denotes the partial derivative evaluated at the mean values of the stiffness elements. Ncr^r is then obtained using the above equation, and its variance is evaluated by taking the appropriate expectation.
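The FOPT recipe of this section, mean from the mean properties and variance from first-order sensitivities, can be sketched and checked against Monte Carlo simulation. The closed-form function F below is only a stand-in for the buckling eigenvalue problem, and the basic random variables are assumed independent and Gaussian.

```python
import numpy as np

def F(a):
    """Stand-in critical-load function of the material property vector a."""
    E11, E22, G12 = a
    return 0.5 * E11 + 2.0 * np.sqrt(E22 * G12)

a_mean = np.array([40.0, 1.0, 0.6])      # mean properties (in units of E22)
a_sd = 0.10 * a_mean                     # SD/mean = 0.10 for each basic RV

# FOPT: mean = F(a_mean); var = sum_j (dF/da_j)^2 var(a_j)
eps = 1e-6
grad = np.array([(F(a_mean + eps * np.eye(3)[j]) - F(a_mean)) / eps for j in range(3)])
fopt_mean, fopt_sd = F(a_mean), np.sqrt(np.sum((grad * a_sd) ** 2))

# Monte Carlo reference (vectorised copy of F)
rng = np.random.default_rng(0)
s = rng.normal(a_mean, a_sd, size=(100000, 3))
mcs = 0.5 * s[:, 0] + 2.0 * np.sqrt(s[:, 1] * s[:, 2])

print(f"FOPT: mean={fopt_mean:.3f}, SD={fopt_sd:.3f}")
print(f"MCS : mean={mcs.mean():.3f}, SD={mcs.std():.3f}")
```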
4 Results and Discussion
The above procedure is employed to obtain the second order statistics of the critical buckling loads of symmetric and anti-symmetric cross-ply spherical panels under axial compressive load with all edges simply supported. The mean values of the material properties of the graphite/epoxy composite used are [7]: E11 = 40E22, G12 = G13 = 0.6E22, G23 = 0.5E22, ν12 = 0.25, with the shear correction factor for FSDT taken as 5/6. Results are generated for b/a = 1 and 2, R/b = 5, b/h = 10 and 100, and two lay-ups, [0°/90°] and [0°/90°/90°/0°]. However, results for only [0°/90°] are presented due to space considerations.
4.1 Buckling Load: Composite Spherical Panels

Validation Study
Validation of the present technique has been carried out by comparison with Monte Carlo simulation (MCS). Table 1 shows the results by FOPT and MCS, with only E11 taken to be random. It is observed that the results obtained from FOPT are in good agreement with MCS.

Second Order Statistics
Mean buckling loads: The nondimensionalised mean buckling loads are presented in Table 2. The mean load increases with increasing b/a and b/h ratios. HSDT gives slightly higher values than FSDT for both values of b/h; these differences in the predictions are small for thin panels.

Variance of buckling loads: Table 3 presents the SD of Ncr with the SDs of all the basic random variables changing simultaneously. Dispersions in buckling loads with the FSDT and HSDT models are of comparable order of magnitude. There is a slight change in dispersion as the b/a ratio increases for both stacking sequences.

5 Conclusion
The following main conclusion can be drawn from the present study: the buckling loads for symmetric and anti-symmetric cross-ply laminates show almost equal changes in scatter for the two theories, FSDT and HSDT, and in general the two predict comparable results.

Table 1: Comparison of spherical panel buckling loads from MCS with the present approach, [0°/90°], R/b=5, b/a=1 and b/h=10

SD/mean of E11: 0.05 | 0.10 | 0.15 | 0.20
MCS: HSDT: 0.0252 | 0.0441 | 0.0636 | 0.107
FOPT: HSDT: 0.0252 | 0.0444 | 0.0640 | 0.110
Table 2: Mean buckling loads, N̄cr = Ncr b²/(E22 h³), for all edges simply supported composite spherical panels with R/b=5

b/a | Theory | [0°/90°], b/h=10 | [0°/90°], b/h=100 | [0°/90°/90°/0°], b/h=10 | [0°/90°/90°/0°], b/h=100
1 | FSDT | 12.26 | 58.55 | 19.37 | 62.67
1 | HSDT | 12.47 | 58.57 | 18.37 | 62.64
2 | FSDT | 32.82 | 74.51 | 38.03 | 114.28
2 | HSDT | 35.30 | 74.56 | 34.32 | 114.16
Table 3: Sensitivity of SD of spherical panel buckling loads to the SD of the basic material properties, with all basic material properties changing simultaneously, [0°/90°], R/b=5 and b/h=10

SD/mean, material properties: 0.05 | 0.10 | 0.15 | 0.20
HSDT, b/a=1: 0.0271 | 0.0582 | 0.0810 | 0.120
FSDT, b/a=1: 0.0268 | 0.0530 | 0.0750 | 0.104
HSDT, b/a=2: 0.0210 | 0.0426 | 0.0680 | 0.100
FSDT, b/a=2: 0.0200 | 0.0418 | 0.0630 | 0.091
References
1. Ibrahim, R.A., Structural dynamics with parameter uncertainties. Appl. Mech. Rev. 40 (1987) pp. 309-328.
2. Zhang, J. and Ellingwood, B., Effects of uncertain material properties on structural stability. ASCE J. Struct. Engrg. 121(4) (1995) pp. 705-716.
3. Jeong, J.D., Critical buckling load statistics of uncertain column. In Proc. 6th Specialty Conf. on Probabilistic Mechanics and Structural and Geotechnical Reliability (1995), pp. 563-566.
4. Singh, B.N., Yadav, D. and Iyengar, N.G.R., Initial buckling of composite cylindrical panels with random material properties. Compos. Struct. 53(1), pp. 55-64.
5. Singh, B.N., Yadav, D. and Iyengar, N.G.R., Stability analysis of laminated cylindrical panels with random material properties. Compos. Struct. 55(1).
6. Singh, B.N., Iyengar, N.G.R. and Yadav, D., Effect of random material properties on buckling of composite plates. ASCE J. Engrg. Mech. 127(9), pp. 873-879.
7. Reddy, J.N. and Liu, C.F., A higher order shear deformation theory of laminated elastic shells. Intl. J. Engrg. Sci. 23(3) (1985) pp. 319-330.
NUMERICAL ANALYSIS OF ADHESIVELY BONDED CYLINDRICALLY CURVED LAP JOINTS
CHENGYU QIAN AND LIYONG TONG
School of Aerospace, Mechanical and Mechatronic Engineering, University of Sydney, NSW 2006, Australia
E-mail: [email protected]
This paper investigates the effect of curvature on adhesive stresses in cylindrically curved bonded single-lap joints using both analytical and finite element analysis methods. In the analytical method, the shear and peel stresses are assumed to be constant across the bondline, and the governing equations are derived and then solved using a multiple shooting method. 2D plane strain finite element analyses are performed using Strand7 to validate the present analytical models by studying the effect of curvature on stress distributions for joints subject to three-point bending. Preliminary results show a good correlation between the shear and peel stresses predicted using the present analytical approach and the finite element method.
INTRODUCTION
Adhesively bonded joints and repairs have been increasingly used in joining and/or repairing lightweight structures, particularly in fuselages and wing and control surfaces in airframes. There exists a large amount of research on the stress analysis of adhesively bonded lap joints with straight adherends [1-2]. However, only very limited studies have been devoted to the effect of curvature on adhesive stresses in adhesively bonded joints with curved adherends. Sun and Tong [3] investigated the effect of curvature on the performance of actuators and sensors for curved smart beams. This paper aims to address this issue by determining the adhesive stresses in cylindrically curved beams bonded with a single-sided patch. An analytical model is formulated using the curved beam theory [4] and the constant shear and peel strain assumptions [1]. Solutions for the present model are obtained by using the multiple-segment shooting method, and then validated via comparison with finite element analysis results. Numerical results for the selected joints subjected to three-point bending are presented for various radii of the curved beams and patches to illustrate the effect of curvature.

ANALYTICAL MODELING
Consider a curved thin host beam with a single-sided patch as shown in Figure 1. It is assumed that the shear and peel strains in the adhesive layer are constant across the bondline. The entire curved beam can be divided into two regions, namely the host beam region and the overlap region. Using the curved beam theory degenerated from the deep shell theory [4], the middle surface strains and curvatures of the host beam and patch are

εi^0 = ∂ui/∂x + wi/Ri,   χi = (1/Ri) ∂ui/∂x − ∂²wi/∂x²,   (i = 1, 2)    (1)
The strains and the longitudinal displacements at an arbitrary point can be expressed as

εi(z) = εi^0 + z χi,   ui(z) = ui + z(ui/Ri − ∂wi/∂x),   (i = 1, 2)    (2)
where the subscripts 1 and 2 represent the host beam and the patch respectively; u and w are the longitudinal and transverse displacements of the mid-plane; and R is the radius of curvature of the beam or patch.
Figure 1 The adhesively bonded cylindrically curved beam with lower patch
The equilibrium equations in the overlap can be derived as follows:

T1,x + Q1/R1 − bτ + f1(x) = 0,   Q1,x − T1/R1 − bσ + f2(x) = 0,   M1,x + bτh1/2 − Q1 = 0
T2,x + Q2/R2 + bτ = 0,   Q2,x − T2/R2 + bσ = 0,   M2,x + bτh2/2 − Q2 = 0    (3)

where h denotes the thickness of the beam and patch, b is the width of the curved beam, T, Q and M are the axial force, transverse shear force and bending moment respectively, and τ and σ are the shear and peel stresses of the adhesive layer. The axial force and bending moment are

Ti = Ei b hi εi^0,   Mi = (Ei b hi³/12) χi,   (i = 1, 2)

where E is the Young's modulus of the beam and the patch. The shear and peel stresses in the adhesive layer are defined as

τ = (Gv/hv) [ u1 − u2 − (h1/2)(w1,x − u1/R1) − (h2/2)(w2,x − u2/R2) ]    (4)

σ = Ev (w1 − w2) / ((1 − ν²) hv)    (5)

where hv is the thickness of the adhesive layer, Ev and Gv are the Young's and shear moduli of the adhesive, and ν is its Poisson's ratio. Equations (2)-(5), together with the relevant boundary and continuity conditions, form a boundary value problem, which can be solved by rewriting them as a set of first-order ordinary differential equations and using multiple shooting methods [5].
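The solution workflow, rewriting the governing equations as a first-order system and solving the resulting boundary value problem, can be sketched as follows. SciPy's collocation BVP solver is used here in place of the paper's multiple shooting method, and a classical shear-lag equation τ'' = λ²τ with prescribed end stresses stands in for the full coupled system (3)-(5); L, λ and the edge stress are illustrative values.

```python
import numpy as np
from scipy.integrate import solve_bvp

L = 50.0          # half overlap length (mm), illustrative
lam = 0.35        # shear-lag parameter (1/mm), illustrative
tau_edge = 20.0   # adhesive shear stress at the overlap ends (MPa), illustrative

def rhs(x, y):
    # y[0] = tau, y[1] = dtau/dx  ->  first-order form of tau'' = lam^2 * tau
    return np.vstack([y[1], lam**2 * y[0]])

def bc(ya, yb):
    # prescribed shear stress at both ends of the overlap
    return np.array([ya[0] - tau_edge, yb[0] - tau_edge])

x = np.linspace(-L, L, 101)
sol = solve_bvp(rhs, bc, x, np.zeros((2, x.size)))
print("peak adhesive shear stress:", sol.y[0].max(), "MPa (at the overlap ends)")
print("midspan shear stress:", sol.sol(0.0)[0], "MPa")
```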
ILLUSTRATIVE EXAMPLE
Consider a single-sided lap joint, 25 mm wide, clamped at both ends and subjected to a point load at the middle of the host beam, as schematically shown in Figure 1. Numerical analyses are conducted using the present analytical model and 2D linear elastic plane strain finite element analysis (FEA) models in STRAND7 [6]. In the FEA, 8-node isoparametric elements are used to model both the adherends and the adhesive layer. In the through-the-thickness direction, one element is used to model the bondline, whereas four elements are used to model the adherends of the host beam and the patch. Along the circumferential x (or θ) direction, elements of equal arc length 0.25 mm are used. It is worth noting that the arc length of the adhesive elements near the adhesive free edge is slightly less than the height of the element, which is equal to the adhesive thickness of 0.3 mm. The applied load P is chosen as 0.1 kN.

Table 1 Physical properties and dimensions of the curved joints
Item | Host beam | Patch | Adhesive
Young's modulus (GPa) | 70 | 70 | 2.234
Poisson ratio | 0.334 | 0.334 | 0.31
Thickness (mm) | 1.0 | 1.0 | 0.3
Arc length (mm) | 500 | 100 | 100

RESULTS AND DISCUSSION
Figure 2 plots the transverse deflection of the mid-plane of the host beam at the loading point versus the curvature of the host beam, predicted using the present analytical models and the finite element method. This deflection at the loading point is the maximum deflection occurring in the joint. It is clear that the maximum displacement falls sharply as the curvature varies from 0 to 0.25, and then decreases slowly as the curvature is increased from 0.25 to 3.33. The straight lap joint has an extremely small bending stiffness and thus undergoes a substantially large deflection, whereas the curved lap joint, even with a very small curvature, gains significantly in overall bending stiffness and hence experiences a small deflection. Evidently, there is an acceptable agreement between the FEA and analytical results for the peak deflection of the host beam, with the actual difference ranging from 14 to 24%.
E" 0.25 , ysis
>placerr
0.2 1 0.15
— * — FEA
0.1
h 0.05 \ 0
I
1
2 3 Curvature(1/R)
4
Figure 2 Comparison of maximum displacement vs. curvatures predicted by the analytical model and FEA
Figures 3 and 4 depict the variation of the peak shear and peel stresses versus the curvature of the host beam. Similar to the observation for the maximum transverse deflection in Figure 2, both the shear and peel stresses decrease dramatically as the curvature becomes nonzero and then plateau overall as the curvature is increased. However, there exists a range of curvature between 0.25 and 0.5 in which the shear stress increases slightly from its valley. It is evident that there is a good agreement between the FEA and analytical results for the peak shear and peel stresses in the adhesive layer. The actual difference between the FEA and analytical analyses ranges from 2.6 to 9.7% in the peak shear stress and from 0 to 3.4% in the peak peel stress.
Figure 3 Peak shear (a) and peel (b) stresses vs. curvature (1/R) predicted by the analytical model and FEA
CONCLUSION In this paper, an analytical model is presented for thin cylindrically curved beams with adhesively bonded single-sided patch to allow investigation into the effect of curvature on adhesive stresses. 2D plane strain FEA models are used to validate the present analytical model by considering the selected illustrative example. The present preliminary numerical results show that introduction of curvature in the host beam can significantly reduce the peak adhesive stresses compared to those without a curvature. ACKNOWLEDGEMENTS The authors are grateful to the support of the AOARD/AFOSR and the University of Sydney. The authors would also like to acknowledge the help from Dr. D. Sun and Mr. Michel Wood. REFERENCES 1. Goland M. and Reissner E. (1944) The Stress in Cemented Joints. ASME J. of Applied Mechanics 66, A17-A27. 2. Tong L. & Steven G.P. (1999) Analysis and Design of Structural Bonded Joints, Kluwer Academic Publishers, USA. 3. Sun D. & Tong L. (2002) Modeling and analysis of curved beams with debonded piezoelectric sensor/actuator patches. Int. J. Mechanical Sciences 44, 1755-1777 4. Qatu M.S. (1993) Theories and Analysis of Thin and Moderately Thick Laminated Composite Curved Beams. Int. J. Solids Structures 30 (20) 2743-2756. 5. Stoer J. & Bulirsch R. (1980) The Multiple Shooting Method. Introduction to Numerical Analysis, pp483-519. 6. Introduction to the strand 7 Finite Element Analysis System (G+D) Company Pty Ltd, Sydney, Australia, 1999).
NUMERICAL ANALYSIS OF THE EFFECT OF INTERPHASE ON THE DEFORMATION OF PARTICLE-REINFORCED COMPOSITES W.X. ZHANG, F.P. YANG AND T.J. WANG Department
Of Engineering Mechanics, Xi'an Jiaotong University, Xi'an 710049, E-mail: [email protected]
China
Finite element analysis is carried out to analyze the interphase effect on the macro- and microscopic deformation behavior of SiC particle reinforced Aluminum matrix composites. The macroscopic stress and strain curves for the SiC/Al composites are obtained for different interphase stiffness, thickness and Poisson's ratio. Also, the distributions of microscopic stress and strain are presented.
1
Introduction
It is well-known that particle reinforced metal matrix composites (MMCs) are being widely used in engineering. Soppa, et al.[l] investigated the effects of microstructure variations on the macro-, meso-, and microscopic deformation of SiC particle reinforced Al matrix composites. Numerical analysis were carried out by Xu, et al.[2] to study the effects of geometry factors on the mechanical behavior of particle and fiber reinforced MMCs. It is assumed in [1,2] that the reinforcing phases are perfectly bonded to the matrix. However, there will be interphases between reinforcing phases and matrix of MMCs, and the mechanical behavior of composites will be greatly affected by the characteristics of interphases, e.g. stiffness, thickness, Poisson's ratio etc.. Wang and Yang [3] studied the energy dissipation in particle reinforced MMCs under cyclic external loading. In this paper, effects of interphase stiffness, thickness and Poisson's ratio on the macro- and microsopic deformation of SiC particle reinforced Al matrix composites are numerically studied. 2
Micromechanical Methods
It is assumed that spherical SiC particles are uniformly distributed in Al matrix. An axisymmetric representative unit cell mode and ANSYS code are employed in analysis. Here, we assumed that the distances between the particles in the directions of Z-axis and R-axis are H and B, respectively, as shown in Fig.l. A displacement is applied in Z direction. The boundary displacements at z=H and R=B are always uniform. Such that the boundary conditions can be expressed as, uz = 0 at Z=0; uz = UZ at Z=H;
(la)
UR = 0 at /?=0; Fz = 0
(lb)
at R=B.
The macroscopic axial strain is expressed as,
Ez=ln{H/H0). The corresponding stress can be calculated as,
518
(2)
519
*z=^Lv
(3)
SiC particle is taken as an elastic body. Matrix Al is taken as an elastic-plastic body obeying von Mises yield criterion. The isotropic strain hardening law for the matrix is as follows,
(Jf=
Fig. 1 Unit cell model,
(4)
where at and a0 are flow and initial yield stresses, respectively, h and n material constraints, e^ equivalent plastic strain. It is assumed that the constitutive law of interphase layer has the same form as matrix Al [3], namely, '/int
'Oint
+ h^(eS
(5)
where a^^Pi1 represent no interphase, soft and hard interphases, respectively. In what follows, we assume H0=B0=\, and consider five different values of /? (=0.01, 0.5, 1, 3, 5 and 10) and three different values of interphase thickness Ah (=0.01, 0.025 and 0.05). Material parameters are taken from [2-4], £m=69GPa,
Numerical Results and Discussions
Fig.2 shows the macroscopic stress and strain curves of the composites with particle volume fraction f= 10% and interphase layer thickness Aft=0.025, from which one can see the significant effect of material parameter /? on the deformation of composite, but this effect becomes very small as p>\. Effect of Poisson's ratio of the interphase on the deformation of composite is shown in Fig.3 for/=10%, /?=0.01 and A/i=0.01. Figs 4 and 5 show the effect of the thickness of interphase layer on the deformation of composites with f=\Q% and /?=0.01. It is clear from Fig.4 that the interphase thickness has significant effect on the macroscopic deformation of composite as P«\, i.e. a super soft interphase case. However, one can not see this effect in the composite with hard interphase layer, e.g. /?=10fromFig.5. Effect of the material parameter /? on the distributions of microscopic von Mises stresses and equivalent strains in the composites are shown in Fig.6 and Fig.7, respectively. It is seen that stress distributions in matrix are similar for the composites without interphase and with hard interphase, as shown in Figs 6(a) and 6(b), but it is totally different for the composite with soft interphase, as shown in Fig.6(c). Figs 7(a) and 7(b) clearly show that deformation concentrates in the matrix for the composites without interphase and with hard interphase, but this phenomena is totally different for the composite with soft interphase layer, as shown in Fig. 7(c).
520
5T 50 |
«o
— • - -v=0.01
30
—•--v=0.3 -v=0.49 —*— o - - no interphase
20 10
Fig.2 Effect of the material constant /? on the macroscopic stress-strain curves of composites with f=\0% and Afc=0.025.
Fig.3 Effect of Poisson's ratio of the interphase on the macroscopic stress-strain curves of composites with/=10%,/3=0.01 andA/i=0.01.
- a - no interphase S 30 20
h=0.01 - A - h=0.025 -ir- h=0.05
-«-
10
0 004
strain
Fig.4 Effect of the thickness of interphase layer on the macroscopic stress-strain curves of composites with^=10% and ^=0.01.
4
Fig.5 Effect of the thickness of interphase on the macroscopic stress and strain curves of composites with/=10%and / 8=10.
Conclusion
Material parameter J3 has singnificant effect on the macroscopic stress-strain curves of MMCs, but this effect becomes very small as /fc>l i.e. hard interphase case. It is seen that effects Poisson's ratio and the thickness of interphase layer on the macroscopic deformation of MMCs are also significant as /?<1 i.e. soft interphase, but the interphase thickness has almost no effect on the deformation of MMCs as /fc>l. It is clear from the distributions of microscopic stresses and strains in the MMCs with and without interphase that deformation concentrated within the soft interphase layer and stress concentration occurs in the hard interphase layer. 5
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 10125212) and the funds from The Ministry of Education of China.
521
(a) without interphase
(b) with hard interphase, yS=10 and AA=0.025
(c) with soft interphase, yS=0.01 and AA=0.025
Fig.6 Distributions of microscopic von Mises stresses in the MMCs with particle volume fraction f= 10% while the initial yield occurs in the matrix.
(a) without interphase
(b) with hard interphase, /?=10 and A/i=0.025
(c) with soft interphase, /?=0.01 and A/i=0.025
Fig.7 Distributions of microscopic equivalent strains in the MMCs with particle volume fraction^ 10% while the initial yield occurs in the matrix.
References 1. Soppa E., Schmauder S. and Fischer G., Influence of the microstructure on the deformation behaviour of metal-matrix composites. Comp. Mater. Sci. 16 (1999), p.323-332 2. Xu D., Schmauder S. and Soppa E., Influence of geometry factors on the mechanical behavior of particle- and fiber-reinforced composites. Comp. Mater. Sci. 15 (1999), p.295-301. 3. Wang J.C. and Yang G.C., The energy dissipation of particle-reinforced metal-matrix composite with ductile interphase. Mater. Sci. Eng. A303 (2000), p. 77-81. 4. Zhang J., Perez R.J., Wong C.R. and Lavernia E.J., Effect of SiC and graphite particulates on the damping behavior of metal matrix composites. Mater. Sci. Eng. R, 13(1994),p.325.-390.
NUMERICAL FINITE DEFORMATION ANALYSIS ON SOLID PROPELLANT GRAIN USING FINITE ELEMENT M E T H O D
YANG YUECHENG*, QIANG HONGFU, XU GUIMING AND ZHAO HAISHENG Xi'an Hi-Tech Research Institute, Hongqing Town, Xi'an, Shaanxi, PRC, 710025 Email address: [email protected] In the paper, a numerical simulation of finite deformation on solid propellant grain under the cuing process is studied. A three-dimensional numerical simulation for shell-grain model is carried out by using a commercial software ANSYS, which is a finite element code with Lagarange processor, and is especially suitable for modelling nonlinear quasi-static/transient problems. The material models in ANSYS, however, are mainly those applicable to metals or other materials. They are not exactly suitable for viscoelastic materials such as solid propellant. In this paper, a new nonlinear viscoelastic constitutive relation model which includes effects of strain softening and strain rate, effects of Poisson ratio's time-dependent or time-independent and effects of time-temperature equivalent and compressibility or incompressibility for solid propellant, a viscoelastic large deformation variational equation based Dirichlet-Prony series representation of the relaxation modulus Total Lagrange (T.L.) method, are developed. The developed models are implemented into the ANSYS code through its user subroutine function. Finally, the 1/8 scale Solid Rocket Motor (SRM) 3D numerical examples are executed, numerical results of deformed area, as well as stress field distribution are obtained and compared with those from independent fields tests.
1
Introduction
It is very important to study viscoelastic properties for solid propellant accurately and effectively due to structural complexity of SRM and nonlinear geometrical distortion in service process. Many endeavors have been done [1-3, 5-6] for the viscoelastic constitutive model of solid propellant grain, they were all single integral forms using kernel function. Amongst them, Swanson's nonlinear constitutive law is better than others, it has a very sound engineering background, strain softening and rate-dependent effects are considered and formula is very simple. Unfortunately, the compression for material behavior is not considered when Poisson ratio is time-dependent, in another words, it has coupled phenomenon among the shearing modulus and bulk modulus, as well as Poisson ratio. Hence authors propose a new modified version of Swanson's model which considered the demerits of it. In present paper, a new improved nonlinear viscoelastic constitutive model is presented based on Swanson's model, the strain rate effects and Poisson rate timedependant are considered. Incrementalization is accomplished in closed form. With virtual work equation of T.L. method, 3-D viscoelastic FE algorithm to quasi-static / transit problems is derived. Finally the 1/8 scale 3-D model is meshed and its numerical simulation is performed, the simulating results agreed well with published data, it proved that the mechanical model is sound academic. 2
3-D unified viscoelastic constitutive relation
Based on single integral form nonlinear viscoelastic constitutive relation proposed by Swanson et al. [6] and corresponding small-strain theory, a modified nonlinear viscoelastic constitutive model is presented, which considered Poisson ratio time-
522
523
dependent or time-independent and time-temperature equivalent effect for solid propellant, as well as compressibility. S ^ g i E ' ^ l G i t - O m ^ r
+S t j i m - O m ^ d A
(1)
where S-- is Kirchhoff stress tensor under finite deformation; Ey is Green strain tensor under finite deformation; >(T) is modified function with respective to strain rate, especially refers the effect of temperature, when temperature variation is neglect, we can take
HE'is the
AT =T —T0 is temptation variation for material; G(
dr
\
and P' = r
« r (n)
^
are defined as reducing time, assume it is thermo-
^rCi)
rheological simple material; aT is time-temperature shift factor, and is defined by W.L.F. equation, , 10]
_ C,(T — TR) where TR is reference temperature, C, and C 2 are constants
C2+T-T, determined by experiments. Hence all the factors associated with time-temperature equivalent effects, compressibility and strain-softening, strain rate variations as well as heat strain are taken into account in equation. (1), so equation (1) is a unified nonlinear viscoelastic constitutive model. It can be degenerated to special cases under different conditions. When material property is approximately incompressible, bulk relaxation modulus K{t) = const is taken, when it is completely incompressible, the bulk relaxation modulus K(t) = oo is assumed. In additional, another property parameter in Eq. (1) is common used as shearing relaxation modulus G(t) • When bulk relaxation modulus K(t)^ const is considered, Eq. (1) is also stand. When its nonlinearity in the Eq. (1) is ignored, i.e. strain-softening function #(£') = 1 or modified function with respective to strain rate 0(T) = 1, then Eq. (1) is reduced to linear thermo-viscoelastic constitutive model. 3
Incremental form of unified 3-D nonlinear viscoelastic constitutive model
In order to analyze finite deformation for solid structure, we have to derive integral incremental constitutive relation from equation. (1). In brief, let the bulk relaxation modulus be represented by a Dirichlet-Prony series as,
524 (2)
*(f) = 5XexpHV) where K
and )3 are relaxation coefficients and characteristic time in p' order, is
summation of Dirichlet-Prony series expansion, so the incremental relation at time tn is derived as i*S»l=
g.(Ol " £1
/ ^ ; 0,A A '»
{AEtt} (3)
+ £[£„(£') exp(-j3p B„Af„) - g„_, (£')]{ 4 } , - , . , *=1 ffl
+ £„(£')£({/C t t }„_lp - { / „ }._li,)0exp(-^B„Ar11) where the first term on the right side denotes the stress increment induced by the strain increment {AEkk} at time interval Atn ; the second term on the right side denotes relaxation stress increment at time interval Atn ; the third term on the right denotes influence on stress increment by strain rate yielded by temperature variation at time interval Atn. With help of principle of virtue work using T.L. method, we obtain
J0y A^C^JA^dv +
+ J0v (AL0ErsC:siJ80N% + LfEnCnl]S^Ev
AfErsCrsiJSAN0LE,J)dv
= S'+A'W-j0vC0Su+,0S!
+ lvC0SlJ +'Q
S^ +'0 S^SAfE^dv
(4)
^S^SA^dv
this is its corresponding finite deformation formulation for finite element analysis, the significance of variables above on are ignored concrete .due to length limits. All the features of the model are coded and incorporated into the ANSYS commercial software using the user defined subroutine interface. 4
Numerical example
Based on the symmetric configuration of the SRM, the 1/8-scale propellant grain coupled shell structure is targeted to analyze during the cooling process from 70°C to 20°C. The mechanical model mentioned above is employed. The mechanical property parameters of SRM are referred to [4].The results are plotted in vector graph, in which MX and MN denote the maximal and the minimal value position in the grain, see figure 1-figure 2, in which A, B and C represent the radial, circumferential and axial value distribution respectively, D represents the equivalent value distribution, following analysis in detail. • In the X-axes, the displacement MX and MN locates at the inner and outer of the rocket motor, and the volume reduction can read from the figure, which is agreeing with the engineering practice. The stress MX locates at the conjoint place of the alaslot and ala-slice, and MN locates at the outer of the ala-slice. • In the Y-axes, the displacement MX and MN locates at the two sides of the ala-slice respectively, and the value of the MX equals to MN with the opposite direction. This
525
•
•
is the much according with the symmetry assumption and the engineering practice. Stress distribution is similar with the x-axes. In the Z-axes, displacement MX and MN locate at the two ends of inner surface respectively. Stress MX and MN locate at the end of the ala-slot, MX locates at the conjoint place of the ala-slot and ala-slice, MN locates at the bottom of the ala-slot. Equivalent stress MX locates at the conjoint place of the ala-slot and ala-slice, MN locate at the conjoint area of the propellant grain and the cylinder of the shell of the rocket motor.
A
B
C
D
Figure 1. Translation displacement vector-graph.
A
B C Figure 2. Stress field distribution vector-graph
D
5
Conclusions
1.
The comparison between the simulating results and the engineering data [4] proves the mechanical model is reliable and academic. Regarding experimental data for grain [4], the tensile stress is much more dangerous than the compressive one. It can be seen from the results, the tensile stress is much higher than the compressive stress. To guarantee the safety of SRM, the tensor stress value at the centralization location of the tensor stress should be diminished. The displacement during the cooling down of the solid rocket motor is higher than the design value. This phenomena show that the inner ballistic trajectory property would be affected by the higher displacement.
2.
3.
526
References 1. Burke M.A., Woytowitz P.J. and Reggi G., Nonlinear viscoelastic constitutive model for solid propellant. Journal of Propulsion and Power, 8 (1992) pp. 586-591. 2. Jung G.D., Youn S.K. and Kim B.K., 2000. A three-dimensional nonlinear viscoelastic constitutive model of solid propellant. International Journal of Solids and Structures, 37 (2000) pp. 4715-4732. 3. Lai J. and Bakker A., 3-D shapery representation for non-linear viscoelasticity and finite element implementation. Computational Mechanics 18 (1996) pp. 182-191. 4. Qiang H.F., Numerical analysis and experimental researches on solid rocket motor grain structure integrity. Ph.D. Dissertation, Dept. of Engineering Mechanics, Xi'an Jiaotong University, Xi'an, PRC, January 1999 (in Chinese). 5. Shapery R.A., On the Characterization of nonlinear viscoelastic material. Polymer Engineering Science, 9 (1969) pp. 259-310. 6. Swanson S.R. and Christenson L.W., A Constitutive formulation for high elongation propellants. Journal of Spacecraft Rocket, 20 (1983) pp. 559-566.
INVESTIGATION ON THE COUNTER-INTUITIVE PHENOMENON OF ELASTIC-PLASTIC BEAMS Y. M. LIU AND G. W. MA School of Civil and Environmental Engineering, Nanyang Technological University, Singapore 639798 E-mail: [email protected], [email protected] Q. M. LI Department of Mechanical Aerospace and Manufacturing Engineering, UMIST, P.O. BOX 88, Manchester M60 1QD, UK E-mail: ainsming. li@ umist.ac. uk Counter-intuitive phenomena of pin-ended and fix-ended elastic-plastic beams subjected to impulsive load are simulated using finite element method. The effects of element type, mesh size and the magnitude and time duration of the impulsive load on the phenomenon are studied. Sensitivity analysis of the counter-intuitive phenomena based on the Shanley model and finite differential method is also carried out.
1
Introduction
Counter-intuitive phenomena were first observed in elastic-plastic beams subjected to impulsive pressure by Symonds and Yu [1]. The counter-intuitive phenomenon means that the residual mid-point deflection of the beam lies on the same side of the applied impulse. The same phenomenon was also observed when studying a fix-ended uniform beam under concentrated impulsive load [2]. To explain the abnormal behavior of beam dynamics, a Shanley model was adopted to capture the counter-intuitive response feature of the pin-ended beam [1]. Results showed that the counter-intuitive phenomenon is very sensitive to the magnitude and duration of the applied load [1-4]. In the present study, both a pin-ended elastic, perfectly plastic beam subjected to impulsive pressure load and a fix-ended elastic, perfectly plastic beam under concentrated impulsive force are simulated using the finite element code LS-DYNA. The effects of the element types, mesh size, magnitude and time duration of the impulsive load are studied. The derived results are compared with the results obtained by other researchers. Sensitivity analysis on the counter-intuitive behavior based on the Shanley model and finite differential method is also performed to catch the transition points. 2
Pin-ended beam problem
A uniform beam with rectangular cross-section and pin-jointed at both ends is simulated. The impulsive pressure po has a rectangular shape with time duration to. The material is elastic, perfectly plastic. The half span, the width and the thickness of the beam denoted as L, b and h are respectively 100mm, 20mm and 4mm in the present study. E,CT0,p and v are the Young's modulus, the yielding stress, the mass density and the Poisson's ratio of the beam material with values of 80GPa, 300MPa, 2700kg/m3 and 0.3, respectively. Fig. 1 shows the final deflection at the midpoint of the beam obtained by using shell element in LS-DYNA. In Fig. 1, the time duration to is fixed at 0.5ms and the magnitude of the impulsive pressure p 0 varies from 0.8 to 1.1 MPa. When the impulsive pressure is in a window approximately between p]=0.90 MPa and p2=0.99 MPa, the final deflection
527
528
of the mid-point of the beam is negative, which indicates the counter-intuitive phenomena. The numerical results obtained by using 3-D solid elements in LS-DYNA with different sizes are shown in Fig. 2. It is seen t-~-7"—""" from Fig. 2 that the location and the width of the window of the pressure pulse depends on element types used. There exist two windows within which the counter-intuitive behavior occurs when solid elements with four layers are adopted along the thickness. However, the first •o •6 v window disappears when the mesh size along ~—J the thickness become smaller. Because of the 0.80 0.85 0.90 0.95 1.00 1.05 1.10 highly parametric sensitivity of the problem, the Impulsive pressure load p0 (MPa) mesh must be fine enough to ensure the Fig. 1 The final deflection of the beam accuracy of the results. The time duration of the impulsive pressure load also plays an important role on the counter-intuitive response of the beam. Three cases corresponding to to values of 0.01, 0.1 and 0.5 ms are studied with a mesh of 100x10x6. Fig. 3 shows that the magnitude of the load when the phenomenon occurs increases significantly as the time duration of the impulsive pressure reduces. The vibration of the mid-part of the beam is more significant than that near the ends when the time duration becomes shorter. .
*-*—•—*
f:.~-
-0.98 D
"So.96 §0.94 20.92 I o u .20.90 - i
—|g—p2
—A-Pi
BO.88
0 1 2 3 4 5 6 7 8 9 10 Cases of mesh Fig. 2 Case 1: shell element with a mesh of 100X10; Case 2-9: solid element with meshes of 60X6X4, 80X8X4, 100X10X4, 60x6X6, 80X8X6, 100X10X6, 100X10X8, 100X10X10 respectively
3
Cases of time duration of the impulsive load Fig. 3 Effect of the time duration of the impulsive load on pi and p2 (Case 1-3: to=0.01, 0.1, 0.5 ms)
Fix-ended beam problem
A fix-ended beam is then simulated with a refined 3-D solid element mesh 100x10x5 for a quarter of the beam. The beam is uniform and fully constrained against displacement and rotation at both ends. A rectangular impulsive force is applied at the mid-point of the beam. The material is assumed again elastic, perfectly plastic. The material and geometry parameters adopt the same values as those in [2]. The window of force within which the counter-intuitive phenomenon occurs is found in the range between 560 and 700 N. This range differs slightly from the results calculated with ABAQUS in [2] which is in the range from 520 to 600 N. It is shown that not only the location but also the width of the
529 window in which the counter-intuitive phenomenon occurs are different when different finite element codes are used. Figs. 4 and 5 illustrate the variation of the deflection time history and the effective plastic strain accumulation at the midpoint of the beam when the counter-intuitive phenomena occur.
• | 0.035 " 0.030 o '£ 0.025 "C 0.020 > 0.015 J& 0.010 W 0.005
N
[F
J
r
0.000 0.002
0.004
0.006
0.008
0.'
t(s) Fig. 4 Comparison of the deflection of the mid-point of the beam
4
F=700N F=710 N
0.002
0.004
0.006
0.008
0.010
t(s) Fig. 5 Comparison of the effective plastic strain of the lower element at the mid-span
Sensitivity analysis
Shanley model has been successfully used to analyze the counter-intuitive behavior of the pin-ended beam [1]. As shown in Fig. 6, the SDoF Shanley model has a deformable cell , i Arj,/2 connected by two rigid bars. For small ! AoV2 ~ M rotations, there exists tan
TTTUtT
n / A,
e In order to simplify the theoretical procedure, it is assumed that the pulse loading -Ob (C) on the model is very short and intensive, Fig. 6 The Shanley model which implies that the recovery response phase of the model is only determined by the maximum rotation angle. Therefore, the external loading can be simply represented by the initial condition cp=
^
(Ir
+ 5 1 (^ + | ) + 5 2 ( ^ - | ) = 0
(2)
The above equilibrium equation can be simplified and rewritten as the following ip+K(p=F (3) Sensitivity analysis (SA) is usually characterized by the gradient of the response with respect to the system parameters. Eq.(3) is differentiated with respect to cp0 to obtain the following sensitivity equation, ip^ +K(p'm =F'n -K'm
530
The general form of the response equation, Eq.(3), and the sensitivity equation, Eq.(4), can be solved simultaneously using Newmark-P method to obtain the time histories of cp and (p cp° under their respective initial conditions. Fig. 7 shows the final vibration of the Shanley model. Three critical initial angles (pcrA, cpcrB and (pcrc, are 0.08523, 0.09254 and 0.09728 rad, respectively. In order to compare response sensitivity when cpo varies in a range, the computation time is limited to 4 ms. The maximum value of the absolute response sensitivity ld(p/d(f>olmax in 4 ms is calculated for different (p0, which is shown in Fig. 8. The range of cpo is from 0.04 to 0.14 rad, and 57 simulations are carried out with a terminal time t=4ms. It has three obvious peaks, corresponding to the values of cp0 equal to 0.08556, 0.09222 and 0.09778 rad, respectively. These values agree well with the critical initial angle (pcrA, cpcrB and (pcrc. By introducing sensitivity analysis method, the range of the initial angle within which the counter-intuitive phenomenon 0.20 0.15
|
(pmax
0.10
1/
I 0.05
/
(pmin
/
s
/
V
#0.00 -0.05
r *""
I (Pc
'.
C
0.03
-0.1Q 00
0.05 0.10 0.15 Initial angle displacement ipo: rad
0.20
Fig. 7 Thefinalvibration of the Shanley model occurs can be determined.
5
0.05
0.07
0.09
0.11
0.13
0.15
Discussion and conclusions
The counter-intuitive phenomenon is very sensitive to the finite element codes, element types, meshes and time duration of the impulsive load. Both the location and the width of the window within which the counter-intuitive phenomenon occurs depend on the selected element types and finite element codes. By using sensitivity analysis method, the narrow counter-intuitive window can be found.
Reference 1. Symonds, P. S. and Yu, T. X., Counterintuitive behavior in a problem of elasticplastic beam dynamics, ASME J. ofAppl. Mech., 52 (1985) pp. 517-522. 2. Symonds, P. S. and Lee, J. Y., Anomalous and unpredictable response to short pulse loading, Recent Advances in Impact Dynamics of Engineering Structures, D. Hui and N. Jones, eds., ASME, New York, AMD 105 (1989) pp. 31-38. 3. Li, Q. M., Zhao, L. M. and Yang, G. T., Experimental results in the counter-intuitive behavior of thin clamped beams subjected to projectile impact, Int. J. Impact Engng., 11(3) (1991) pp. 341-348. 4. Li, Q. M. and Liu, Y. M., Uncertain dynamic response of a deterministic elasticplastic beam, Int. J. of Impact Engng. (in press).
COMPUTATIONAL MATERIAL TESTING OF PRE-DAMAGED METALS USING DAMAGE MECHANICS MODELS Y. TOI AND
S. HIROSE
Institute of Industrial Science, University of Tokyo, 4-6-1 Komaba, Meguro-ku, Japan (E-mail: [email protected])
Tokyo
153-8505,
The elasto-viscoplastic constitutive equation is formulated, based on the concept of continuum damage mechanics. The constitutive modeling is identified, based on static/dynamic tensile tests and fatigue tests for SM490A. The identified model is used to predict the dynamic, tensile behaviors of pre-strained SM490A. The predicted results have agreed well with the corresponding experimental results.-
1
Introduction
Computational material testing considering elasto-viscoplasticity and material damage in the constitutive equations based on continuum damage mechanics [1-3] is conducted in the present study. All of the material constants of undamaged metals concerning elasto-viscoplasticity and material damage are determined, based on the quasi-static tensile, dynamic tensile and fatigue test (S-N curves) results [4-6]. Subsequently, the whole process of the pre-damaging of specimens by the pre-strains or pre-fatigue and the following dynamic tensile tests of the pre-damaged specimens is simulated, using the identified material testing simulators. The validity of the computational prediction for the effect of the pre-strains and pre-fatigue on the dynamic tensile fracture behavior is demonstrated by the comparison of the calculated results with the test results [4-6]. The method of computational material testing as proposed in the present study, which is applicable to other sorts of materials and other forms of damage, can be effectively used for the evaluation of mechanical properties of pre-damaged materials based on a limited number of test results for undamaged materials and the lifetime prediction of structures. 2
Formulations
The following elasto-viscoplastic constitutive equation considering damage based on Ihe strain equivalence hypothesis [1] is used:
m = [».]{£') =[D.]({£}-{ev}) (1) e vp where {cr}: effective stress, {£•}: total strain, {s ): elastic strain, {s }: viscoplastic strain, [De]: isotropic, elastic stress-strain matrix. The effective stress {a} is expressed as
531
532
{a} = {a)l{\-D)
(2)
where {a}: nominal stress, D: scalar damage variable. The following viscoplastic strain rate {svp}, which is the extension of Perzyna's viscoplastic constitutive equation [7] by Murakami [8] to the damage analysis, is used:
K
vr-
A" V)
(3)
\(l-D)^+(x0-qy/3^} / fy-D) where J2: the second invariant of deviatoric stresses {crd}, s eq P equivalent viscoplastic strain, y, q,x0,p,m: material constants. The following form of unified damage evolution equation proposed by Lemaitre [l] is employed: V
(4)
D= where D=0 D>0
(5a)
when £eq < ePd v/heneeq>epd
andaeq>af
(5b)
0
(6)
R., = j ( l + v)+3(l-2v;
(7)
-7 =
2E(l-D)2
where
eq J
J
in which E: Young's modulus, v : Poisson's ratio, Rv: triaxial function, (TH : hydrostatic pressure. The von Mises equivalent stress oeq and the total strain rate s are expressed as a eq
V^=ffMV)T.
F
=
WH
(8)
The material constants S (5 = 1.0), epd and Dcr depend upon materials, temperature and types of damage [9]. Then, S is assumed as S = S0e for elasticity, S = S0P (1 + cseeg) for plasticity (9) As for spd and Dcr, the following equations are assumed: £
pd
~£pd0\(1
+ cee„),
Dcr = Dcr0 (1 + cDseq)
(10)
533
3 Numerical Results Sixteen material constants (Table 1) contained in the formulation of the preceding section have been determined so as to fit the material test results [4-6] for steel specimens under static, dynamic tension and repeated tensile loading (Fig. 1). Table 1 Material constants of SM490A
y [sec"1]
210 0.3 2300
q [MPa]
750
JC0 [MPa]
420
P
5.0
E [GPa] V
m S0
[MPa]
1 600
S0'
[MPa]
0.47
e
s Cs
1.0 0. 3625X10"3
[sec]
0.110
£
pdO
-0.3750X10" 4
C£ [sec]
0.530
Dcr0
0
C
D
Gf
s
550
%
500
2
450
399
[MPa]
' •
° •
experinent model
o o •o o o
I
400
-
E o
:z 350 10" 10s 106 107 Number of Cycles to Failure N
Fig.l Identified S-N curves for SM490A under repeated, tensile stress
534
The identified model has been applied to the prediction of dynamic tensile behaviors of pre-strained SM490A specimens [4-6]. The dynamic stress-strain behavior of pre-strained SM490A has been reasonably predicted by using the identified computational model (Fig.2).
800 .-. 700 S «
600 500
CD
^ c
300
"e
200
Y
\
experiment
o
•
100 0
............ 0
0. 1 0. 2 0. 3 Nominal Strain
0. 4
: 0. 5
Fig. 2 Predicted dynamic stress-strain curves for SM490A after pre-straining of 5%
4
Conclusion
The elasto-viscoplastic constitutive equation has been formulated, based on the concept of continuum damage mechanics. It employs the viscoplastic strain given by Perzyna and extended by Murakami to consider the effect of damage. The unified form of damage evolution equation given by Lemaitre has been extended to consider the effect of types of damage and strain rates. The constitutive modeling has been identified, based on static/dynamic tensile tests and fatigue tests for the steel SM490A. The identified model has been used to predict the dynamic, tensile behavior of pre-strained SM490A. The predicted results have agreed well with the corresponding experimental results. Other results are contained in Ref. [10].
References 1. Lemaitre, J., A Course on Damage Mechanics, Second Edition, Springer, (1996). 2. Skrzypek, J. and Ganczarski, A., Modeling of Material Damage and Failure of Structures (Theory and Applications), Springer, (1999). 3. Krajcinovic, D., Int. J. Solids Structures, 37, 267-277, (2000). 4. Itabashi, M. and Fukuda, H., Technology, Law and Insurance, 4, 37-44, (1999). 5. Itabashi, M. et al., Sino-Japanese Symp. Deformation/Fracture of Solids, 41-48, (1997). 6. Itabashi, M. and Fukuda, H., Journal of Materials Processing Technology, 117-3, (2001) 7. Perzyna, P., Arch. Mech., 32-3,403-420, (1986). 8. Murakami, S., Transactions of the JSME, 60-578, 230-235, A(1994). 9. Lemaitre, J., Comput. Methods Appl. Mech. Engrg., 51, 31-49, (1985). 10. Toi, Y. and Hirose, S., Transactions of the JSME, (2002), in print.
STUDY OF THE INFLUENCE OF THE SUSPENSION PARAMETERS ON SUSPENSION KINEMATICS CHARACTERISTIC DINGHUA ZHUMAOTAO XIACHANGGAO School of Automobiles and Transportation, Jiangsu University, Dantu Road 301 Zhenjiang City Jiangsu Province China E-mail: dineh@sina. com, en Suspension is one of the most important parts in vehicles, and independent suspension is the most widely used style in now day's vehicles. Its Design level will influence the performance of the vehicle; especially the drive ability of the vehicle and lifetime of the tire. It is important to select the parameters of the independent suspension to get the ideal suspension performance. Based on the theory of multibody kinematics, one McPherson suspension multibody kinematics model is built by using ADAMS/Car software and the influence of suspension parameters on the suspension kinematics characteristic studied. Through studying on suspension kinematics characteristic, this article researches the methods for overcoming the steering wheel wear. During constructing the model, it has been checked with the measured front wheel alignment parameters. And the model was consummate with these parameters. The model study is performed to look for the sensitive parameters of tire wear through separating the parameters. The influence of structure parameters (deviation angle of control arm, pitching angle of control arm, height of control arm inner bearing) on front wheel alignment parameters is studied. We can get the ideal model through optimizing these parameters.
1
Introduction
McPherson suspension is the most widely used suspension style in now day's vehicles. Performance of suspension is influenced by suspension parameter. Based on the theory of multibody kinematics, this paper constructs one McPherson suspension multibody kinematics model by using ADAMS/Car software and studies the influence of suspension structure parameters on the suspension kinematics characteristic. 2
ADAMS Software
ADAMS is the world's most widely used mechanical system simulation software. It enables you to produce virtual prototypes, realistically simulating the full-motion behavior of complex mechanical systems on their computers and quickly analyzing multiple design variations until an optimal design is achieved. This reduces the number of costly physical prototypes, improves design quality, and dramatically reduces product development time. ADAMS provides a full suite of modeling, analysis, and visualization capabilities. With ADAMS, we can quickly and easily create a complete, parameterized model of our mechanical system, building from scratch or importing parts geometry from your preferred CAD system. We then apply forces and motions and run this model through a battery of physically realistic 3D motion tests. ADAMS/Car is a specialized environment for vehicle modeling. It allows virtual prototypes of vehicle to be created subsystems and the virtual prototypes analyzed much like the physical prototypes.
535
536
3
Building of the Model
3.1
Analyses the Structure of the Model
The McPherson suspension is composed of wheel, control arm, steering arm and shock absorber. With ADAMS/Car, we can build the mold easily. The connection between the wheel and is revolute joint, the steering arm connect to control arm with spherical joint, control arm connect to the vehicle body with revolute joint, the steering arm connect to the shock absorb with Cylindrical joint and the shock absorber connect to the vehicle body with hook joint. Suspension Joints Styles and Characters show in Table 1. Table 1. Suspension Joints Style and Character.
Style of the joint
The number of Restrict freedom translation revolute
The number of the joint 1/2 suspension
Full suspension
revolute joint
3
2
2
4
hook joint
3
1
1
2
Cylindrical joint
2
2
1
2
Spherical joint
3
0
1
2
There are 4 parts (wheel, control arm, steering arm and shock absorber): n=4 So the 1/2 suspension restricts equation: m=2*5+1*4+1*4+1*3=21 Freedom of the 1/2 suspension: K=6*n-m=6*4-21=3 Three freedoms are up and down of the suspension; revolve of the wheel and swim of the kingpin. 3.2
Build the Model
Figure 1. Suspension Kinematics Model
Build the model with the tested suspension structure data. ADAMS/Car provides the convenience and quickly building tool, so we get the model like Fig.l. 3.3
Check the Model
Caster angle and kingpin inclination angle were decided by the suspension structure, and they can be test in the real vehicle. We test the Caster angle and kingpin inclination angle at the same time testing the suspension structure date. On the other side Caster angle and kingpin inclination angle can be finding on the model. Through compare the real vehicle test data to model data, it confirm that the model is truly and credibility.
537
4
Study of the Suspension Characteristic
The suspension carries through the parallel travel analyses using the ADAMS model. And use ADAMS/Post processing, we can get the result of suspension kinematics characteristic. Through changing the suspension parameter, we got a serious of curves of suspension character. We define the angle of plan of left control arm and XOY plan as pitching angle and the angle of plan of left control arm and YOZ plan as deviation angle. We study the influence of pitching angle and deviation angle of control arm to suspension character. F i g u r e 2 Suspension Pitching Angel and (Fig.2) Deviation Angle 4.1 Change the Pitching Angle of Control Arm As right hand rule, the pitching angle of the original model is negative. We simulate the model on three conditions: the negative pitching angle, the positive pitching angle and the zero pitching angles, and keep the other parameter unchanged. CatlerAigfe From fig.3, we know that 125 changing the pitching angle did not influence the suspension camber *—zero d +—potty? angle and kingpin inclination angle. J* But it influenced the caster angle and US y j f toe angle. Especial caster angel, it ^ increased when the pitching angle is $ r * i * " 4 s ^ . *^- * - * positive; and decreased when the *.*-*-K^*~*-32*r pitching angle is negative. So we can .^#r... use this characteristic to change the !**•# vary tendency of suspension caster OH 25JD SDJD 75.0 tOOIl -1000 -75D -SOD -25.0 angle. When the pitching angle is lAlleelTrael near zero, suspension caster angle Figure 3. Caster Angle Vary Tendency vary very little. CatttAig*
4.2 Change the Deviation Angle of Control Arm As we study the pitching angle, we simulate the model on three conditions: the negative deviation angle, the positive deviation angle and the zero deviation angles, and keep the other parameter unchanged. From simulation result (fig. 4), we know that vary deviation angle
„
*JB -*?*-*
32S
4 r "f!
( IS !•
5 t.75
m
-100D -75.0 -600
-2sn
on
2sn
SOn
750
INIeelTrael
Figure 4. Caster Angle Vary Tendency
10QJ3
538
influence the changing tendency of the caster angel. When the deviation angle varies from positive to negative, the changing tendency of caster angle will be gently. 4.3
Change the height of control arm inner bearing
As before, we simulate the model on three conditions: original height of control arm inner bearing; decrease the height of control arm inner bearing, increase the height of control arm inner bearing, and keep the other parameter unchanged. From the results (fig. 5 and fig.6), we can find that the suspension parameters vary tendency are all gently. Especially, toe angle vary tendency changed notability. It is important to the lifetime of the tire.
Toe Aigfe
100
i
9
00
~ " * ~ ^ ^
-S0
I
!
)
\
i
i Oft] t i l l
j
—
| .. !
+-i;~,:^xw
^
-1DJD • —
j -150 -ram -75J0 SOD
1
-250 00 250 WleelTiacH
600
1
ISO 1000
Figure 5. Toe Angle Vary Tendency CanberAigle
200
Conclusion The suspension parameters are influenced by suspension structure parameters. Suspension parameters will decide the performance of vehicle. Especially the toe angle vary range will influence the tire wear-life. We hope that the suspension -IDOJO -7SJ3 HSOD -2SJQ OJD 2SJD SDH 7 5 0 1000 parameters unchanged during wheel WleelTrael travel in ideal. But it is not reality, so Figure 6. Camber Angle Vary Tendency we try to reduce the vary range of the suspension parameters. So we can optimize the suspension through change the deviation angle, pitching angle and the height of control arm inner bearing. References 1. 2. 3. 4. 5.
ADAMS/View User Guiders, Version 10.1, Mechanical Dynamics, Inc., 2000. ADAMS/View Reference Guiders, Version 10.1, Mechanical Dynamics, Inc., 2000. ADAMS/Car User Guiders, Version 10.1, Mechanical Dynamics, Inc., 2000. M. magic, Automobile Dynamics, Peoples Traffic Pubic, 1992.3 (in Chinese). Yu Zhisheng, Automobile Theories, Mechanic Industry Public, 1994.5 (in Chinese)
COMPUTATIONAL STUDY OF VAPOR PRESSURE ASSISTED CRACK GROWTH AT POLYMER/CERAMIC INTERFACES C. W. CHONG, T. F. GUO AND L. CHENG Department
of Mechanical Engineering, National University of Singapore, Singapore E-mail: [email protected]
117576
Finite element analyses of vapor pressure assisted crack growth along polymer/ceramic and polymer/glass interfaces using computational cells are presented in this work. The ductile polymer film is bonded to stiff elastic substrates. Plane strain Mode I crack growth is studied under conditions of small scale yielding. Void growth and coalescence is described by the extended Gurson model [1, 2] which incorporates vapor pressure as an internal variable. Progressive crack growth along the interface is modeled by a row of Gurson cell elements. Crack growth resistance curves are computed for a range of vapor pressure loadings. A primary objective is to gain some understanding of the relation between vapor pressure levels and die macroscopic interface fracture toughness. The contribution of plastic dissipation in the film to the total work of fracture is investigated as well. Numerical results show significant reduction in both initial and steady state fracture toughness as vapor pressure levels increase. These findings provide some insights into the role played by vapor pressure in IC package failures by interface delamination and popcorn cracking [3, 4]. In addition, the effect of initial void volume fraction on the macroscopic fracture toughness is briefly discussed.
1
Problem Formulation
Figure 1(a) shows the schematic of a ductile layer sandwiched between two identical semi-infinite elastic substrates. A semi-infinite crack lies along the upper interface. The ductile layer of thickness h is assumed to be elastically isotropic with Young's modulus E and Poisson's ratio v. Its response is characterized by the J2 flow theory. The true stress logarithmic strain relation in uniaxial tension is specified by e=— E
for a < cr0, a
\V"
for a > a0,
where (j 0 is the initial yield stress and N is the strain-hardening exponent. The two elastic substrates are modeled as isotropic, linear elastic with Young's modulus Es and Poisson's ratio vs. To model void growth and coalescence along the interface, we adopt the methodology proposed by Xia and Shih [5]. Figure 1(b) and 1(c) show the finite element mesh for small-scale yielding analysis. The fracture process is confined to a planar layer of initial thickness D ahead of the crack. A row of uniformly sized voided cells, each of dimensions D x D and initial void volume fraction / 0 , is embedded ahead of the initial crack tip. The voided cells are governed by an extended Gurson constitutive law [1, 2, 6] for porous material with vapor pressure as an internal variable. This relation governs the progressive damage by hole growth and coalescence at the interface. The extended Gurson flow potential O takes the following form: ^(ae,am,a,f,P)
= (^A
+2qifCosh(3q^C7'"_+P>)-{l
539
+
(qJ)2)=0,
540
where <7e is the macroscopic effective Mises stress, am the macroscopic mean stress and <7 the current flow stress of the matrix, /denotes the current void volume fraction and p is the internal pressure introduced by Guo and Cheng [2]. qx and q2 are the adjustment factors introduced by Tvergaard [6] to improve the accuracy of the model.
Figure 1. (a) Schematic of a ductile layer joining two elastic substrates with an upper interface crack, (b) Finite element mesh of inner region, (c) Close-up view showing the row of void-containing cells embedded ahead of crack tip.
For a plane strain crack subjected to mode I loading, the Griffith energy release rate, G, is related to the mode I stress intensity factor, K, by Irwin's relation,
Under small scale yielding, the criterion for crack advance, Aa, is given by G = T, where r is the crack growth resistance. From dimensional analysis, the interface fracture toughness depends on the following non-dimensional quantities:
I^£) DCT0
= F[^£.3L
yD
E
^Nvv.f E
£s.\ a0J
Present study will focus primarily on the role of internal vapor pressure P<JOQ on the crack growth resistance.
541 (a)
20
30 Aa/D
(b)
p 0 /o 0 PoAo PoAo - Po/co
40
60
80
= = = =
0.0 0-5 1-0 1-5
100
Aa/D Figure 2. Resistance curves showing the effect of internal vapor pressure pjoo. (a) Initial porosities/o = 0.01 and (b)/ 0 = 0.05.
2
Results and Discussion
The computational results for initial porosities f0 = 0.01 and 0.05 are presented. The following set of material parameters is used: a^E = 0.01, EJE = 10, N =0.1, v= 0.4, vs = 0.3.
542
Figure 2 shows crack growth resistance curves for a range of internal vapor pressure. At the initial phase of crack growth, plastic dissipation is insignificant and the total work of fracture /"is effectively equal to the work of fracture process r0 [7]. However, plastic dissipation /p increases as growth continues and this contributes to a rising resistance curve. When crack extension reaches a critical size, steady-state propagation takes place and the corresponding fracture toughness is denoted by the asymptotic peak value 7^s. From Figure 2 it can be seen that internal pressure lowers the toughness of the interface. The work of fracture process r0 decreases (almost linearly) as po/Ob increases, for both/ 0 = 0.01 and 0.05. This shows that void-containing cells are pre-damaged and softened by the internal pressure and lesser energy is thus required for the onset of crack growth. During transient and steady state growth, /p, (and therefore r) is a nonlinear function of po/Ob- In the case of large initial porosity, f0 = 0.05, /J. decreases as the vapor pressure increases. For small initial porosity, f0 = 0.01, rP is nearly constant. That is, for small /o, an increase in internal pressure has little effect on the plastic dissipation /p; however the work of fracture process r0 is reduced. By contrast, for large initial porosity, both r0 and 7p are lowered as the internal pressure is increased. In summary, vapor pressure effects on resistance curves are significant when the interface porosity,/0, is large. At high vapor pressure, the resistance curves are nearly flat, exhibiting brittle like characteristics. By contrast, vapor pressure effects are minimal at low values of/0. As/ 0 -» 0, the resistance curves for the different values of po/ob under consideration approach the curve for po/ob = 0. The findings in this study also show that fracture toughness decreases as/ 0 increases, a result previously reported by Xia and Shih [5]. References 1. Gurson A. L., Continuum theory of ductile rupture by void nucleation and growth part I. Yield criteria and flow rules for porous ductile media. Journal of Engineering Materials and Technology 99 (1977) pp. 2-15. 2. Guo T. F. and Cheng L., Modeling vapor pressure effects on void rupture and crack growth resistance. Acta Materialia 50 (2002) pp. 3487-3500. 3. Galloway J. E. and Miles B. M., Moisture absorption and desorption predictions for plastic ball grid array packages. IEEE Transactions on Components, Packaging, and Manufacturing Technology 20 (1997) pp. 274-279. 4. Galloway J. E. and Munamarty R., Popcorning: a failure mechanism in plastic encapsulated microcircuits. IEEE Transactions on Reliability 44 (1995) pp. 362-367. 5. Xia L. and Shih C. F., Ductile crack growth - I. A numerical study using computational cells with microstructurally-based length scales. Journal of the Mechanics and Physics of Solids 43 (1995) pp. 233-259. 6. Tvergaard V., Material failure by void growth to coalescence. Advances in Applied Mechanics 27 (1990) pp. 83-151. 7. Tvergaard V. and Hutchinson J. W., Toughness of an interface along a thin ductile layer joining elastic solids. Philosophical Magazine A 70 (1994) pp. 641-656.
DISTORTION PREDICTION USING FINITE ELEMENT METHOD Y. C. TSE, P. LIU, Y. Y. WANG, C. LU Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528 E-mail: liuping @ ihpc. a-star. edu. sg, G. R. LIU National University of Singapore, 9 Engineering Drive 1, Singapore 117576 E-mail: [email protected] K. P. QUEK Sunstar Logistic Singapore Pte Ltd, 10 Science Park Road, #04-16/17 The Alpha Singapore Science Park II, Singapore 117684 E-mail: quek. kwang.peng @ unisunstar. com The entire heat treatment process, including induction heating followed by oil quenching, for sprockets made of S45C mid-carbon steel has been systematically analyzed. To study the effect due to different sprocket cutout dimensions and positions, the finite element method has been implemented and coupled thermal and structural analysis has been performed. Phase transformation and temperature dependent material properties are incorporated in the simulation. The simulated temperature distribution and distortion of sprockets have also been investigated, and the results show a good agreement with the experimental results.
1
Introduction
Induction hardening has wide applications in manufacturing industry and is particularly well suited to harden steel components. This method involves heating the component by induced current to a certain temperature at which the rate of formation of austenite is very rapid, and then quenching it to transform the austenite into martensite. The hardness thus formed is higher than that obtained by conventional methods of hardening. However, due to the high temperature gradient, the local plastic deformation and the phase transformation, severe distortion may occur during the heat treatment process. In the experimental study of heat treatment of a sprocket, it often induces distortion problems especially for designs with large cutouts. Deciding the cutout pattern designs, with acceptable distortion, at the design stage will save manufacturers a substantial amount of money and time. Detail analysis of the heat treatment process is very important. The computer simulation offers a powerful tool in designing sprockets and solving manufacturing problems as illustrated in [7-11]. The main objective of this work is to develop sprocket prototype rapidly for our client with maintaining certain accuracy. Pervious work has shown that the factors affecting the distortion of sprockets during the heat treatment process can be categorised into three types: the geometry of sprockets, the material properties and the heat treatment history. In the present study, the finite element packages, MSC/PATRAN and ABAQUS, are used in the analysis. Comparing with the experimental results, the simulated distortions of different sprocket designs after heat treatment are also discussed.
543
544
2
Simulation Model
In the induction heating process, a sprocket is heated by the induced current. This analysis should be simulated as coupled electromagnetic-thermal-structural analysis. However due to lacking of information on material properties, we simplify the electromagnetic-thermal transmission by applying a thermal boundary condition, heat flux. The proper value of the heat flux was obtained using trial fitting procedure. The sprocket is then immersed into an oil tank immediately, and convection has been simulated in the quenching process. The thickness of a sprocket is small compared with the other dimensions so that it is a plane stress problem. The governing equation for the heat transfer in cylindrical coordinate as well as the boundary and initial conditions are given as 1 3 (. 3T^| Id fkaT^I _3T q = h(Ts-TM) and T(r,0,t)lt=o=Tinjtial (2 & 3) The distortion is affected by the following critical factors, which are implemented into the finite element model. 1. The temperature dependent material properties, both thermal and mechanical, must be considered in the model. The available material data are given in [1-6] expect for the mechanical properties at high temperature of 600°C-1400°C. Thus a trial fitting is applied to estimate the proper material properties by matching the simulated distortion and the experimental one after the heat treatment. 2. Phase transformation occurs due to significant changes in temperature in the heat treatment process. In the present study, only Austenite and Martensite are considered and the local temperature determines the phases of the material. 3. The elastic-plastic constitutive relation has to be applied in the model to account for the local plasticity and is given as a/E £ =
(a y /E)(a/a y )"
a
o>oy
(4)
The analysis employs the Von-Mises yield criterion and isotropic hardening. As discussed above, there are two parameters that are required trial fitting, namely the heat flux and the mechanical material properties at high temperature. There are 12 different sprocket designs, in which the experimental results are provided, consisting of different sprocket sizes and cutout locations, which four of them are shown in figure 1.
a.) Nol (42 teeth)
b.) No2 (43 teeth) c.) No3 (45 teeth) Figure 1. Finite element models for 4 different designs
d.) No4 (47 teeth)
545
3
Simulation Results and Discussions
Figure 2 shows a comparison of temperature distribution between the experiment and simulation, in which the fitted heat flux is applied, for the model No.l after induction heating stage. The results are in good agreement that higher temperature is concentrated at the teeth and teeth bottom.
Figure 2. Comparison between a) experimental and b) simulation thermal results.

The fitting procedure can then proceed to estimate the mechanical material properties at high temperature. The distortion is defined as the difference between the maximum and minimum tooth-bottom runout, measured from the sprocket centre. The material selection is based on the 42-teeth models, while the other models are used to verify the estimated material properties. Table 1 shows that the simulated tooth-bottom runout obtained with the fitted material properties achieves an accuracy of ±20% compared with the experimental runout results.

Table 1. Comparison of experimental and simulation runout results

| Model No. | Teeth | % error |
| No.1  | 42 | +8.0  |
| No.5  | 42 | -19.0 |
| No.6  | 42 | +16.0 |
| No.7  | 42 | -8.0  |
| No.8  | 42 | -17.0 |
| No.2  | 43 | +11.0 |
| No.9  | 43 | -19.0 |
| No.10 | 43 | -19.0 |
| No.11 | 43 | +9.0  |
| No.3  | 45 | +9.0  |
| No.12 | 45 | -3.0  |
| No.4  | 47 | -13.0 |

The whole sprocket expands in the induction heating process, and the expansion of the sprocket teeth located above the cutouts is larger. Conversely, shrinkage occurs in quenching; however, the contraction of the teeth located above the cutouts is smaller. This is due to the deformation induced by local plasticity at the sprocket teeth above the cutouts during heating. Plastic deformation causes permanent dimensional changes, which counteract the shrinkage created during quenching. The maximum stress after heat treatment occurs at the tooth bottoms that are not above the cutouts.

Figure 3. Sprocket model, showing the cutout, the phase-transformation curve, and the heat-treated and non-heat-treated regions.

The distortion of a sprocket depends on the cutout patterns and their positions. As identified from the experiments, the cutout length
and the cutout position (Figure 3) are the critical factors affecting the distortions. The analysis is based on 43-teeth sprockets. Figure 4 shows that as the cutout length increases, more heat is accumulated above each cutout, which leads to more distortion. However, when the cutout length is large enough, it restricts heat from conducting rapidly to the core; the temperature at the tooth bottom then becomes more uniformly distributed and the distortion starts to reduce. In the case of the cutout position, the tooth-bottom runout decreases significantly as the cutout position is increased. This phenomenon can also be explained by heat accumulation. Based on the present simulation results, to achieve small distortion, a cutout length of around 60 mm and a small cutout position should be avoided.
Figure 4. Simulation results for models varying with different CL and CP: teeth bottom runout (mm) plotted against cutout length (CL, 20-100 mm) and cutout position (CP, 12-44 mm)
4 Conclusion
A plane stress model for coupled thermal-mechanical simulation has been developed, with the electromagnetic simulation simplified by applying a fitted heat flux; the error in the tooth-bottom runout is within ±20%. The geometry of the sprocket has a considerable effect on the distortion. The cutout sizes and locations affect the transient temperature distribution during induction heating and oil quenching, and therefore cause distortion in the sprockets. Cutout length and cutout position are important parameters in determining sprocket distortion.
References
1. Bauccio, M.L. ASM Metals Reference Book, ASM, 1993.
2. Fletcher, A.J. Thermal Stress and Strain Generation in Heat Treatment, London and New York, 1989.
3. Harvey, P.D. Engineering Properties of Steel, ASM, 1982.
4. Prabhudev, K.H. Handbook of Heat Treatment of Steels, New Delhi, Tata McGraw-Hill, 1988.
5. Gur, C.H. - Tekkaya, A.E. - Ozturk, T. Proceedings of the Second International Conference on Quenching and the Control of Distortion, Cleveland, Ohio, 1996, p. 305.
6. Henriksen, M. - Larson, D.B. - Van Tyne, C.J. Proceedings of the First International Conference on Quenching & Control of Distortion, Chicago, Illinois, USA, 1992, p. 213.
7. Petrus, G.J. - Krauss, T.M. - Ferguson, B.L. Proceedings of the First International Conference on Quenching and Control of Distortion, Chicago, Illinois, USA, 1992, p. 283.
8. Tszeng, T.C. - Wu, W.T. - Semiatin, L. Proceedings of the Second International Conference on Quenching and the Control of Distortion, Cleveland, Ohio, 1996, p. 321.
9. Fuhrmann, J. - Hömberg, D. - Uhle, M. The Int. J. Computation and Mathematics in Electrical and Electronic Engineering, 18(3), 1999, p. 482.
10. Wang, K.F. - Chandrasekar, S. - Yang, H.T.Y. J. Materials Engineering and Performance, 4(4), 1995, p. 460.
11. Zgraja, J. - Pantelyat, M.G. Int. J. Applied Electromagnetics and Mechanics, 10(4), 1999, p. 303.
INTERFACE PRESSURE DISTRIBUTION IN AUTOMOTIVE DRUM BRAKE
A. TOMA, M. TAKLA AND A. SUBIC
Department of Mechanical & Manufacturing Engineering, RMIT University, PO Box 71, Bundoora, Vic 3083, Australia
J. ZHAO
PBR International, Melbourne, Australia
In the conceptual design stages of automotive drum brakes, it is necessary to estimate the performance and durability of the brake system, which are significantly affected by the interface pressure distribution between the drum and lining surfaces as well as by the temperature changes in the brake system. Conventional (theoretical) analysis of the drum brake has to ignore the geometric distortions and thermal stresses generated during brake application. Finite element analysis (FEA) techniques were adopted to overcome this limitation, enabling the calculation of the brake output to take three-dimensional geometric distortions into account. This paper presents the results of a three-dimensional FEA of a single-shoe automotive drum brake. The brake system is modeled using brick elements having both mechanical and thermal degrees of freedom, which allows the heat flow associated with brake application to be simulated. Frictional finite-sliding contact and coupled thermo-mechanical effects were taken into consideration. Both the brake shoe and the drum were modeled as deformable bodies, which allowed the effects of geometric distortion on the performance of the brake system to be calculated. Mesh-related problems and their effects on the accuracy of the calculated contact pressure distribution are discussed. This work represents a significant step towards the innovative approach of virtual brake-system development.
1 Introduction
Accurate calculation of the interface pressure distribution between the drum surface and the lining pads leads to a better estimate of the brake output. Conventional (theoretical) analysis of the drum brake assumes that both shoe and drum maintain their original geometry during brake application. Finite element analysis techniques have been adopted [1-8] to overcome these limitations, enabling the analysis to include the elastic strains of the brake system. Two-dimensional analysis of the drum brake output [1] assumes a constant pressure distribution in the axial direction. The hub fitting at one end of the drum significantly raises the stiffness at that end, producing uneven distortion, and accordingly an uneven pressure distribution, in the axial direction. Estimating an equivalent stiffness of the shoes and drum with a two-dimensional model [1] or adjusting the drum stiffness to a mean value [2] would not compensate the model for these pressure variations. The single-shoe mechanism is not symmetric in the axial dimension (Fig. 1); accordingly, three-dimensional analysis is required to predict the axial pressure variation. Previous three-dimensional thermo-mechanical analyses yielded unrealistic fluctuations of the interface pressure distribution along the circumference of the lining pads [4],[5].

2 Modeling Considerations
The subject of this investigation is a single-shoe leading-trailing drum brake with a single actuator (Fig. 1). The shoe is one unit, comprising a central body extended on both sides into leading and trailing parts, where the lining pads are attached. In the FEM
model, the lining pads and the shoe are integrated but have different material properties. The outer surfaces of the linings and the inner surface of the drum are assumed to be initially in perfect contact. These two simplifications reduce the required CPU time without significantly affecting the accuracy of the results.
Figure 1. Drum Brake Assembly
Both drum and shoe are modeled as deformable bodies. In the contact interaction, the internal surface of the drum is the master and the lining pads are the slaves. A constant static friction coefficient of 0.3 is assumed. The geometry and orientation of the master surface, as well as the location of each slave node with respect to the master surface, affect the calculation of the contact pressure. The whole model is assumed to have a constant initial temperature of 50 °C. The main body of the shoe is fixed, and both sides of the shoe are first forced to expand against the drum by applying opposite forces of 1400 N to the shoe tips; this simulates the force applied by the brake piston. Keeping the connecting body of the shoe fixed and the load applied, the drum is then rotated with an angular velocity of 55.2 rad/s, equivalent to a vehicle speed of about 60 km/h. This closely simulates applying the brakes on a vehicle going downhill to keep its speed steady. In the mechanical analysis, the drum, lining pads and shoe sides are meshed with eight-node linear brick elements, while the main body of the shoe is meshed with four-node tetrahedral elements. Coupled thermo-mechanical analysis requires elements with both temperature and displacement degrees of freedom; however, elements with only displacement degrees of freedom can be used simultaneously in zones with negligible heat flow. The drum could easily be meshed with brick elements. The shoe is geometrically complex; the shoe sides and linings can still be meshed with brick elements, which have temperature and displacement degrees of freedom, while the connecting body of the shoe is meshed with tetrahedral elements to accommodate its geometric complexities. This region has negligible heat flow, which allows meshing with an element having only displacement degrees of freedom, thus saving CPU time.

3 Results and Discussions
A preliminary mechanical analysis, ignoring heat flow, was first conducted to investigate unexplained results in the literature [4, 5]. The initial static application of the load produced a periodic fluctuation of contact pressure along the circumference of the linings (Fig. 2).
This was found to be caused by using first-order (flat-surface) elements to define the curved contact surfaces. Rotating the drum causes the pressure-fluctuation areas to move along the circumference of the drum depending on the drum position. This was found to occur because different mesh sizes were used for the two contact surfaces, producing two sets of artificially prismatic surfaces with different element sizes rotating relative to each other. The fluctuation could be eliminated (Fig. 3) by using the same mesh size for both contact surfaces and controlling the analysis to calculate the contact pressure when the drum and lining nodes coincide during drum rotation. Initial pressure values varied from 2.75 MPa at the ends of the linings to 0.12 MPa at their centres. During drum rotation, the pressure values increased to 4.3 MPa at the ends of the leading pad and to 0.5 MPa at its centre, while the pressure values decreased slightly in the trailing pad. The pressure values and contour shapes, as well as the value of the brake factor, remain unchanged with continuing drum rotation.
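The node-coincidence control described above amounts to a simple sampling rule: with both contact surfaces meshed into the same number of equally sized elements, the drum and lining nodes line up at fixed angular increments. A small sketch follows; the element count used in the demo is an assumed example, not a value from the paper.

```python
import numpy as np

def coincidence_times(n_elems, omega, t_end):
    """Times at which drum and lining nodes coincide during rotation,
    assuming both contact surfaces carry the same number of equally
    sized elements (n_elems) around the circumference. Nodes line up
    whenever the rotation is a multiple of 2*pi/n_elems."""
    dtheta = 2.0 * np.pi / n_elems   # angular pitch of the mesh
    dt = dtheta / omega              # time between coincidences
    return np.arange(0.0, t_end, dt)

# e.g. 120 elements (assumed), 55.2 rad/s, first 0.5 s of braking
print(coincidence_times(120, 55.2, 0.5)[:5])
```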
Figure 2. Fluctuating Contact Pressure
Figure 3. Contact Pressure Distribution
The coupled thermo-mechanical model considers the combined effects of the mechanical loads and the heat generated by friction. The initial pressure distributions are identical to those of the mechanical model, as are the initial pressure increases due to drum rotation. With further drum rotation, the brake factor decreases slightly due to the heat generated by friction. This heat raises the temperature of the contacting surfaces and causes thermal expansion, mainly in the drum, which increases the internal drum diameter and thereby reduces the brake factor; the brake factor is further decreased by uneven thermal expansion at the surface of the linings. The stress distribution in the shoe due to expanding the shoe sides against the fixed drum is almost identical in the leading and trailing sides, with a maximum Mises stress of about 71 MPa at the points of load application and about 28.5 MPa at the elastic joining points. The maximum stress in the drum is 7.08 MPa. Rotating the drum increases the stress at the elastic joining point of the leading shoe to about 123 MPa. It also produces local temperature increases in the drum, and accordingly inhomogeneous thermal expansion, resulting in a gradual increase in the stresses. The heat generated by friction is proportional to the contact pressure; consequently, the temperature distribution takes the same pattern as the contact pressure distribution at the lining surface. These temperature and pressure distributions are expected to cause a similar pattern of wear in the contacting surfaces of the linings. The temperature distribution at the internal drum surface is more homogeneous due to rotation, which causes continuous changes in the contact pressure over the whole contact area. The maximum temperature in the drum reaches about 91 °C after five seconds of braking.
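Since the generated heat scales with the local contact pressure and sliding speed, the frictional heat input can be sketched as q = mu * p * v with v = omega * r. The fragment below assumes an even 50/50 split between drum and lining and an illustrative drum radius; neither value is taken from the paper.

```python
def friction_heat_flux(mu, pressure, omega, radius, frac_to_drum=0.5):
    """Frictional heat generated per unit contact area, q = mu * p * v,
    with sliding speed v = omega * r, split between drum and lining.
    The 50/50 split is an assumption for illustration only."""
    q = mu * pressure * omega * radius   # W/m^2 for SI inputs
    return frac_to_drum * q, (1.0 - frac_to_drum) * q

# mu = 0.3 (paper), p = 4.3 MPa peak, omega = 55.2 rad/s, r = 0.1 m (assumed)
print(friction_heat_flux(0.3, 4.3e6, 55.2, 0.1))
```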
4 Conclusions
Different patterns of periodic contact-pressure fluctuation occur when first-order elements are used to mesh the master surface and when mating surfaces are meshed with different mesh sizes at the interface. These fluctuations can be eliminated by using the same mesh size for both surfaces and controlling the analysis to calculate the contact pressure when the drum and lining nodes coincide. The pressure distributions from the mechanical and thermo-mechanical models are identical at the beginning of rotation. In the thermo-mechanical model, a long period of brake application gradually causes a slight decrease in contact pressure (brake factor) as well as a gradual increase in the stresses and temperatures of the drum; there is also a slight increase in the stresses in the shoes and linings. The pattern of the pressure distribution of the thermo-mechanical model changes with braking time due to thermal expansion at the interface. The results can be enhanced by implementing second-order elements, which can represent curved surfaces. Further research could also include introducing an initial gap between the contact surfaces, investigating thermal effects on material behavior including friction properties, considering heat dissipation to the environment by convection and radiation, and including inertia effects.

5 Acknowledgements
The authors acknowledge the Victorian Partnership for Advanced Computing (VPAC) for financial support through an expertise grant as well as access to their HPC facility. PBR International is acknowledged for providing technical data.

6 References
1. Schafer D., Estimation of the friction interface pressure distribution in automotive brakes, BBA Friction Research & Development (UK).
2. Day A. J., Harding P. R. J. and Newcomb T. P., A finite element approach to drum brake analysis, Proc. Inst. Mech. Engrs. 193 (1979) pp. 401-406.
3. Watson C., New Development in Drum Brake Analysis, SAE Technical Paper 902249 (1990).
4. Hohmann C., Schiffner K., Oerter K. and Reese H., Contact analysis for drum brakes and disk brakes using ADINA, Computers & Structures 72 (1999) pp. 185-198.
5. Watson C. and Newcomb T. P., A three-dimensional finite element approach to drum brake analysis, Proc. Inst. Mech. Engrs. 204 (1990) pp. 93-101.
6. Day A. J. and Harding P. R. J., Performance variation of cam operated drum brakes, Inst. Mech. Eng., Conference on Braking of Road Vehicles, C10/83 (1983) pp. 69-77.
7. Rao R., Ramasubramanian H. and Seetharamu K. N., Computer modeling of temperature distribution in brake drums for fade assessment, Proc. Inst. Mech. Engrs. 202 (1988) pp. 257-264.
8. Thuresson D., Thermo-mechanical Analysis of Friction Brakes, SAE Technical Paper 01-2775 (2000).
9. Loh W., Basch R. H., Li D. and Sanders P., Dynamic Modeling of Brake Friction Coefficients, SAE Technical Paper 01-2753 (2000) pp. 7-16.
A NEW HIGH PRECISION DIRECT INTEGRATION SCHEME FOR NONLINEAR ROTOR-SEAL SYSTEM
J. HUA¹, Z. S. LIU², Q. Y. XU¹ AND S. SWADDIWUDHIPONG³
¹Dept. of Engineering Mechanics, Xi'an Jiaotong University, 710049, P. R. China, e-mail: [email protected]; [email protected]
²Computational Mechanics Division, Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore 117528, e-mail: [email protected]
³Department of Civil Engineering, The National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260, e-mail: [email protected]
In this paper, the nonlinear mechanics model of a rotor-seal system is established with Muszynska seal forces. A new, efficient and high-precision direct integration scheme is proposed, based on the 2^N-type algorithm for computing the exponential matrix. The proposed model and numerical integration method are employed to investigate nonlinear phenomena in unbalanced rotor-seal systems. To study the influence of the seal on the nonlinear characteristics of the rotor system, bifurcation diagrams for various rotor speeds are presented and the course of the system shifting from the steady state to the unsteady state is analyzed. The study demonstrates that the proposed high-precision direct integration method can be effectively applied to the nonlinear numerical analysis of rotor-seal systems. The scheme is significantly less sensitive to the size of the time step than other existing methods; a larger time step may be used and the computing time substantially reduced.
1. Introduction
The seal characteristic is one of the most important factors affecting the performance and behavior of a rotor system, and nonlinear phenomena can be observed in the rotor-seal system. Examples are (i) the appearance of double periodic, periodic or quasi-periodic motion when the steady state of the rotor-seal system is lost, and (ii) the possible raising of the instability-onset speed when the unbalance parameter is increased appropriately. It is therefore imperative to understand the nonlinear characteristics of the rotor-seal system when designing it. In many cases, the seal forces may damage the rotor-seal system even though they are smaller than the fluid-film forces of the bearing [1]. The steam-excitation problem becomes more critical with increasing rotating speed, medium pressure and rotor flexibility, and with decreasing seal gap. Research on the mechanism of fluid-solid coupling and the control of steam-excited vibration in rotor-seal systems has become one of the key issues in modern turbomachinery design. The extended three-control-volume model [5] is adopted to describe the dynamic characteristics of labyrinth seals. Most research work is limited to the calculation of linear dynamic coefficients and the evaluation of stability; however, the nonlinear nature of the seal forces has to be considered when the mechanism of the instability is explored. The subharmonic mechanism and the influence of unbalance for a single-disk rotor-seal system have been analyzed using the Muszynska seal force [2]. The common disadvantages of some general numerical methods for solving structural dynamic equations are their sensitivity to the time step and their low precision. The precise integration method proposed by Zhong [4, 7] can be used to conveniently solve structural
dynamic equations with high precision. The scheme is significantly less sensitive to the size of the time step than other existing methods [3]. In this paper, the nonlinear dynamic characteristics of the rotor-seal system are studied using the Muszynska seal force model. In order to improve the computing precision and save computing time, the precise integration method is adopted in the algorithm.

2. The model of unbalanced rotor-seal system
The equations of motion of the single-disk rotor-seal system shown in Fig. 1 are:

$$\begin{bmatrix} m & 0\\ 0 & m\end{bmatrix}\begin{Bmatrix}\ddot{x}\\ \ddot{y}\end{Bmatrix}+\begin{bmatrix} D_e & 0\\ 0 & D_e\end{bmatrix}\begin{Bmatrix}\dot{x}\\ \dot{y}\end{Bmatrix}+\begin{bmatrix} K_e & 0\\ 0 & K_e\end{bmatrix}\begin{Bmatrix}x\\ y\end{Bmatrix}=\begin{Bmatrix}F_x\\ F_y\end{Bmatrix}+mr\omega^2\begin{Bmatrix}\cos\omega t\\ \sin\omega t\end{Bmatrix}+\begin{Bmatrix}0\\ -mg\end{Bmatrix} \qquad(1)$$

where m is the mass of the disk, D_e the damping coefficient at the disk, K_e the stiffness of the shaft at the disk, r the eccentricity of the unbalanced mass and ω the rotating speed. The Muszynska seal forces acting on the disk are given by [6]

$$\begin{Bmatrix}F_x\\ F_y\end{Bmatrix}=-\begin{bmatrix}K-m_f\tau^2\omega^2 & \tau\omega D\\ -\tau\omega D & K-m_f\tau^2\omega^2\end{bmatrix}\begin{Bmatrix}x\\ y\end{Bmatrix}-\begin{bmatrix}D & 2\tau\omega m_f\\ -2\tau\omega m_f & D\end{bmatrix}\begin{Bmatrix}\dot{x}\\ \dot{y}\end{Bmatrix}-\begin{bmatrix}m_f & 0\\ 0 & m_f\end{bmatrix}\begin{Bmatrix}\ddot{x}\\ \ddot{y}\end{Bmatrix} \qquad(2)$$

$$K=K_0(1-e^2)^{-n},\qquad D=D_0(1-e^2)^{-n},\qquad n=0.5\sim3 \qquad(3)$$

$$\tau=\tau_0(1-e)^b,\qquad 0<b<0.5,\qquad e=(x^2+y^2)^{1/2}/c \qquad(4)$$

where c is the seal gap; K, D and m_f are the stiffness, damping and fluid inertia mass, respectively; and τ, K and D are nonlinear functions of x and y [3].

Figure 1. Rotor-seal system
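For reference, a minimal sketch of Eqs. (2)-(4) as reconstructed above, returning the instantaneous seal force for a given disk state. The sign conventions follow the reconstruction and should be checked against [6]; the parameter values in the demo are illustrative, not the paper's.

```python
import numpy as np

def seal_force(x, y, xd, yd, xdd, ydd, omega, K0, D0, mf, tau0, b, n, c):
    """Muszynska seal force, assembled from the reconstructed Eqs. (2)-(4)."""
    e = np.sqrt(x**2 + y**2) / c                  # relative eccentricity, Eq. (4)
    K = K0 * (1.0 - e**2) ** (-n)                 # Eq. (3)
    D = D0 * (1.0 - e**2) ** (-n)
    tau = tau0 * (1.0 - e) ** b                   # Eq. (4)
    q = np.array([x, y]); qd = np.array([xd, yd]); qdd = np.array([xdd, ydd])
    Kmat = np.array([[K - mf * (tau * omega)**2,  tau * omega * D],
                     [-tau * omega * D,           K - mf * (tau * omega)**2]])
    Dmat = np.array([[D,                       2.0 * tau * omega * mf],
                     [-2.0 * tau * omega * mf, D]])
    return -(Kmat @ q + Dmat @ qd + mf * qdd)     # (Fx, Fy)

# illustrative state and parameters only
print(seal_force(1e-4, 0, 0, 0, 0, 0, 600.0, 2e5, 500.0, 0.25, 0.3, 0.5, 2.0, 0.0025))
```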
3. The precise integration method
The n-dimensional structural dynamic equations can be written as:

$$\mathbf{M}\ddot{\mathbf{x}}+\mathbf{C}\dot{\mathbf{x}}+\mathbf{K}\mathbf{x}=\mathbf{f}(t,\mathbf{x},\dot{\mathbf{x}}),\qquad \mathbf{x}(t_0)=\mathbf{x}_0,\ \dot{\mathbf{x}}(t_0)=\dot{\mathbf{x}}_0 \qquad(5)$$

Introducing $\mathbf{p}=\mathbf{M}\dot{\mathbf{x}}+\mathbf{C}\mathbf{x}/2$, Eq. (5) becomes

$$\dot{\mathbf{V}}=\mathbf{H}\mathbf{V}+\mathbf{r},\qquad \mathbf{V}(0)=\mathbf{V}_0 \qquad(6)$$

where

$$\mathbf{V}=\{\mathbf{x}^T\ \mathbf{p}^T\}^T,\qquad \mathbf{H}=\begin{bmatrix}\mathbf{A} & \mathbf{D}\\ \mathbf{B} & \mathbf{G}\end{bmatrix},\qquad \mathbf{r}=\{\mathbf{0}^T\ \mathbf{f}^T\}^T \qquad(7)$$

$$\mathbf{A}=-\tfrac{1}{2}\mathbf{M}^{-1}\mathbf{C},\quad \mathbf{B}=\tfrac{1}{4}\mathbf{C}\mathbf{M}^{-1}\mathbf{C}-\mathbf{K},\quad \mathbf{G}=-\tfrac{1}{2}\mathbf{C}\mathbf{M}^{-1},\quad \mathbf{D}=\mathbf{M}^{-1} \qquad(8)$$

The general solution of the homogeneous equation $\dot{\mathbf{V}}=\mathbf{H}\mathbf{V}$ is

$$\mathbf{V}=\exp(\mathbf{H}t)\,\mathbf{V}_0 \qquad(9)$$

Let τ represent a time step; then

$$\mathbf{V}(\tau)=\exp(\mathbf{H}\tau)\,\mathbf{V}_0=\mathbf{T}\,\mathbf{V}_0 \qquad(10)$$

where

$$\mathbf{T}=\exp(\mathbf{H}\tau)=[\exp(\mathbf{H}\tau/m)]^m \qquad(11)$$

Select $m=2^N$ (if N = 20, m = 1 048 576), so that $\Delta t=\tau/m$ is very small. Then

$$\exp(\mathbf{H}\Delta t)\approx\mathbf{I}+\mathbf{H}\Delta t+(\mathbf{H}\Delta t)^2/2+(\mathbf{H}\Delta t)^3/3!+(\mathbf{H}\Delta t)^4/4!=\mathbf{I}+\mathbf{T}_a \qquad(12)$$

where

$$\mathbf{T}_a=\mathbf{H}\Delta t+(\mathbf{H}\Delta t)^2\left[\mathbf{I}+(\mathbf{H}\Delta t)/3+(\mathbf{H}\Delta t)^2/12\right]/2 \qquad(13)$$

$\mathbf{T}_a$ is small in magnitude, so in order to avoid loss of numerical precision the following expression is adopted in the implementation:

$$\mathbf{T}=(\mathbf{I}+\mathbf{T}_a)^{2^N}=(\mathbf{I}+\mathbf{T}_a)^{2^{N-1}}\times(\mathbf{I}+\mathbf{T}_a)^{2^{N-1}} \qquad(14)$$

i.e., $\mathbf{T}_a$ is doubled N times through $\mathbf{T}_a\leftarrow 2\mathbf{T}_a+\mathbf{T}_a\mathbf{T}_a$ before the identity is added. The nonhomogeneous term $\mathbf{r}$ in Eq. (6) is assumed linear within the time step $(t_k,t_{k+1})$:

$$\mathbf{r}=\mathbf{r}_0+\mathbf{r}_1(t-t_k),\qquad t=t_k,\ \mathbf{V}=\mathbf{V}_k \qquad(15)$$

$$\mathbf{r}_0=\mathbf{r}(t_k,\mathbf{V}_k),\qquad \mathbf{r}_1=\left[\mathbf{r}(t_{k+1},\mathbf{V}_{k+1})-\mathbf{r}(t_k,\mathbf{V}_k)\right]/\tau \qquad(16)$$

Then $\mathbf{V}_{k+1}$ can be written as

$$\mathbf{V}_{k+1}=\mathbf{T}\left[\mathbf{V}_k+\mathbf{H}^{-1}(\mathbf{r}_0+\mathbf{H}^{-1}\mathbf{r}_1)\right]-\mathbf{H}^{-1}\left[\mathbf{r}_0+\mathbf{H}^{-1}\mathbf{r}_1+\mathbf{r}_1\tau\right] \qquad(17)$$

The following recursion, in which the convolution with the nonhomogeneous term is evaluated by Gauss quadrature, is adopted to improve the precision of the results:

$$\mathbf{V}_k=\mathbf{T}(\tau)\,\mathbf{V}_{k-1}+\frac{\tau}{2}\sum_{j=1}^{l}A_j\,\mathbf{T}(\tau-t_j)\,\mathbf{r}(t_j) \qquad(18)$$

where $A_j$ and $t_j$ are the weights and integration points of the Gauss quadrature, respectively.
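A compact implementation of Eqs. (11)-(14) is straightforward. The sketch below computes T = exp(Hτ) by the 2^N doubling of the small increment T_a and checks it against a direct matrix exponential; the test matrix is an arbitrary example.

```python
import numpy as np
from scipy.linalg import expm   # only for the accuracy check below

def precise_exponential(H, tau, N=20):
    """T = exp(H*tau) by the 2^N precise algorithm of Eqs. (11)-(14):
    a 4th-order Taylor increment Ta on the tiny step dt = tau/2^N, then
    N squarings carried out on Ta itself, since (I+Ta)^2 = I + (2*Ta + Ta@Ta),
    so the small increment is never swamped by the identity matrix."""
    n = H.shape[0]
    Hd = H * (tau / 2.0**N)
    Ta = Hd + (Hd @ Hd) @ (np.eye(n) + Hd / 3.0 + (Hd @ Hd) / 12.0) / 2.0  # Eq. (13)
    for _ in range(N):
        Ta = 2.0 * Ta + Ta @ Ta                                            # Eq. (14)
    return np.eye(n) + Ta

H = np.array([[0.0, 1.0], [-4.0, -0.2]])
print(np.abs(precise_exponential(H, 0.05) - expm(0.05 * H)).max())
```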
4. Numerical analysis
Based on the parameter values given in Table 1 and adopting the precise integration method, the bifurcation diagram of the rotor center is shown in Fig. 2. Note that x is the Poincaré mapping point and $s=\omega/\sqrt{K_e/m}$ is the nondimensional rotating speed. A jump phenomenon in the response appears at the low speed s = 0.8. As the rotating speed increases, synchronous periodic motion is observed. When s = 1.5, stable periodic motion appears, with Floquet multipliers -0.3439 ± 0.7011i and 0.0012 ± 0.0084i, as shown in Fig. 3. Figure 4 illustrates the double periodic motion at s = 3.11, with Floquet multipliers -1.04936, -0.06868 ± 0.00631i and -0.91485, where one main Floquet multiplier passes through the unit circle at the point (-1, 0). Two isolated points appear on the Poincaré map, corresponding to the period-doubling bifurcation and the half-frequency whirl of the rotor-seal system. As s exceeds 3.4, quasi-periodic motion appears and a closed curve is observed on the Poincaré map, as shown in Fig. 5. The (1/4) and (1/5) subharmonic motions are observed at s = 4.0 and 5.0, as depicted in Figs 6 and 7, respectively. As the rotating speed increases further, the motion of the system becomes more and more complex.
Table 1. Parameters and values used

| n | τ | b | μ | m | z | c (m) | r (m) |
| 2.0 | 0.3 | 0.5 | 0.079 | 0.25 | 0.1 | 0.0025 | 0.0002 |
Figure 2. Bifurcation diagram of rotor center
Figure 3. Periodic motion: (a) trajectory diagram of rotor center, (b) Poincaré map
Figure 4. Double periodic motion: (a) trajectory diagram of rotor center, (b) Poincaré map
Figure 5. Quasi-periodic motion: (a) trajectory diagram of rotor center, (b) Poincaré map
Figure 6. 1/4 subharmonic motion: (a) trajectory diagram of rotor center, (b) Poincaré map
Figure 7. 1/5 subharmonic motion: (a) trajectory diagram of rotor center, (b) Poincaré map

5. Conclusion

The nonlinear model of the rotor-seal system is established with Muszynska seal forces. An efficient, high-precision direct integration scheme is used to investigate the nonlinear behavior of the unbalanced rotor-seal system. Bifurcation diagrams for various rotor speeds are obtained to study the influence of the seal on the nonlinear characteristics of the rotor system, and the course of the system changing from the steady to the unsteady state is analyzed. Several nonlinear motions of the system, such as periodic, double periodic and quasi-periodic vibration, are illustrated. The study demonstrates that the proposed high-precision direct integration method can be effectively applied to the nonlinear numerical analysis of rotor-seal systems.
References
[1] Black H. F. and Cochrane E. A., Leakage and hybrid bearing properties of serrated seals in centrifugal pumps. Proc. 6th Int. Conference on Fluid Sealing, Munich, Germany (1973) pp. 61-70.
[2] Chen Y., Ding Q. and Hou S., Stability and Hopf bifurcation of nonlinear rotor-seal system. J. Vibration Engineering, 10 (1997) pp. 368-374.
[3] Hua J., Nonlinear Dynamic Stability of Rotor-Bearing Systems. Ph.D. Thesis, Xi'an Jiaotong University (2002).
[4] Liu J., Shen W. and Williams F.W., A high precision direct integration scheme for structures subjected to transient dynamic loading. Computers & Structures 56 (1995) pp. 113-120.
[5] Marquette O. R. and Childs D.W., An extended three-control-volume theory for circumferentially grooved liquid seals. ASME J. of Tribology, 118 (1996) pp. 276-285.
[6] Muszynska A., Improvements in lightly loaded rotor/bearing and rotor/seal models. J. Vibration, Stress, and Reliability in Design 110 (1988) pp. 129-136.
[7] Zhong W., Precise computation for transient analysis. Computational Structural Mechanics and Applications (in Chinese) 12 (1995) pp. 1-6.
DELAMINATION IDENTIFICATION USING PIEZOELECTRIC FIBER REINFORCED COMPOSITE SENSORS
Ping Tan and Liyong Tong
School of Aerospace, Mechanical and Mechatronic Engineering, University of Sydney, NSW 2006, Australia
E-mail: [email protected]
In this paper, a dynamic analytical model is proposed to detect a delamination embedded in a laminated composite beam bonded with piezoelectric fiber reinforced composite sensors (PFRCSs). A numerical study is then conducted to investigate the effect of the piezoelectric fiber orientation angle θ on the first three natural frequencies, the sensor charge output distribution (SCOD) and the normalized sensor charge output distribution (NSCOD). The influence of delamination length and location on the SCOD is also discussed. A comparison of the first three natural frequencies between the analytical and finite element analysis models is conducted for the cases of θ = 15°, 45° and 75°, and good agreement between the two models is noted.
1 Introduction
Laminated composites are widely used as structural materials due to their high specific stiffness and strength. However, these advantages are often limited by their low interlaminar fracture toughness, which makes them sensitive to delamination. Hence, in recent years, delamination detection has attracted significant attention in the composites community because of its importance in evaluating the reliability of laminated composite structures. Our literature review shows that pure piezoelectric materials have been widely used as sensors/actuators to identify a delamination embedded in a laminated composite beam [1-2]; however, the application of piezoelectric fiber reinforced composites to delamination detection is very limited. In this paper, a dynamic analytical model is proposed to identify the presence, size and location of a delamination embedded in a cantilever laminated composite beam. A numerical study is carried out to investigate the influence of θ on the first three natural frequencies and on the SCOD and NSCOD versus the x location. The effects of the delamination length l_d and the axial delamination location X_d on the SCOD are also discussed. A comparison of the first three natural frequencies between the present analytical and finite element analysis (FEA) models is conducted, and good agreement is noted.
Model development
In this investigation, we consider a laminated composite beam bonded with identical PFRCSs on both top and bottom beam surfaces (see Fig. 1). For simplicity, a single delamination embedded in the beam system with PFRCSs is considered here. For the geometry shown in Fig. 1, the beam system can be subdivided into three major span-wise regions, namely region I, II and III, respectively. Each region is considered to be made up of beam and sensor segments, e.g., for region I, it consists of the upper sensor, host beam and lower sensor segment. For simplicity, it is assumed that there is no stress transferring between the upper and lower delaminated beam segments in region II. Each segment is modelled as an Euler beam, and thus the corresponding equation of motion for each segment can be obtained based on the classical beam theory. For example, the
557
558
corresponding dynamic equations of motion for the top sensor segment in all three regions (see their FBD in Fig. 2), can be obtained by ut Tu,,bt<
u
kus u A " ^kus ku s = _ 0 (1-3) A W =1~~Z iri- ++TTkub, PsA^kus =-%^ "+ C-kus kub' Ps s kus ~~Z " " °kub, " t o " ' ~~ a - ^ -w> ~ ox longitudinal displacement ox and w is the ox transverse displacement 2 where ukus is the for the km top sensor in the region k. The axial forces Tkus and bending moments Mkus are given by
Ps^kus
T -hYt 1 OI l kus ss
dUkus
-, OX
'
M kus
lvl
-
bYst
* ,~ 12
52wfa
_ T dx
" '
in which b, Ys and ts are the width of beam system, Young's modulus and thickness for a PFRCS. Under the constant shear and peel strain assumption [3], the shear and peel stresses between the sensor and host beam can be obtained using eqs. (10-11) in the Ref. [4].
Figure 1 A schematic for a beam system with a delamination (regions I-III, PFRCSs on the top and bottom surfaces)
Figure 2 Free-body diagrams (FBD) for the top sensor in all three regions
For the considered cantilever beam system there are 18 applicable boundary conditions and 42 continuity conditions at the interfaces between regions I and II and between regions II and III. By numerically solving the equations of motion together with these boundary and continuity conditions, the natural frequencies and the absolute value of the strain |ε_s(x,ω)| on the top and bottom sensor surfaces can be obtained. Because the sensor charge output (SCO) can only be measured through the electrodes [5], a number of electrode strips are evenly distributed along the beam length to obtain a continuous distribution of the SCO versus the x location. It is worth pointing out that the width of an electrode strip should be larger than the thickness of the PFRCS.
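Numerically, the boundary and continuity conditions form a homogeneous linear system A(ω)c = 0, so the natural frequencies are the ω at which det A(ω) vanishes. A scan-and-bisect sketch follows; `boundary_matrix`, assembling the 60x60 condition matrix for this beam system, is assumed to be supplied elsewhere, and the sketch assumes a real-valued matrix (a damped, complex-modulus model would need a different root criterion).

```python
import numpy as np

def natural_frequencies(boundary_matrix, omega_max, n_scan=2000, n_bisect=50):
    """Frequencies are where det A(omega) changes sign; scan a frequency
    grid, then refine each bracketed root by bisection."""
    sign = lambda w: np.linalg.slogdet(boundary_matrix(w))[0]  # sign of det only
    ws = np.linspace(1.0, omega_max, n_scan)
    ss = [sign(w) for w in ws]
    roots = []
    for w0, w1, s0, s1 in zip(ws, ws[1:], ss, ss[1:]):
        if s0 * s1 < 0:                       # a sign change brackets a root
            for _ in range(n_bisect):
                wm = 0.5 * (w0 + w1)
                sm = sign(wm)
                if s0 * sm < 0:
                    w1 = wm
                else:
                    w0, s0 = wm, sm
            roots.append(0.5 * (w0 + w1))
    return roots
```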
3 Numerical study
In this investigation, a cantilever beam system with length L_b = 0.3 m, width b = 0.02 m and host beam thickness t_b = 1.9 mm is considered, in which a delamination with l_d = 0.05 m is located at X_d = 0.15 m. The thicknesses of the PFRCS and adhesive layer are 0.4 mm and 0.15 mm, respectively. The required complex moduli for the host beam,
piezoelectric fiber and adhesive layers are 65.68(1+0.011i), 69.2(1+0.011i) and 2.15(1+0.011i) GPa, respectively. The piezoelectric constant e₃₁ of the piezoelectric fiber is taken as 44.37 C/m². The densities of the host beam, piezoelectric fiber and adhesive layers are 1527.38 kg/m³, 7600 kg/m³ and 1600 kg/m³ [4, 6-7], respectively. The fiber volume fractions of the host beam and PFRCSs are chosen to be 0.6. Using the present model, the variation trends of the first three natural frequencies with θ are obtained and shown in Fig. 3, from which it is noted that an increase in θ results in a reduction of the natural frequencies. This is reasonable because the Young's modulus of the PFRCS decreases as θ increases. The SCODs versus x location for the 1st vibration mode are plotted in Fig. 4 for the beams with and without delamination. The abrupt axial discontinuities in Fig. 4 clearly indicate the tips of the delamination, so its presence, size and axial location are easily identified. The numerical study also reveals that, for the considered cases of θ = 15°, 45° and 75°, the effect of θ on the normalized sensor charge output (NSCO) distribution for the 1st vibration mode is minor (see Fig. 5), but its influence on the SCOD is obvious (see Fig. 6). Figure 6 also shows that the largest SCO is obtained at θ = 45°; this is expected, since e₃₁ₛ attains its maximum value when θ is 45°. The study further reveals that the change of the SCO around the delamination tip for the case l_d = 0.1 m is more pronounced than for l_d = 0.05 m (see Fig. 7 for the 1st vibration mode). For the considered cases X_d = 0.15 m and 0.21 m, the influence of X_d on the change of the SCO around the delamination tip is minor (see Fig. 8 for the 1st vibration mode).
Figure 3 Variation trends of the first three natural frequencies with θ
Figure 4 The SCOD for the beam with and without delamination (θ = 45°)
Figure 5 The NSCOD for the cases of θ = 15°, 45° and 75°
Figure 6 The SCOD for the cases of θ = 15°, 45° and 75°
Figure 7 The SCOD for the cases of l_d = 0.05 m and 0.1 m (θ = 45°)
Figure 8 The SCOD for the cases of X_d = 0.15 m and 0.21 m (θ = 45°)
To validate the present model, three 2D plane-strain FEA models were developed for the cases θ = 15°, 45° and 75° using the commercial FEA software Strand7 [8]. The difference between the present analytical and FEA models ranges from 0.8% to 3.9% for the first three natural frequencies, indicating good agreement.

4 Conclusions
A dynamic analytical model is proposed to detect a delamination embedded in a cantilever laminated composite beam bonded with PFRCSs, followed by a numerical study. For the case considered in this paper, the effect of θ on the first three natural frequencies and on the SCOD is obvious, but its effect on the NSCOD is minor. The SCOD versus x location is closely related to l_d and X_d. A comparison of the first three natural frequencies between the present analytical and FEA models shows good agreement.

5 Acknowledgements
The authors are grateful for the support of the Australian Research Council via a Discovery Project Grant (Grant DP0209504).

References
1. Keilers, C.H. and Chang, F.-K., Identifying Delamination in Composite Beams Using Built-in Piezoelectrics: Part II - An Identification Method. Journal of Intelligent Material Systems and Structures, 6 (1995), pp. 664-672.
2. Saravanos, D.A., Birman, V. and Hopkins, D.A., Detection of Delaminations in Composite Beams Using Piezoelectric Sensors. NASA Technical Memorandum 106611, AIAA-94-1754 (1994).
3. Tong, L. and Steven, G.P., Analysis and Design of Structural Bonded Joints (Dordrecht, Kluwer, 1999).
4. Tong, L., Sun, D.C. and Atluri, S.N., Sensing and Actuating Behaviours of Piezoelectric Layers with Debonding in Smart Beams. Smart Materials and Structures, 10 (2001), pp. 713-723.
5. Yin, L., Wang, X.-M. and Shen, Y.-P., Damage-monitoring in Composite Laminates by Piezoelectric Films. Computers & Structures, 59 (1996), pp. 623-630.
6. Tan, P., Tong, L. and Steven, G.P., A Flexible 3D FEA Modelling Approach for Predicting the Mechanical Properties of Plain Weave Unit Cell. Proceedings of the Eleventh International Conference on Composite Materials (ICCM-11), V (1997), pp. 67-76.
7. Tan, P. and Tong, L., Micro-Electromechanics Models for the Piezoelectric Fiber Reinforced Composite Materials. Composites Science and Technology, 61 (2001), pp. 759-769.
8. Introduction to the Strand7 Finite Element Analysis System (G+D Computing Pty Ltd, Sydney, Australia, 1999).
A SIMPLE MODEL FOR PREDICTION OF CRACK SPACING IN CONCRETE PAVEMENTS
G. CHEN AND G. BAKER
The University of Southern Queensland, Toowoomba, QLD 4350, Australia
E-mail: [email protected] and [email protected]
This paper presents a simple model to investigate the minimum and maximum crack spacings in concrete pavements from an energy standpoint and to explore the mechanism behind the existence of the minimum crack spacing. A cracking model composed of two cohesive cracks and an elastic bar restrained by distributed elastic springs is proposed to reflect the damage localization and/or distribution in the concrete. By varying the length of the elastic bar of the cracking model, the tensile force on the cohesive cracks and the energy profiles are investigated. It is demonstrated that the cracking pattern varies with the length of the elastic bar (i.e., the spacing between the two possible cracks), from which the minimum and maximum crack spacings are obtained.
1 Introduction
Crack spacing in concrete pavements has received considerable attention for many years (Mccullough, 1983; Penev and Kawamura, 1993; Shen and Kirkner, 1999). However, no satisfactory explanation has been put forward as to why the minimum spacing exists. Shen and Kirkner (1999) attempted to tackle this problem through a 1-dimensional model that is very complicated. The present study presents a simple cracking model that consists of two cohesive cracks and an elastic bar restrained by distributed elastic springs. The force acting on the cohesive cracks and the energy profiles are investigated. The main objective is to establish the relationship between the energy variation and the crack patterns and to demonstrate that energy minimization governs the cracking patterns.
A Cracking Model
A pavement can be represented by a series of sub-structures as shown in Fig. 1(a), which consists of a cohesive crack and an elastic bar. The elastic bar represents the un-cracked concrete. It is assumed that all damage within a certain distance is localized into a cohesive crack. The movement of the concrete is restrained by friction of the subgrade, which is modelled by distributed springs. In order to investigate the minimum and maximum cracking spacing, consider a cracking model as shown in Fig. 1(b). We are concerned with how the length of the elastic bar influences the cracking patterns. The equilibrium condition of the elastic bar is Ada /dx + x - 0 and the shear force is assumed x = -ku, with k being the stiffness of the
561
562
distributed springs and u the displacement of the elastic bar. The stress, a , is related to the strain, s , by Young's modulus E, i.e., • •
dtl
a = E(z - s"") = E(
e"") , where s"" is the initial strain caused either dx by shrinkage or temperature changes. By substituting a and T into the equilibrium condition, we obtain " -a2u = 0 with a 2 = Its AE dx' a(X u 1 _[-/,,«(*-*,)_ -aix-x^s + ( eMx^-x) solution is with u(x)=[(e -^ -e^^yuz - - -e• " ^ " ^ m / ^ , / = x 2 - x, and ¥ = e a / - e al. From which the total energy stored in the elastic bar and the distributed springs is obtained: Ee=
("—(Acye + ku2)dx =
[ O M 2 2 -4UXU2
+OWJ2]
(1)
where O = e a / + e~ a/ and ^4 is the cross-sectional area. The forces acting at the two ends of the elastic bar are: AEa 1 (2) F, =^^[®u2-2ul]~AEei *1 = [ 2 « 2 - O M , ] - ^ B E
m.suh-W-d
sub-1 sub-2
Q,
«i
Elastic W
ft
(a)
,„«"&
/,
"a
fesr--¥
:|M|aaaaaa^a|||^|||a^^
00
. 0 +j3f#
H~£
>
Fig, I A ntodelpsvemerat, (a) of a sexi«s sub-structures; (b) A cracking model, (c) the elastic bar„(d) A cohesive crack
Secondly, consider the cracks. The constitutive relation of the cohesive crack is shown in Fig. 1(d). The energy stored in the cohesive crack is:
Ec=\ {
A(ft-^wcra 2wc ^1Aftwc
)wc
if
w'
<
w„
(3) if
wc
>
w„
where ft is the tensile stress, wcra the crack opening, and wc the critical crack opening. From the equilibrium conditions between the two cracks and the two ends of the elastic bar, we obtain, „ 2Xw2 w, = M0 + Hx —
•OXM„
u,-H-
OXM3-2XM!-Z
(4)
563
where X = Eawc,
Z^xV{f
Y = O£ocwc -*¥ft,
functions Hl=H(wcral)
cra2^ craz
and H2 = H(w
)
+Ezini)wc.
Heaviside
indicate cracks closed (=0)
or open (=1), and the crack openings w =ul-u0 and w =u3 From (4) we obtain: 1 [Y(HpX-Y)u0+m]X(H2OX-Y)u3+HlZ(Y-2H2X)\ "J u}] AHXH2X2-Y2 IH^H^X-Y)^ 3
+Y(H2®X-Y)u3 +H2Z(2HlX-YJ]
(5)
3 Minimum and Maximum Crack Spacings
Consider shrinkage cracking, i.e., let $u_0=0$ and $u_3=0$ in (5). When $\varepsilon^{ini}$ reaches the critical value $\varepsilon^{cr}=-f_t/E$, the forces $F_1$ and $F_2$ reach $Af_t$. If $\varepsilon^{ini}$ increases further, both cracks have the opportunity to initiate. There are three possibilities: (a) the damage localizes into the first crack; (b) the damage distributes over the two cracks; (c) the damage localizes into the second crack. Due to symmetry, we consider only the first two cases. Let $\varepsilon^{ini}=1.03\,\varepsilon^{cr}$, and assume that the first crack always opens (i.e., $H_1=1$). We calculate the force acting on the second crack, $F_2$, for both $H_2=0$ and $H_2=1$ over a range of elastic-bar lengths. The material properties are: E = 24000 MPa, $f_t$ = 2.4 MPa, $w_c$ = 180 μm, k = 8 and A = 1. The results are plotted in Fig. 2 (the force on the first crack versus the bar length). Fig. 2 shows that, for the localized solution, the force $F_2$ rises to $Af_t$ at l = 6.8 m. Thus, when l > 6.8 m the next crack will always initiate, i.e., no localized solution exists. When l < 6.8 m, the forces for both the localized and distributed solutions are less than $Af_t$, i.e., both solutions are possible, and it is the energy minimization principle that governs which solution the cracking model follows. With $\varepsilon^{ini}$ fixed, the corresponding energy is calculated by summing (1) and (3) while varying $u_1$ and $u_2$. The energy profiles are illustrated in Fig. 3. For the case l < 6.8 m, only the localized solutions correspond to minima of the energy surface; this means $l_{min}$ = 6.8 m is the minimum crack spacing. For the
case l > 6.8 m, the energy surface has only one minimum, which corresponds to the distributed solution, i.e., only the distributed solution is possible in this instance. However, the crack spacing cannot be greater than $2l_{min}$; otherwise, we could insert a cohesive crack in the middle and it would initiate, as the force acting on it would surpass the critical value $Af_t$. Hence the minimum and maximum crack spacings are $l_{min}$ = 6.8 m and $l_{max}=2l_{min}$, respectively.
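The energy comparison that selects between the localized and distributed patterns can be sketched numerically. The fragment below discretizes the bar and minimizes the sum of the bar/spring energy and the cohesive energy of Eq. (3) over the nodal displacements; it is built on the reconstructed formulas above, and the spring stiffness value is an assumption (its units are garbled in the source). Units are MPa and mm throughout.

```python
import numpy as np
from scipy.optimize import minimize

def total_energy(u, l, n, E, A, k, ft, wc, eps_ini):
    """Discretized energy of the cracking model: bar strain energy with an
    initial (shrinkage) strain, distributed-spring energy, and the cohesive
    energy of Eq. (3) at both ends. u holds the n+1 nodal displacements;
    the outer faces of the two cracks are fixed (u0 = u3 = 0)."""
    h = l / n
    eps = np.diff(u) / h
    E_bar = 0.5 * A * E * np.sum((eps - eps_ini) ** 2) * h
    E_spr = 0.5 * k * np.sum(u[1:-1] ** 2) * h     # springs on interior nodes

    def Ec(w):                                     # one cohesive crack, Eq. (3)
        w = max(w, 0.0)
        return A * ft * w * (1.0 - w / (2.0 * wc)) if w < wc else 0.5 * A * ft * wc

    return E_bar + E_spr + Ec(u[0]) + Ec(-u[-1])   # openings w1 = u1-u0, w2 = u3-u2

# paper values except k (assumed): l = 3 m case, shrinkage 1.03 * f_t / E
pars = dict(l=3000.0, n=20, E=24000.0, A=1.0, k=8e-3, ft=2.4, wc=0.18,
            eps_ini=-1.03 * 2.4 / 24000.0)
res = minimize(lambda u: total_energy(u, **pars), np.zeros(pars["n"] + 1),
               method="Powell")
print(res.fun)
```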
Fig. 3 The energy profiles at (a) l = 3 m, (b) l = 6 m, (c) l = 8 m
4 Conclusions
A cracking model has been presented, through which the minimum and maximum crack spacings have been investigated. The forces acting on the cohesive cracks and the energy profiles demonstrate that the practical crack spacing falls between the minimum and maximum spacing. In circumstances where both the localized and distributed solutions are possible, it is the energy profile that governs which solution the cracking model follows.

5 References
1. Mccullough, B.F. (1983). Criteria for the design, construction, and maintenance of continuously reinforced concrete pavement. Australian Road Research, 13, 79-99.
2. Penev, D. and Kawamura, M. (1993). Estimation of the spacing and the width of cracks caused by shrinkage in the cement-treated slab under restraint. Cement and Concrete Research, 23, 925-932.
3. Shen, W. and Kirkner, D.J. (1999). Distributed shrinkage cracking of AC pavement with frictional constraint. Jnl Eng. Mech., 125, 554-560.
HELLINGER-REISSNER MIXED FORMULATION FOR THE NONLINEAR FRAME ELEMENT WITH LATERAL DEFORMABLE SUPPORTS

Suchart Limkatanyu
Lecturer, Dept. of Civil Engineering, Faculty of Engineering, Prince of Songkla University, Songkhla, Thailand, 90110, tel: 66-074-287129, [email protected]

ABSTRACT
This paper presents the theory and applications of the Hellinger-Reissner mixed formulation for the nonlinear frame element with lateral deformable supports. The governing differential equations of the problem (strong form) are derived first. Then, the Hellinger-Reissner mixed frame element (weak form) is formulated to obtain the numerical solution of the problem. Tonti's diagrams are employed to conveniently represent the equations governing both the strong and weak forms of the problem. Finally, a numerical example is used to show that the Hellinger-Reissner mixed element is much more accurate than the classical displacement-based element. The nonlinear frame model proposed in this paper has practical applications in modelling soil-pile structural systems, geosynthetic/fiber-glass reinforcement of foundation soils, beams on deformable foundations, etc.

KEYWORDS
Finite Elements, Nonlinear Analysis, Mixed Formulation, Soil-Structure Interaction, Frame Models, Winkler Foundation Model

INTRODUCTION
The problem of soil-structure interaction is often modeled and solved as a beam (structure) on one-dimensional springs (soils). Winkler (1) was the first to propose the so-called "Winkler foundation model" to study the problem of a beam on elastic foundations. The beam in the Winkler foundation model is based on the Euler-Bernoulli beam theory widely used in structural analysis. The main focus of this paper is to develop the general theoretical framework of the Hellinger-Reissner (H-R) mixed formulation of the nonlinear frame element with lateral deformable supports. This nonlinear frame element can be used as a numerical tool to study soil-structure interaction problems. The derivation of the governing differential equations (strong form) of the nonlinear frame element with lateral deformable supports is presented first. The H-R mixed element formulation is presented next and forms the core of this paper. Tonti's diagrams are used to concisely represent the equations governing both the strong and weak forms of the problem. Finally, a numerical example is used to show that the H-R mixed element is much more accurate than the classical displacement-based element.
DIFFERENTIAL EQUATIONS OF FRAME ELEMENT WITH LATERAL DEFORMABLE SUPPORTS (STRONG FORM)

Equilibrium

Figure 1 An Infinitesimal Segment of Frame Element with Lateral Deformable Supports (soil force D_s(x), shear V(x), moment M(x) and axial force N(x) acting on a segment dx)
The free body diagram of an infinitesimal segment dx of the frame element with lateral deformable supports is shown in Figure 1. Based on the small-deformation assumption, the axial, vertical, and moment equilibrium conditions are considered in the undeformed configuration. This work follows the Euler-Bernoulli beam theory; thus shear deformations are neglected. The shear force V(x) is eliminated by combining the vertical and moment equilibrium equations. The resulting equilibrium equations can be grouped in matrix form as:

$$\partial^T\mathbf{D}(x)+\partial_s^T\mathbf{D}_s(x)=\mathbf{0};\qquad \partial=\begin{bmatrix}d/dx & 0\\ 0 & d^2/dx^2\end{bmatrix},\quad \partial_s=\begin{bmatrix}0 & 1\end{bmatrix} \qquad(1)$$

where $\mathbf{D}(x)=\{N(x)\ \ M(x)\}^T$ is the element section force vector and $D_s(x)$ is the lateral soil force.

Compatibility

The element section deformation vector conjugate of $\mathbf{D}(x)$ is $\mathbf{d}(x)=\{\varepsilon(x)\ \ \kappa(x)\}^T$, where $\varepsilon(x)$ is the axial strain at the reference axis and $\kappa(x)$ is the bending curvature. The following displacements are defined at the element level: $\mathbf{u}(x)=\{u(x)\ \ v(x)\}^T$, where u(x) and v(x) are the axial and transverse displacements, respectively. Based on the small-deformation assumption, the element deformations are related to the element displacements through the compatibility relations $\varepsilon(x)=du(x)/dx$ and $\kappa(x)=d^2v(x)/dx^2$, which can be written in matrix form:

$$\mathbf{d}(x)=\partial\mathbf{u}(x) \qquad(2)$$

The lateral soil deformation $d_s(x)$ is determined by the matrix relation

$$d_s(x)=\partial_s\mathbf{u}(x) \qquad(3)$$

Force-Deformation Relations
The nonlinear nature of the proposed element derives entirely from the nonlinear relation between the section forces D(x), Ds(x) and the section deformations d(x), ds(x). In the proposed formulation, the fiber-section model is used to derive the section constitutive law D = D(d). The lateral-soil constitutive law is expressed in the form of Ds = Ds (ds).
The equilibrium, compatibility, and constitutive equations for the frame element with lateral deformable supports presented above are conveniently represented in the classical Tonti's diagram of Figure 2.

Figure 2 Tonti's Diagram for Frame Element with Lateral Deformable Supports

HELLINGER-REISSNER MIXED FORMULATION OF FRAME ELEMENT WITH LATERAL DEFORMABLE SUPPORTS (WEAK FORM)
Figure 3 Tonti's Diagram for Frame Element with Lateral Deformable Supports: Hellinger-Reissner Mixed Formulation (weak forms: $\delta\Pi_{EQ}=\int_L\delta\mathbf{u}^T(\partial^T\mathbf{D}+\partial_s^T D_s)\,dx=0$ and $\delta\Pi_{BCE}=\int_L\delta\mathbf{D}^T(\partial\mathbf{u}-\mathbf{d})\,dx=0$; the soil compatibility $d_s=\partial_s\mathbf{u}$ remains in the strong form)
In the mixed formulation, the beam-section forces D(x) are expressed in terms of the element nodal forces through force shape functions, and the beam displacements u(x) are written as functions of the element nodal displacements via displacement shape functions. The element nodal forces and nodal displacements serve as the primary element unknowns. The equilibrium equation (Eq. 1) and the beam compatibility equation (Eq. 2) are satisfied in integral form, while the soil compatibility equation (Eq. 3) is satisfied in the strong form. The H-R mixed functional $\Pi_{HR}$ is defined as

$$\Pi_{HR}[\mathbf{u}(x),\mathbf{D}(x)]=\Pi_{EQ}[\mathbf{u}(x)]+\Pi_{BCE}[\mathbf{D}(x)]$$

where $\Pi_{EQ}$ is the weak form of equilibrium and $\Pi_{BCE}$ is the weak form of the beam compatibility. According to the stationarity principle, the compatible equilibrium configuration is obtained when $\Pi_{HR}$ reaches a stationary value ($\delta\Pi_{HR}=0$). The mixed formulation is schematically represented in the Tonti's diagram of Figure 3. Further details of the element formulation can be found in Limkatanyu (2). The H-R mixed element configuration is shown in Figure 4: the element displacement and force degree-of-freedom systems are shown in Figure 4(a) and (b), respectively. It is noted that the orders of the interpolation functions in these two systems have to be compatible with each other in order to ensure the numerical stability of the mixed element.
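Structurally, the stationarity of a two-field functional such as $\Pi_{HR}$ leads to a coupled (saddle-point) system in the nodal force and displacement unknowns. The following linear-elastic sketch only illustrates that block structure; it is not the paper's nonlinear implementation, and the matrix names F, G and Ks are generic assumptions.

```python
import numpy as np

def solve_hr_mixed(F, G, Ks, P_ext):
    """Solve the saddle-point system arising from a two-field functional:
        [ -F   G  ] [D]   [  0   ]
        [ G^T  Ks ] [U] = [ P_ext]
    F: element flexibility from the force shape functions,
    G: force-displacement coupling from the weak equilibrium/compatibility,
    Ks: stiffness contribution of the lateral (soil) springs."""
    nD = F.shape[0]
    A = np.block([[-F, G], [G.T, Ks]])
    b = np.concatenate([np.zeros(nD), P_ext])
    sol = np.linalg.solve(A, b)
    return sol[:nD], sol[nD:]      # element forces, nodal displacements

# tiny illustrative system
F = np.eye(2) * 1e-3
G = np.eye(2)
Ks = np.eye(2) * 5.0
print(solve_hr_mixed(F, G, Ks, np.array([0.0, 1.0])))
```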
Figure 4 Mixed Frame Element with Lateral Deformable Supports: (a) displacement degrees of freedom; (b) force degrees of freedom

NUMERICAL EXAMPLE

Figure 5 Steel Beam on Deformable Foundations (E_steel = 200 GPa, F_y = 460 MPa, k_s = 0.06 N/mm²)
Figure 6 Convergence Study of the Displacement-Based and Mixed Elements
The performance of the displacement-based and mixed frame elements with lateral deformable supports is compared using the simply supported beam resting on deformable foundations of Figure 5. Figure 6 studies the number of elements needed to reach the converged solution for the two formulations by comparing their load-displacement responses; the "exact" response is obtained with 64 displacement-based elements. The stiffness changes in the load-displacement responses are due to yielding of the steel beam. Figure 6 clearly shows that the H-R mixed element is much more accurate than the displacement-based element: only 4 H-R mixed elements are needed to obtain the exact response, whereas 32 displacement-based elements are required. This shows how the force shape functions play an important role in determining the element accuracy.

CONCLUSIONS
This paper presents a newly developed frame element with lateral deformable supports based on the Hellinger-Reissner mixed formulation. In this formulation, the equilibrium and beam-compatibility equations are satisfied in integral form while the soil-compatibility condition is satisfied in the strong form. The convergence study shows that the force shape functions enhance the performance of the mixed element when compared to the displacement-based element.

REFERENCES
(1) Winkler, E., Theory of Elasticity and Strength, H. Dominicus, Prague, 1867.
(2) Limkatanyu, S., Reinforced Concrete Models with Bond-Interfaces for the Nonlinear Static and Dynamic Analysis of Reinforced Concrete Frame Structures (Ph.D. Dissertation), Department of Civil, Environmental, and Architectural Engineering, University of Colorado, Boulder, 2002.
ENERGY APPROACH TO NUMERICAL MODELLING OF CRACK SPACING IN REINFORCED CONCRETE
G. CHEN AND G. BAKER
The University of Southern Queensland, Toowoomba, QLD 4350, Australia
E-mail: [email protected] and [email protected]
This paper presents a new numerical methodology for the prediction of crack spacing in reinforced concrete. It is assumed that the deformation pattern corresponding to the crack spacing consumes the least energy among all kinematically admissible deformations, and an energy minimization approach is applied to predict the crack spacing. To simplify the problem, a lattice model is used, in which the cracking process is represented by softening of the concrete bar elements. The crack spacing due to tension and bending is investigated. The important result is that distinct cracks are predicted within a continuum formulation, with uncracked, unloaded material between them, and the energy criterion is validated over the classical tangent-stiffness equilibrium approach.
1 Introduction
The prediction of crack spacing has been studied by many researchers using both experimental and analytical methodologies (Chowdhury and Loo, 2001; Creazza and Russo, 1999). On the other hand, although a large number of numerical studies have been devoted to cracking in reinforced concrete, none directly tackles crack spacing: cracks are usually assumed to follow pre-defined propagation paths, by either inserting a discrete crack or introducing imperfections (Rots and Blaauwendraad, 1989). This paper presents a different methodology for modelling crack spacing. Instead of pre-assuming the crack positions, it assumes that the structure follows the deformation pattern that consumes the least energy; the energy minimization principle determines where and when a crack arises and how it propagates. To simplify the problem, a lattice model is adopted, in which the cracking process and fracture are simulated by strain softening and breakage of lattice members.
Lattice Model
In lattice-type models, the continuum is discretized into a framework of bar elements (van Mier et al., 1995). A bar element is defined by two coplanar points i and j , as shown in Fig. 1(a). The concrete bar element obeys a softening constitutive law shown in Fig. 1(b). When the strain s is greater than the elastic limit strain ee, the material will develop plastic deformation; as it reaches the ultimate strain s„, the bar breaks. For a
569
570
small deformation, the strain increment is related to the displacement increments by {aj - a,)(Auj As
- AM,.) + (bj - b,)(Avy - Av,)
„ .
(aj-a.y+Qj-b.Y
(«i.Vi)
f
w (c)
SN
ft
r I
X
8
i *
«
N /
V**
J?
Fig. 1 Bar Element (a) Bar Element, (b) Constitutive Law; (c) Calculation of the energy increment The strain increment As is decomposed into the elastic part, As e , and the plastic part, As p which are calculated as: h -As As„ = (2) As E+h " E+h where h is the hardening modulus; for softening materials, it is called a softening modulus. Eqn. (2) holds only for plastic loading. For the case s > s „ , the bar element breaks, any further loading will not increase the elastic strain, i.e., As p = As . For unloading or reloading, if the bar does not break, all the strain increment is elastic, that is, Ase = As ; while for a broken bar, it is simply assumed that Ase = 0. After knowing the elastic strain increment, As e , the stress increment is calculated by Aa = E- As e . There are three possibilities for energy calculation: During the strain increment As, the bar element concerned goes through (1) elastic deformation, (2) elastic to plastic deformation, and (3) elastic deformation, plastic deformation, and breakage. The energy consumed is shown in Fig. 1(c) by the shadowed and hatched areas for the last two possibilities. By summing the energy increments for all the elements, the total energy increment of the lattice model is obtained. Powell's conjugate method (Rao, 1996) is used to perform the energy minimization. As
3 Numerical examples
The lattice in Fig. 2 models a reinforced panel of 2.70 m × 0.90 m. Three reinforcing bars are represented by the heavy lines. The cross-sectional areas are assumed to be unity for the elements representing concrete and 0.25 for the reinforcing bars. The concrete material properties are: E = 4×10⁴ MPa, f_t = 4 MPa, h = -0.05E. A linear softening law is used. For the reinforcement: E = 2×10⁵ MPa, yield stress σ_y = 500 MPa, h = 0.01E. The reinforcing bars are hinged at one end and subjected at the other end to prescribed displacements that model either uniaxial tension or pure bending.
The crack formation and propagation are shown in Fig. 3 for both tension and bending, in which the degree of damage is indicated by the brightness (black for intact, white for broken) of the elements. Distinct crack spacing is obtained in both cases. At the beginning the damage is distributed; with further loading, discrete cracks form, the damage localizes into several distinct cracks, and the bordering concrete unloads.

4 Conclusions
A new numerical methodology for the prediction of crack spacing in reinforced concrete has been proposed, based on the energy minimization principle. The numerical analyses confirm that crack spacing can be treated as a strain localization problem. Among all kinematically admissible deformations, the deformation pattern of crack spacing consumes the least energy. Both uniaxial tension and pure bending examples have been investigated. The important result is that distinct cracks are predicted within a continuum formulation, with uncracked, unloaded material between them. Hence the energy criterion is validated against the classical tangent-stiffness equilibrium approach.
References
1. Chowdhury, S.H. and Loo, Y.C. (2001). "A New Formula for Prediction of Crack Widths in Reinforced and Partially Prestressed Concrete Beams." Advances in Structural Engineering, 4, 101-110.
2. Creazza, G. and Russo, S. (1999). "A new model for predicting crack width with different percentages of reinforcement and concrete strength classes." Materials and Structures, 32(221), 520-524.
3. Rao, S.S. (1996). Engineering Optimization. Wiley: New York.
4. Rots, J.G. and Blaauwendraad, J. (1989). "Crack models for concrete: discrete or smeared? Fixed, multi-directional or rotating?" Heron, 34(1).
5. Van Mier, J.G.M., Schlangen, E. and Vervuurt, A. (1995). "Lattice type fracture models for concrete." In Continuum Models for Materials with Micro-structure, Muhlhaus, H.B. (ed.), Wiley: Chichester, 342-377.
EFFECT OF BOLT CONNECTIONS ON DYNAMIC RESPONSE OF CYLINDRICAL SHELL STRUCTURES

Q. H. CHENG, S. ZHANG AND Y. Y. WANG

Institute of High Performance Computing, 1 Sci. Park Rd., #01-01 The Capricorn, Singapore 117528
E-mail: [email protected]

No. 15 Institute, China Academy of Launch-Vehicle Technology, PO Box 9200-71, Beijing 100076
E-mail: [email protected]
A finite element modeling technique is presented to investigate the effect of bolt connections on the dynamic response of structures. The method deviates from conventional practice, in which bolt connections are represented by beam elements, rigid bar elements, or, even more simply, by a set of common nodes shared by the two connected parts. In this study, the bolt connections are modeled in detail: interaction of the connected flanges is considered by contact algorithms, and prestress in the bolts is also incorporated. Normal mode results obtained with the ABAQUS code are presented to show the effect of the bolt connections on the natural frequencies of the structure.
1 Introduction
Bolt connections are widely employed in various industries. In aviation and aerospace structures, two cylindrical shell sections are usually fastened, through two frame flanges, by a number of circumferentially distributed bolts. A number of investigations of bolt connections have been reported. The self-loosening behavior of a bolt subjected to harmonic excitation was predicted [1]. The structural stiffness and strength properties of a column flange-endplate connection were studied with 3D FE models using ABAQUS [2] and ANSYS [3]. While most of these works focus on the local behavior and phenomena of bolt connections, few authors have paid attention to the role that bolts play in the global behavior of structural assemblies. When applying the FEM to analyze such structures, the bolt connections are conventionally modeled as beam elements or rigid elements (usually called MPCs in commercial FEM codes), or, even more simply, a bolt is represented by a single node shared by the two connected parts. Because the effects of preloading in the bolts and of interaction between the connection flanges are ignored in this approach, the accuracy of the results may be a problem. However, very few papers can be found in the literature that study the feasibility and accuracy of this modeling technique. In this paper, a cylindrical shell structure is analyzed by a 3D FE model using the ABAQUS code. The bolt connections in the structure are modeled in detail, and pretension in the bolts and the contact interface are considered by a nonlinear analysis procedure. The effect of the connections on the structural normal modes is presented.

2 Methods and Finite Element Model
Illustrated in Figure 1 is a tapered cylindrical shell structure consisting of two sections. Both sections are 3 m long; the beginning and ending diameters of Section 1 are 1.2 m and 1.0 m, while they are 1.0 m and 0.8 m for Section 2, respectively. Each section has two
ending frames of 8 mm thickness, and is strengthened by 7 intermediate frames and 36 stringers evenly distributed along the circumference, as shown in Figure 2. The two inner ending frames serve as the flanges through which the two sections are fastened together by twelve M16 bolts (see Figure 3).
Figure 1. Exploded view of a cylindrical shell structure.
Figure 2. Intermediate frames and stringers for shell sections.
Figure 3. Connections.
The dynamic response of this construction is investigated by the finite element method (FEM) using the commercial code ABAQUS. Four-node shell elements with reduced integration (S4R) are used to model the skin (see Figure 4). The intermediate frames and stringers are also modeled by S4R elements, but the ending frames by 8-node linear brick elements (C3D8), as shown in Figure 5.
Figure 4. Shell elements for skins.
Figure 5. Mesh for frames and stringers.
The FE meshes in the regions around the bolt connections are tuned much finer than in other areas. A close-up of the mesh for one of the connections is shown in Figure 6. The bolts are also modeled by C3D8 elements. Note that the bolt nuts are assumed to be circular instead of the conventional hexagonal shape in order to facilitate mesh generation; the effect of this simplification is localized and insignificant for the global behavior studied in this paper. There are three interfaces in this figure: two, represented by the shorter thick lines, between the bolt nuts and the flanges, and one, represented by the longer thick line, between the two flanges. Interactions at the former two interfaces are reasonably simplified by a matching mesh between the bolt nut and the flange. The phenomenon at the third interface is the main target of the investigation; the interaction there is implemented with the contact algorithm.

3 Numerical Analysis and Results
Normal mode analysis of the structure is carried out using the described FE model. To incorporate the effect of preloading in the bolt connections, the analysis is conducted in two steps. The first step is a nonlinear static analysis to simulate the pretension of the bolts. It develops axial forces in the bolts and contact forces between the flanges; force balance is achieved at the three contact interfaces while the other parts of the construction remain in a strain-free state. Stress distributions in one flange as well as within one bolt are illustrated in Figure 7. It can be seen that the stress in the flange concentrates in a small region in the vicinity of the bolts. The total contact force is 1012.1 kN, which produces an average stress of 419.5 MPa in the bolt shank; this figure is about 60% of the ultimate strength of a high-strength bolt. The maximum von Mises stresses in the bolts and flanges are 531 MPa and 441 MPa respectively. The second step is a dynamic analysis to extract the normal modes of the structure. The strain state at the interfaces obtained from the first step is automatically incorporated in the second step. Only global vibration modes are of concern. Figure 8 shows the first four mode shapes.
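The second step amounts to a generalized eigenvalue problem K x = ω² M x for the stiffness and mass of the pre-stressed structure. The Python fragment below illustrates that step in isolation; it is not the ABAQUS solver, and the matrices here are toy placeholders rather than the shell model.

```python
import numpy as np
from scipy.linalg import eigh

def natural_frequencies(K, M, n_modes=4):
    """Solve K x = w^2 M x and return the n_modes lowest
    natural frequencies in Hz."""
    eigvals, _ = eigh(K, M)                       # generalized symmetric EVP
    omega = np.sqrt(np.clip(eigvals, 0.0, None))  # rad/s
    return omega[:n_modes] / (2.0 * np.pi)

# toy 2-DOF example (placeholder matrices, not the shell model)
K = np.array([[4.0e6, -2.0e6], [-2.0e6, 2.0e6]])
M = np.diag([10.0, 10.0])
print(natural_frequencies(K, M, n_modes=2))
```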
Figure 6. Local view of FE mesh for one bolt connection.
Figure 7. Stress distribution in flange and bolt.
Figure 8. Normal mode shapes of the first four orders (including the second bending mode, freq. = 256.33 Hz, and the first axial mode, freq. = 279.35 Hz).
To examine the preloading effect of the bolt connections, two more alternative FE analyses are examined. One analysis applies only the second step to the above model, meaning that no pretension of the bolt connections is included. For the other analysis, the model is simplified in that the ending frames are represented by S4R shell elements; neither are the bolts modeled in detail, nor is interface contact considered, the bolt connections being simply represented by beam elements. A linear normal mode analysis is conducted for this modified model. The normal mode frequencies from the three analyses are summarized in Table 1, with the percentage difference of the frequency results of the latter two analyses from the first. Due to structural symmetry, a pair of frequencies exists for each bending mode, but only one of them is listed. Significant differences in the frequency results are found for the lower-order symmetric modes, i.e. the 1st bending and 1st axial modes. Considering the contact interface and modeling the bolts without pretension gives higher frequency values; on the other hand, modeling the bolts by beam elements gives much lower results. The variation for the higher-order symmetric modes (3rd and 5th bending) is attenuated, but still larger than for the other, non-symmetric modes.

Table 1. Normal mode frequencies (Hz). Percentage differences relative to the pretension case are given in parentheses.
Mode No. | Mode description | Freq. with pretension | Freq. without pretension | Freq. by simplified model
1  | 1st bending | 83.93  | 107.25 (27.8%) | 51.03 (-39.2%)
2  | 1st torsion | 184.54 | 185.84 (0.7%)  | 179.36 (-2.8%)
3  | 2nd bending | 256.53 | 257.88 (0.5%)  | 252.15 (-1.7%)
4  | 1st axial   | 279.35 | 347.14 (24.3%) | 170.30 (-39.0%)
5  | 2nd torsion | 307.27 | 307.70 (0.1%)  | 309.39 (0.7%)
6  | 3rd bending | 340.06 | 357.66 (5.2%)  | 324.06 (-4.7%)
7  | 4th bending | 515.98 | 512.97 (-0.6%) | 509.41 (-1.3%)
8  | 3rd torsion | 566.21 | 570.82 (0.8%)  | 542.71 (-4.2%)
9  | 2nd axial   | 573.54 | 562.58 (-1.9%) | 556.05 (-3.0%)
10 | 5th bending | 579.49 | 595.12 (2.7%)  | 542.86 (-6.3%)

4 Discussion
Normal mode analysis of a cylindrical shell structure has been carried out with consideration of bolt connection pretension and the contact interface. It is found that the bolt connections have a significant effect on the natural frequencies of the symmetric vibration modes, even the lower ones. The conventional technique of modeling bolt connections by beam elements is therefore questionable. A further study will be conducted to suggest a reasonable yet simplified method of modeling bolt connections.

References
1. Zadoks, R.I. and Yu, X., An investigation of the self-loosening behavior of bolts under transverse vibration. J. of Sound and Vib. 208 (1997) pp. 189-209.
2. Bursi, O.S. and Jaspart, J.P., Calibration of a finite element model for isolated bolted end-plate steel connections. J. Construct. Steel Res. 44 (1997) pp. 225-262.
3. Bahaari, M.R. and Sherbourne, A.N., Behavior of eight-bolt large capacity endplate connections. Computers and Structures 77 (2000) pp. 315-325.
SIMULATION OF DUCTILE FRACTURE IN TUBULAR JOINTS THROUGH A VOID NUCLEATION MODEL

X. D. QIAN, Y. S. CHOO* AND J. Y. R. LIEW

Center for Offshore and Maritime Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260
E-mail: [email protected]

This paper presents a numerical approach to simulating global ductile fracture in Circular Hollow Section (CHS) joints. A void nucleation algorithm based on Gurson's model is employed in the global strength analysis. Three types of joint configuration are investigated, and comparison is made with the available test results. Owing to the lack of the material property data required by Gurson's model, a sensitivity study is carried out on these properties to observe their effect on tubular joint strength.
Introduction

Crack initiation and propagation is one of the common failure modes of tubular joints under tensile loading. A conventional FE approach based on continuum mechanics is not able to predict the occurrence of a crack, which violates material and geometric continuity, and detailed FE simulation of the cracking effect requires the geometry and path of the crack to be known a priori. Alternative approaches to tackling the effect of cracking include arbitrary strain criteria [1], continuum damage mechanics [2], the smeared crack model, the discrete crack model and fracture mechanics. Gurson's approach to simulating ductile fracture consists of two parts: the material plastic flow rule and the void nucleation process. The yield criterion as modified by Tvergaard [3] is

Φ(q, σ_y, f) = (q/σ_y)² + 2 q₁ f cosh(3 q₂ p / (2 σ_y)) - (1 + q₃ f²) = 0   (1)
In Eq. (1), q refers to the effective Mises stress, σ_y is the initial material yield stress, p stands for the hydrostatic pressure and f indicates the void volume fraction. The change in void volume comprises two parts, growth of existing voids and nucleation of new voids, as expressed in Eq. (2):

df/dt = (df/dt)_growth + (df/dt)_nucleation   (2)

where

(df/dt)_nucleation = A dε̄_p/dt,   A = f_N / (s_N √(2π)) · exp[ -(1/2) ((ε̄_p - ε_N)/s_N)² ]   (3)
In Eq. (3), ε_N refers to the mean plastic strain level at which void nucleation takes place, f_N represents the void volume fraction of the void-nucleating particles, and s_N is the standard deviation of the nucleation strain. The parameters that need to be defined for Gurson's model fall into two series: the q_i (i = 1, 2, 3) factors and the material parameters ε_N, s_N and f_N. The q_i factors were found by Tvergaard [3] to best simulate material behavior when equal to the values indicated in Fig. 2(c). On the other
hand, there is little data available in the literature addressing the values of the material parameters. The numerical model in the current study is generated using MSC Patran [4], and the analysis is carried out with ABAQUS [5]. Both nonlinear material and geometric properties are taken into account. A typical configuration of a tubular joint is shown in Fig. 1, together with the non-dimensional joint parameters.
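Eqs. (1) and (3) are closed-form expressions and can be evaluated directly. The Python sketch below mirrors the symbols used in the text; the default parameter values are merely the examples discussed in this paper (Tvergaard's q_i and one of the nucleation parameter sets), not recommendations.

```python
import math

def gurson_yield(q, p, f, sigma_y, q1=1.5, q2=1.0, q3=2.25):
    """Gurson-Tvergaard yield function, Eq. (1); yielding when phi >= 0."""
    return ((q / sigma_y) ** 2
            + 2.0 * q1 * f * math.cosh(3.0 * q2 * p / (2.0 * sigma_y))
            - (1.0 + q3 * f ** 2))

def nucleation_amplitude(eps_p, eps_N=0.10, s_N=0.05, f_N=0.04):
    """Amplitude A of the strain-controlled nucleation rate, Eq. (3):
    (df/dt)_nucleation = A * d(eps_p)/dt."""
    return (f_N / (s_N * math.sqrt(2.0 * math.pi))
            * math.exp(-0.5 * ((eps_p - eps_N) / s_N) ** 2))
```

As a quick sanity check, with f = 0 the criterion reduces to the von Mises condition: gurson_yield(q=sigma_y, p=0.0, f=0.0, sigma_y=sigma_y) returns 0.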
d₀: chord diameter; d₁: brace diameter; t₀: chord wall thickness; t₁: brace wall thickness; l₀: chord length; θ: brace-to-chord angle; g: gap between the two braces; β = d₁/d₀; γ = d₀/2t₀; τ = t₁/t₀; α = 2l₀/d₀
Figure 1 Typical configuration of a tubular joint
2 Benchmark Study
The conventional bar-necking problem has been studied by many researchers to verify different void nucleation models [6]. The FE verifications carried out in these studies were based on 2D axisymmetric elements (CAX8R in the ABAQUS element library). In order to ensure the applicability of Gurson's model with 3D continuum elements, the bar-necking problem is re-analyzed here using solid elements (C3D20R in the ABAQUS element library). The geometry and numerical results are shown in Fig. 2.
Figure 2 (a) 3D quarter model; (b) 2D axisymmetric half model; (c) load-deformation response for the tensile bar (Tvergaard model: q₁ = 1.5, q₂ = 1.0, q₃ = 2.25; 3D model ε_N = 0.30; 2D model s_N = 0.10, f_N = 0.04)
The 2D model is obtained from the ABAQUS benchmark manual [5]. Hardly any difference is observed between the 2D and 3D models.

3 Tubular Joint Behavior
Three types of joints, X-, T- and K-joints, are obtained from published experimental results [7, 8, 9]. Since Gurson's model involves the calculation of the plastic strain, a sufficiently fine mesh is required. Three meshing schemes are adopted, as shown in Fig. 3.
Figure 3 (a) Fine mesh; (b) medium mesh; and (c) coarse mesh for the X-joint model

Table 1 Geometry and strength of the three types of joints studied.

Joint | d₀ (mm) | β | γ | τ | α | Test (kN) | FE (Lu's) (kN) | FE/Test
X1 | 407.4 | 1.0 | 26 | 1.0 | 17.5 | 2248 | 2055 | 0.91
X2 | 407.4 | 0.4 | 26 | 0.8 | 4.9 | - | 1937 | -
T | 298.5 | 0.5 | 7.5 | 1.0 | 10.2 | 397 | 388 | 0.98
K (θ = 60°) | 217.4 | 0.7 | 25 | 0.8 | 13.9 | 225 | 210 | 0.93
The geometry of the three types of joints is shown in Table 1, which also incorporates the comparison of the joint ultimate strengths. There are two replicated tests for each X-joint shown in Table 1; the joint strength shown in Table 1 is the average of the two tests. The comparison of X- (X1) and T-joint behavior is illustrated in Fig. 4, where the effect of the different mesh schemes is also included. The FE mesh density does not show a significant effect on the joint ultimate strength. However, the void nucleation process simulating the ductile fracture effect is a strain-controlled criterion, which is directly dependent on the mesh scheme employed. This is demonstrated in Table 2, in which the displacement levels corresponding to 15% plastic strain are compared for the different mesh schemes.

Table 2 Comparison of the displacement at 15% plastic strain

Mesh density | X1 (mm) | X2 (mm) | T (mm) | K (mm)
Fine | 20 | 12 | 11 | 4
Medium | 22 | 38 | 14 | 7
Coarse | 26 | 61 | 36 | 9
Figure 4 (a) X-joint behavior; (b) T-joint behavior, with Gurson's model simulation

Large variations in the displacement levels are observed; the effect of the meshing scheme is apparent. The load reduction is most pronounced for the fine mesh, which results in the largest strain values, as shown in Fig. 4.

Sensitivity Study on ε_N, f_N and s_N

The accuracy of the void nucleation analysis relies on the material input. However, there is no rigorous formulation in the literature for computing the material parameters needed in Gurson's model, and the material properties may be affected by the manufacturing process [10]. For this study, three values of ε_N (0.0, 0.10 and 0.30), two
values of s_N (0.05 and 0.10), and three values of f_N (0.04, 0.10 and 0.20) are examined. Arndt and Dahl [11] reported that for high-strength steel the void nucleation process initiates as soon as yielding occurs; hence ε_N = 0.0 is selected. A relatively large value of ε_N = 0.30 is also incorporated for comparison. f_N is normally taken to be less than 0.10; the extremely large value f_N = 0.20 is selected to observe the amplified effect of the void volume fraction.

Figure 5 (a) Effect of ε_N; (b) effect of f_N, on X-joint (X1) behavior (β = 1.0, γ = 25.5, τ = 1.0, α = 17.5, d₀ = 407.4 mm)

Figure 5 illustrates the effect of the material parameters on the X-joint (X1) behavior. With ε_N = 0.0, the joint strength is slightly reduced compared to the other two cases once plasticity occurs; a large value of ε_N postpones the initiation of void nucleation, and the load reduction is only observed in the case of ε_N = 0.10. The void volume fraction of the nucleating particles plays a significant role in the joint behavior, as demonstrated in Fig. 5(b). A relatively small value of f_N does not initiate a strength reduction within the prescribed displacement; on the other hand, a very early reduction in the joint strength is observed if a rather large f_N is taken. The joint behavior does not show a strong dependence on the standard deviation s_N, and that comparison is therefore not shown here.

4 Conclusion
Gurson's model offers an alternative way of simulating the effect of ductile fracture in tubular joints. The load reduction due to ductile tearing is captured in the numerical analysis, and the ultimate strength obtained with Gurson's approach lies at a similar level to that of the tests. The tubular joint behavior shows a high dependency on the material properties, especially the ε_N and f_N values. A high f_N results in a conservative estimate of the joint strength, with a premature fracture effect being observed; on the other hand, a large void nucleation strain postpones the effect of ductile fracture.

References
1. Dexter, E.M. and Lee, M.M.K., Static strength of axially loaded tubular K-joints. I: Behavior. Journal of Structural Engineering (1999) pp. 194-201.
2. Jurban, J.S. and Cofer, W.F., Ultimate strength analysis of structural components using the continuum damage mechanics approach. Computers & Structures (1991) pp. 741-752.
3. Tvergaard, V., Influence of voids on shear band instabilities under plane strain conditions. International Journal of Fracture 17 (1981) pp. 389-406.
4. Qian, X.D., Romeijn, A., Wardenier, J. and Choo, Y.S., An automatic FE mesh generator for CHS joints. Proceedings of the 12th International Offshore and Polar Engineering Conference 4 (2002) pp. 11-18.
5. ABAQUS User Manual, Version 6.2.1. Hibbitt, Karlsson & Sorensen Inc. (2001).
6. Tvergaard, V. and Needleman, A., Analysis of the cup-cone fracture in a round tensile bar. Acta Metallurgica 32 (1984) pp. 157-169.
7. Sanders, D.H. and Yura, J.A., Strength of double-tee tubular joints in tension. Offshore Technology Conference, OTC 5437 (1987) pp. 139-150.
8. Zerbst, U., Heerens, J. and Schwalbe, K.H., The fracture behaviour of a welded tubular joint - an ESIS TC1.3 round robin on failure assessment methods. Part I: experimental data base and brief summary of the results. Engineering Fracture Mechanics 69 (2002) pp. 1093-1110.
9. Wang, B., Hu, N., Kurobane, Y., Makino, Y. and Lie, S.T., Damage criterion and safety assessment approach to tubular joints. Engineering Structures 22 (2000) pp. 424-434.
10. Thomason, P.F., Ductile Fracture of Metals (1990).
11. Arndt, J. and Dahl, W., Effect of void growth and shape on the initiation of ductile failure of steels. Computational Materials Science 9 (1997) pp. 1-6.
STRESS INTENSITY FACTORS FOR DOUBLER-PLATE REINFORCED TUBULAR JOINT SUBJECTED TO AXIAL LOADS

R. JIANG, Y. S. CHOO*

Department of Civil Engineering, National University of Singapore, Singapore 117576
E-mail: [email protected] (*corresponding author)

Doubler plates are used to reinforce tubular joints in offshore structures. As these are usually fillet-welded to the chord, the weld root is one of the key areas in considering the fatigue strength of the joint. This paper reports the results of an ongoing project whose objective is to improve the understanding of weld root failure through a systematic parametric study, so that proper proportioning of the doubler-plate reinforced joint may be achieved. It is found that the doubler plate size should be carefully chosen to reduce high stress intensities at the weld root.
1 Introduction
Tubular joints are widely used in offshore structures, and fatigue strength is a major concern in the design of such structures. Many researchers have studied the fatigue of tubular joints. Lee et al. (2000) developed a set of equations for estimating weld toe magnification factors (Mk) for semi-elliptical cracks in T-butt joints, from multiple regression analyses of the Mk factors obtained in a parametric study; their equations have been included in the new British Standard BS 7910. Lie et al. (2002) also developed their own method of evaluating the stress intensity factors for cracks in the weld toe area. In positions of structural weakness or areas of importance, a doubler plate is often used to strengthen a tubular joint. In considering the fatigue strength of a doubler-plate reinforced tubular joint, the weld root is one of the key areas, since lack of penetration usually exists there due to inaccessibility during the welding process. Few research results on fatigue failure of this area are found in the technical literature. Therefore, one of the aims of our project is to better understand weld root failure so that proper proportioning of the doubler-reinforced joint may be achieved.
2 Finite Element Analysis

Doubler-plate reinforced tubular X-joints subject to brace axial compression (Figure 1) or tension were studied by numerical analysis. SIF values were evaluated for the root of the doubler-plate-to-chord weld. A systematic parametric study was carried out on the brace diameter, doubler plate thickness and doubler plate size, which were considered to have significant influences (Table 1). The virtual crack extension technique embedded in the ABAQUS code was adopted to evaluate the SIF values. To verify the validity and accuracy of this method, comparisons with standard connections were carried out prior to the current study. The results for
method verification were reported elsewhere (Choo, 2002), and the SIF method was found to provide good correlation with the reference cases.

Table 1. Range for parametric study
Brace diameter / chord diameter: 0.25, 0.5, 0.64, 0.8
Doubler plate thickness / chord thickness: 1, 1.25, 1.6
Doubler plate length / brace diameter: 1.25, 1.5, 1.75, 2, 2.5, 3
Figure 1. Doubler-plate reinforced tubular joint subject to brace axial compression.
Numerical analyses were performed using ABAQUS, while the models were generated in PATRAN. Due to symmetry, only 1/8 of the joint was modeled. Twenty-node brick elements were used for the analyses, and the material was steel with an elastic modulus of 205000 MPa and a Poisson's ratio of 0.3. For accurate evaluation of the SIF, the slit tip area was modified as shown in Figure 2. The SIF values were evaluated at the root along both the transverse and longitudinal welds. It was found that under brace compression there was separation between the doubler plate and the chord at the weld root along the transverse weld (from the crown to the doubler plate corner). The position with the highest SIF value was found to be the crown position of the joint (Figure 3). At the corner where the transverse weld meets the longitudinal weld, the opening of the slit tip on each weld is restricted by the other, and the SIF value is expected to be smaller than at positions along the welds (circled in Figure 3). This can be observed from the deformation of the numerical model and agrees with a previous study [3]; the corner point is thus not a critical position in considering fatigue strength. The parametric study was therefore carried out at the joint crown position on the three geometric parameters that have major influence.
Figure 2. Tubular X-joint subject to axial compression (left) and tension (right), in typical 1/8 model.
Figure 3. Typical results for one 1/8 model subject to axial compression.
3 Results and Discussions
From the results obtained, it is observed that, in the consideration of doubler plate size for a particular joint configuration, there is a maximum root SIF value at the crown position. From this value, both an increase and a decrease in doubler plate size will lower the root SIF, as shown in Figure 4 (left). This holds within the range studied and for all the doubler plate thicknesses studied. It is also observed that an increase in the brace-to-chord diameter ratio (d₁/d₀) results in an increase of the root SIF at the crown. As shown in Figure 4 (right), the doubler plate thickness does not have a significant effect on the root SIF within the range studied.
Figure 4. Combined influence of brace diameter and doubler plate size (t = 10 mm).
4 Conclusion
The paper presents results from a parametric study on the stress intensity factors (SIF) of doubler-plate reinforced joints. The parametric study included systematic variation of the brace diameter, doubler plate thickness and doubler plate size. From the results obtained within the parametric ranges, it is observed that for a given joint configuration there is a doubler plate size giving a maximum root SIF value at the crown position. From this reference size, both an increase and a decrease in the doubler plate size will lower the root SIF. It is also observed that an increase in the brace-to-chord diameter ratio (d₁/d₀) results in an increase of the root SIF at the crown. The thickness of the doubler plate is found to have a marginal effect on the root SIF.
References
1. ABAQUS Standard Manual Vols. 1 & 2, Theory Manual, Version 6.2. Hibbitt, Karlsson and Sorenson Inc. (2001).
2. PCL and Customization for MSC.Patran. MSC Software Corporation (1999).
3. Choo, Y.S., Jiang, R. and Thevendran, V., Stress Intensity Factors for Doubler Plate Reinforced Connections, ISOPE (2002).
4. Bowness, D. and Lee, M.M.K., Prediction of Weld Toe Magnification Factors for Semi-Elliptical Cracks in T-butt Joints, Int. J. of Fatigue (2000).
5. Lie, S.T., Chiew, S.P. and Huang, Z.W., Finite Element Model for the Fatigue Analysis of Cracked Tubular T-Joints under Complex Loads, ISOPE (2002).
VIBRATION ANALYSIS OF POROELASTIC BAR

T. Z. CHEN, Z. ZONG AND K. C. HUNG*

Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528
*E-mail: [email protected], Tel: (+65) 6419-1564, Fax: (+65) 6778-0522

Vibration analyses of a poroelastic bar are carried out analytically and numerically in this paper. The numerical method is characterized by two steps, temporal and spatial discretization: a Runge-Kutta method is used for temporal discretization and a local interpolation scheme for spatial discretization, giving a truly meshless method. First, a free vibration is computed; the phase difference between displacement and fluid pressure is about π/2 in the time history. Then, a forced displacement vibration is simulated; the fluid flow acts like a damper in the dynamic response of the bar. The numerical method is validated, since its results coincide with the analytical solutions.
1 Introduction
Poroelasticity is a continuum theory for fluid-saturated porous media. It was originally motivated by problems in soil and geomechanics, which generally concern massive structures (consolidation problems, seismic wave propagation, etc.), and this application of poroelastic theory is relatively mature. However, relatively few papers have so far investigated light poroelastic structures. On the other hand, poroelastic theory has also been extensively applied in biomechanics, e.g. bone mechanics [1, 2]. In this field, many objects can be modeled as light structures; for example, many bones can be simplified as fluid-saturated poroelastic bars. Cederbaum et al. [3] investigated the poroelastic beam and plate, but their treatment of dynamic response is limited. Poroelastic dynamics is of paramount importance for a better understanding of some biological phenomena, particularly those related to impact injury, brain trauma and bone fracture. In this paper, analytical solutions for two special cases are obtained. However, an analytical solution for arbitrary conditions is difficult to work out, and for such problems numerical analyses are required. As a promising alternative to the finite element method, meshless methods use only a set of scattered nodes. In [4], the authors provided a truly meshless method, the local interpolation collocation method; here it is adopted to simulate the vibration of a poroelastic bar.

2 Equations for poroelastic bar
For a poroelastic bar subjected to axial load, diffusion in the longitudinal direction is viable, while the flow in the perpendicular directions can be neglected. The governing equations for the poroelastic bar are obtained as Eq. (1) [3], in non-dimensional form, within small-deflection theory and Biot's theory, with the relative motion between solid and fluid governed by Darcy's law:
γ² ∂²u/∂t² - ∂²u/∂x² + ∂f/∂x = 0
η ∂f/∂t - ∂²f/∂x² + λ ∂³u/∂x∂t = 0   (0 < x < 1, t > 0)   (1)
Here, u and f are the two unknown time-dependent functions: the non-dimensional axial displacement and the non-dimensional fluid pressure. η, γ and λ are material parameters. The boundary and initial conditions differ from case to case. The boundary conditions on the axial displacement are that u is given at certain points. The diffusion boundary condition for a permeable end is that f is given at the end, while for an impermeable end ∂f/∂x = 0 at the end.
3 Simulation results

The numerical method in this paper is characterized by two steps: temporal and spatial discretization. A Runge-Kutta method is used for temporal discretization and a local interpolation scheme [4] for spatial discretization; it is a truly meshless method. Two cases, one of free vibration and one of forced vibration, are simulated here, because analytical solutions can be worked out for these two cases. In the numerical simulations, 51 nodes are generated on the bar, and the time step is 10⁻.
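As a sketch of this two-step scheme, the Python fragment below advances the semi-discrete form of Eq. (1) with a classical fourth-order Runge-Kutta step. It is a minimal stand-in, not the paper's code: central finite differences replace the local interpolation scheme, the sign conventions of Eq. (1) as reconstructed above are assumed, and the free-vibration boundary conditions of the next subsection are built in.

```python
import numpy as np

def rhs(y, n, dx, eta=1.0, gam=1.0, lam=1.0):
    """Semi-discrete right-hand side of Eq. (1) on n interior nodes.
    Free-vibration BCs: u = 0 and df/dx = 0 at both ends; the zero-flux
    condition is approximated by mirroring the boundary-adjacent value.
    State y = [u, v, f] with v = du/dt."""
    u, v, f = y[:n], y[n:2*n], y[2*n:]
    ue = np.concatenate(([0.0], u, [0.0]))
    ve = np.concatenate(([0.0], v, [0.0]))
    fe = np.concatenate(([f[0]], f, [f[-1]]))
    u_xx = (ue[2:] - 2*u + ue[:-2]) / dx**2
    f_xx = (fe[2:] - 2*f + fe[:-2]) / dx**2
    f_x = (fe[2:] - fe[:-2]) / (2*dx)
    v_x = (ve[2:] - ve[:-2]) / (2*dx)
    return np.concatenate([v,
                           (u_xx - f_x) / gam**2,
                           (f_xx - lam * v_x) / eta])

def rk4_step(y, dt, *args):
    k1 = rhs(y, *args); k2 = rhs(y + 0.5*dt*k1, *args)
    k3 = rhs(y + 0.5*dt*k2, *args); k4 = rhs(y + dt*k3, *args)
    return y + dt/6.0 * (k1 + 2*k2 + 2*k3 + k4)

# usage: start from the static state of Eq. (3) and march in time
n = 49; dx = 1.0/(n + 1); x = dx*np.arange(1, n + 1)
y = np.concatenate([np.sin(np.pi*x), np.zeros(n), np.zeros(n)])
for _ in range(2000):
    y = rk4_step(y, 5e-4, n, dx)
```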
3.1 Free vibration
In this case the bar is fixed at both ends, and both ends are assumed impermeable, so the boundary conditions can be written as:

u(0,t) = 0,  u(1,t) = 0,  ∂f/∂x(0,t) = 0,  ∂f/∂x(1,t) = 0   (t > 0)   (2)

The initial condition is a static state:

u(x,0) = sin πx,  ∂u/∂t(x,0) = 0,  f(x,0) = 0   (0 < x < 1)   (3)

The analytical solution of this case is:

u(x,t) = R(t) sin πx   (0 < x < 1, t > 0)   (4)

where

R(t) = c₁ e^{βt} + (c₂ cos bt + c₃ sin bt) e^{at}   (t > 0)   (5)

With η = 1, γ = 1 and λ = 1:
β = -8.88196885479441,  a = -0.493817773147473,  b = 3.27463045961263,
c₁ = 0.01353475908632359,  c₂ = 0.986465240913676,  c₃ = 0.185471119476820.
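The constants quoted above can be checked independently: substituting u = R(t) sin πx and f proportional to cos πx into Eq. (1) with η = γ = λ = 1 leads to the characteristic cubic s³ + π²s² + 2π²s + π⁴ = 0 (Eq. (8) below with n = 1), whose roots are β and a ± ib. A short numerical check, assuming that cubic:

```python
import numpy as np

# characteristic cubic for the first mode (eta = gamma = lambda = 1)
pi2 = np.pi ** 2
roots = np.roots([1.0, pi2, 2.0 * pi2, pi2 ** 2])
print(roots)
# one real root near -8.8820 (beta) and a complex pair a +/- i*b with
# a ~ -0.4938 and b ~ 3.2746, matching the constants quoted above
```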
The numerical results for this case are shown in Figs. 1 and 2 and agree with the analytical solution. The displacement profile remains a sine curve over 0 to π, while the fluid pressure profile is always a cosine curve over 0 to π. The fluid flow acts like a damper. The phase difference between displacement and fluid pressure is about π/2 in the time history.
Figure 1. Shapes of u and f in free vibration
Figure 2. Time history of u and f in free vibration
3.2 Forced displacement vibration
The initial condition of this case is static:

u(x,0) = 0,  ∂u/∂t(x,0) = 0,  f(x,0) = 0   (0 < x < 1)   (6)
The bar is assumed impermeable at both ends; it is fixed at one end, while the displacement of the other end is forced as 1 - cos ωt. The boundary conditions can thus be written as:

∂f/∂x(0,t) = 0,  ∂f/∂x(1,t) = 0,  u(0,t) = 0,  u(1,t) = 1 - cos ωt   (t > 0)   (7)

Let s₁, s₂ and s₃ be the roots of the equation

s³ + ((nπ)²/η) s² + ((η + λ)(nπ)²/(ηγ²)) s + (nπ)⁴/(ηγ²) = 0   (n = 1, 2, 3, ...)   (8)
The analytical solution of this case is

f(x,t) = 2λω Σ_{n=1}^{∞} (-1)^{n+1} cos(nπx) Σ_{i=1}^{3} [ s_i² (s_i sin ωt + ω cos ωt - ω e^{s_i t}) ] / [ Π_{j≠i}(s_i - s_j) · (s_i² + ω²) ] + λ(1 - cos ωt)   (9)
u(x,t) = (2λω/π) Σ_{n=1}^{∞} (-1)^{n+1} (sin(nπx)/n) Σ_{i=1}^{3} [ s_i (s_i + n²π²)(s_i sin ωt + ω cos ωt - ω e^{s_i t}) ] / [ Π_{j≠i}(s_i - s_j) · (s_i² + ω²) ] + x(1 - cos ωt)   (10)
The numerical results for this case are shown in Figs. 3 and 4 (ω = 10) and agree with the analytical solution. Initially, the vibration attenuates because of fluid flow as the wave propagates from the forced end to the other end. Over a long duration, however, the vibration at other positions may become larger than at the forced end.
Figure 3. Shapes of u in forced vibration
Figure 4. Shapes of f in forced vibration

4 Discussion
Vibration of a poroelastic bar is simulated in this paper. The simulations show that the fluid flow acts like a damper on the bar vibration. The numerical results agree well with the analytical solutions, so the numerical method is validated. Only two cases are simulated here because analytical solutions are available for them. Further cases solved by the numerical method, and the essential character of such vibrations, will be discussed in later work. The research may then be extended to shock analysis of bone.

References
1. Cowin, S.C., Bone Mechanics Handbook. CRC Press (2001).
2. Cowin, S.C., Survey article: Bone poroelasticity. Journal of Biomechanics 32 (1999) pp. 217-238.
3. Cederbaum, G., Li, L.P. and Schulgasser, K., Poroelastic Structures. Elsevier (2000).
4. Chen, T.Z., Zong, Z. and Hung, K.C., A local interpolation collocation approach to the wave equation. Comp. Meth. Appl. Mech. Engng (submitted).
FINITE ELEMENT FAILURE MODELLING OF CORRUGATED PANEL SUBJECTED TO DYNAMIC BLAST LOADING

J. W. BOH
Centre for Offshore and Maritime Engineering, Faculty of Engineering, National University of Singapore, Singapore 117576

L. A. LOUCA
Department of Civil & Environmental Engineering, Imperial College of Science, Technology and Medicine, London, SW7 2BU, U.K.

Y. S. CHOO
Centre for Offshore and Maritime Engineering, Faculty of Engineering, National University of Singapore, Singapore 117576
By adopting the Abaqus/Explicit finite element code, the authors have investigated the use of a force-based failure criterion and a rupture-strain-based criterion to assess the integrity of a corrugated panel under dynamic loading. The responses obtained are found to describe the tearing of the panel with good accuracy, while the computed strain distribution is marginally conservative.
1 Introduction
Corrugated panels used as firewalls are commonly found in offshore installations. In the low-probability event of a hydrocarbon explosion, large plastic deformation is usually allowed, although extensive tearing of the panel must be prevented. One area of interest to engineers is the failure criterion for the panel. In addition, explicit finite element codes such as Abaqus/Explicit have been used successfully in the past to model highly transient dynamic stress wave propagation. Two numerical failure models, namely the spot weld (SW) model and the rupture strain (RS) model, are adopted in this study to investigate the integrity of the corrugated panel under blast loading. Past studies [1, 2] have indicated that both models are at least able to quantitatively describe the tearing of the panel with some success. This paper attempts to further calibrate the two models against available experimental results, highlighting their relative strengths and weaknesses in failure modelling.

2 Experimental Setup and Observations
The 2.5 mm thick stainless steel corrugated panel is a shallow-profiled AO blast wall approximately 2.5 m square (Figure 1). At the time of 64.2 ms,
the firewall ruptured, initially at the bottom centre (S3A) of the transverse weld, as shown in Figure 1. At the end of the test, the panel had almost completely dislodged from the angle frames, with the corrugations significantly flattened and the transverse angles substantially deformed.

3 Finite Element and Failure Modelling

The corrugated panel was modelled using first-order reduced-integration shell elements with an inbuilt hourglass viscosity, as shown in Figure 2. The equation of motion was solved by the central difference method, and numerical integration through the thickness of the shell was carried out using Simpson's rule. Both geometric and material non-linearities were included in the analysis.
Figure 1: Locations of strain and displacement gauges
Figure 2: Finite element model for the corrugated panel.
The peak blast loading of 2.45 bar was assumed to be uniformly distributed on the panel. An idealized bi-linear triangular pressure pulse, with equal rise and decay rates of 116 bar/s, was employed in the study. The nominal stress-strain curve of the stainless steel panel material was obtained from a quasi-static uniaxial tension test. The rupture strain (RS) model was implemented in the finite element model by letting the outer elements of the panel behave as the weld
material. Failure is assumed at an integration point when the accumulated incremental equivalent plastic strain Σ Δε_pl exceeds the rupture strain ε_crit:

Σ Δε_pl ≥ ε_crit   (1)
The spot weld (SW) failure model is essentially a force-based failure criterion; no rotational restraint was considered. Failure is assumed when

(FN_max / FN_ult)² + (FS_max / FS_ult)² ≥ 1.0   (2)

where FN_max and FN_ult are the maximum and ultimate tensile forces respectively, and FS_max and FS_ult are the maximum and ultimate shear forces respectively.
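Both criteria reduce to simple per-increment checks once the force and strain histories are available. The Python sketch below illustrates Eqs. (1) and (2) only; the function and variable names are ours and are not part of the Abaqus interface.

```python
def rs_failed(d_eps_pl_history, eps_crit):
    """Rupture strain (RS) criterion, Eq. (1): failure when the
    accumulated equivalent plastic strain exceeds eps_crit."""
    return sum(d_eps_pl_history) >= eps_crit

def sw_failed(fn_max, fn_ult, fs_max, fs_ult):
    """Spot weld (SW) criterion, Eq. (2): quadratic interaction of
    normal and shear weld forces."""
    return (fn_max / fn_ult) ** 2 + (fs_max / fs_ult) ** 2 >= 1.0
```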
4 Results and Discussions

A typical finite element model showing the tearing of the panel is shown in Figure 3. The results obtained from the two failure models are compared with the experimental results in Table 1, and typical nominal strain distributions are shown in Figure 4 for an outer element (S3A) and Figure 5 for an inner element (S1A).
Figure 3: Tearing of panel using RS 8% model.

Table 1: Comparison of failure models with test data (* time of 1st weld failure in ms; ^ compressive strain; # tensile strain).

 | SW | RS (8%) | Test
Failure time* | 63.6 | 63.6 | 64.2
Location | S3B | S3A | S3A
S3AY^ | 0.031 | 0.004 | 0.051
S3BY# | 0.054 | 0.017 | 0.138
S1AY^ | 0.014 | 0.006 | 0.006
S1AX# | 0.058 | 0.018 | 0.039
S1BY# | 0.010 | 0.007 | 0.015
S1BX^ | 0.008 | 0.008 | 0.010
S2X | 0.061 | 0.003 | 0.009
Figure 4: Longitudinal nominal strain for S3A
Figure 5: Longitudinal nominal strain for S1A
Both models closely predict the deformation and tearing behaviour of the panel observed experimentally, as well as the time of first weld failure. The two models, however, predicted different locations for the initial failure. One possible reason is the sensitivity of the rupture strain model to the steep strain gradients arising from the profile of the corrugations, on top of the fact that the spot weld model is not capable of predicting the through-thickness strain variations in these regions. Initial results have also shown that the force-based failure criterion gives better strain predictions for the inner elements than the rupture strain failure model; the reverse is, however, true for the outer elements.

5 Acknowledgements
The authors are grateful to British Gas for permission to publish their experimental data.

References
1. Louca, L.A., Harding, J.E. and White, G., Response of Corrugated Panels to Blast Loading, Offshore Mechanics and Arctic Engineering (1996), Florence, pp. 297-305.
2. Louca, L.A. and Friis, J., Modeling Failure of Welded Connections to Corrugated Panel Structures under Blast Loading, Offshore Technology Report, OTO 00088 (2000).
SIMULATION OF ACOUSTIC RADIATION AND SCATTERING USING BOUNDARY ELEMENT METHOD

Z. Y. YAN, K. C. HUNG, H. ZHENG

Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528, Singapore
E-mail: yanzy@ihpc.a-star.edu.sg

Acoustic radiation and scattering in the unbounded exterior domain are numerically investigated using the composite Helmholtz integral equation. The hyper-singular numerical integral involved in the normal derivative equation of the conventional Helmholtz integral equation is dealt with by applying a regularization formulation. The influence matrix corresponding to a composite integral operator is proved to be simply the product of the two influence matrices corresponding to the two integral operators that construct the composite operator. Consequently, a new approach for dealing with the hyper-singular numerical integral is obtained. To analyse the accuracy and efficiency of the new approach, several numerical examples are computed.
1 Introduction

It is well known that the classical boundary element method in acoustics fails to provide a unique solution at certain characteristic frequencies [1]. To overcome this non-uniqueness problem, Burton and Miller [2] developed the composite Helmholtz integral equation (CHIE), which consists of a linear combination of the Helmholtz integral equation and its normal derivative equation. However, the CHIE suffers from the main drawback of a hyper-singular integral. A double surface integral method was used by Burton and Miller [2] to reduce the order of the hyper-singularity, but it was computationally inefficient to implement. In this paper, a highly efficient approach based on this double integral method is developed to deal with the hyper-singularity.
i / + M,+oW, U = k , + a i / + M[ represents the surface acoustic pressure, OC takes the value — ilk and k is wave number. The integral operators Lk, Mk, Nk and Mk can be expressed as
Lkfl=\ln(q)Gk(p,q)dSq
, Mk^ = \\^(q)dGk^q)dSq
Nkli = p
P I
595
(2)
596
Gk(p,q) = e~'kr/47tr,
r = \p-q\
(4)
where p and q are respectively the source point and the field point on the surface. The main drawback of the CHIE method is the numerical treatment of the hyper-singular integral operator N_k. Burton and Miller [2] used the following regularization relationship to deal with the hyper-singularity:

L₀N₀ = M₀² - ¼I   (5)
where L₀, N₀ and M₀ are integral operators identical to L_k, N_k and M_k except that their kernels contain G₀(p,q) = 1/(4πr) rather than G_k. The composite integral operator L₀N₀ is defined as

L₀N₀ μ(p) = ∫_S G₀(p,q) [ ∂/∂n_q ∫_S ∂G₀(q,q′)/∂n_{q′} μ(q′) dS_{q′} ] dS_q   (6)
The following transformation was then used to remove the hyper-singularity:

L₀N_k = L₀[N_k - N₀] + M₀² - ¼I   (7)
The composite integral operators L₀[N_k - N₀] and M₀² involve double surface integrals; it is therefore very inefficient to implement such an approach numerically.

3 Discretization of the Integral Operators

The integral operators are discretized using eight-noded, quadrilateral isoparametric surface elements. The integral operator L_k can be discretized as

L_k φ = B_k {φ}   (8)
where the matrix B_k is defined as the discretized operator matrix of the integral operator L_k, with entries assembled from element integrals of the shape functions against the Green's function:

(B_k)_{n,m} = Σ_j ∫_{ΔS_j} N_m G_k(p_n, q) dS_q   (9)

where N_m denotes the interpolation (shape) function associated with node m and ΔS_j the surface elements.
A new idea is introduced to discretize the operator L₀N₀. Assume that

ψ(q) = ∂/∂n_q ∫_S ∂G₀(q,q′)/∂n_{q′} μ(q′) dS_{q′}   (10)

This can be discretized and expressed in discretized operator matrix form as

{ψ} = D₀ {μ}   (11)

where D₀ is the discretized operator matrix of the integral operator N₀. Substituting Eq. (10) into Eq. (6), we have
L₀N₀ μ(p) = ∫_S G₀(p,q) ψ(q) dS_q   (12)

The discretization of Eq. (12) can be expressed in discretized operator matrix form as

E₀ {μ} = B₀ {ψ}   (13)

where E₀ is the discretized operator matrix of the composite integral operator L₀N₀. Substituting Eq. (11) into Eq. (13), we have

E₀ {μ} = B₀ D₀ {μ}   (14)

Because Eq. (5) is an identity and μ is an arbitrary function, we have

E₀ = B₀ D₀   (15)
Similarly, the composite integral operator M₀² can be discretized as A₀², where A₀ is the discretized operator matrix of the integral operator M₀. Consequently, Eq. (5) can be expressed in terms of discretized operator matrices as

D₀ = B₀⁻¹ (A₀² - ¼I)   (16)
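Eqs. (15)-(16) are the computational payoff of this section: once B₀ (from L₀) and A₀ (from M₀) have been assembled, the discretized hyper-singular operator follows from ordinary dense linear algebra rather than from a double surface integral. A schematic Python fragment, with the assembled matrices taken as given:

```python
import numpy as np

def discretized_N0(B0, A0):
    """Compute D0 = B0^{-1} (A0^2 - I/4), Eq. (16): the discretized
    hyper-singular operator from the matrices of L0 and M0."""
    n = B0.shape[0]
    rhs = A0 @ A0 - 0.25 * np.eye(n)
    return np.linalg.solve(B0, rhs)  # avoids forming B0^{-1} explicitly
```

np.linalg.solve factorizes B₀ once; in practice the factorization would be reused if D₀ is applied repeatedly.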
Now the double surface integrals in Eq. (5) have been reduced to products of surface integrals. By applying the splitting N_k = (N_k - N₀) + N₀, the hyper-singular integral is reduced to weakly singular integrals, which can be evaluated using the integration scheme proposed by Lachat and Watson [4].

4 Numerical Examples

Several examples have been computed to validate the new approach. Because of the length limit, only one case, plane acoustic wave scattering from a rigid sphere, is presented here. Fig. 1 shows half of the sphere surface discretized using 416 elements. The dimensionless scattered acoustic pressures obtained using the CHIE and the HIE at r = 5a for ka = π are compared with the analytical solutions in Fig. 2. Clearly, the results obtained using the CHIE are unique and agree with the analytical solutions quite well.

5 Conclusions

It has been proved that a composite integral operator can be discretized and expressed as the product of the two discretized operator matrices corresponding to the two integral operators that construct it. Consequently, a highly efficient new approach is developed to deal with the hyper-singular numerical integral.
Fig. 1. Discretization of half of a sphere surface with 416 elements.
Fig. 2. The angular dependence of the dimensionless scattered acoustic pressures at r = 5a for ka = π.
References
1. Ciskowski, R.D. and Brebbia, C.A., Boundary Element Methods in Acoustics. Computational Mechanics Publications, Southampton/Boston, 1991.
2. Burton, A.J. and Miller, G.F., "The application of integral equation methods to the numerical solution of some exterior boundary value problems," Proc. R. Soc. London Ser. A 323, 201-210, 1971.
3. Mathews, I.C., "Numerical techniques for three dimensional steady-state fluid-structure interaction," J. Acoust. Soc. Am. 79, 1317-1325, 1986.
4. Lachat, J.C. and Watson, J.O., "Effective numerical treatment of boundary integral equations," Int. J. Num. Methods Eng. 10, 991-1005, 1976.
NUMERICAL CHARACTERIZATION OF RC PLATE RESPONSE AND FRAGMENTATION UNDER BLAST LOADING

K. XU AND Y. LU
PTRC, NTU, Singapore 639798
E-mail: [email protected]

H. S. LIM
Defence Science and Technology Agency, Singapore

The risk of accidental explosion is present wherever ammunition is stored. A major harmful effect of such an accident is the debris of the storage magazine, typically a box-type concrete structure. This paper presents part of a research programme aimed at investigating the response and break-up of a concrete box structure under high-explosive loading. An energy formulation and a cohesive failure model are proposed. Numerical simulation is performed on representative elastic plates subjected to simulated blast loading. Characteristic responses, such as the time histories of normal stress and shear stress, are examined in order to understand the possible governing failure modes. The numerical results are used to evaluate the loading strain rate and the dynamic material strength. Based on some simplifying assumptions, the energy dissipated and the nominal debris dimensions are estimated.
1 Introduction

Experimental work on square and rectangular plates under blast loading has been conducted extensively (Nurick and Shave [5]; Olson et al. [6]), and numerical results have been published on the failure of clamped, thin, unstiffened square plates (Rudrapatna et al. [7]) and stiffened plates (Rudrapatna et al. [8]) under blast loading. In this paper, a numerical investigation of the basic response characteristics of RC plates subjected to blast loading normal to the plate face is carried out, in order to gain some understanding of the dominant failure modes and the underlying mechanisms. To observe the possible variation of the dominant response with the loading parameters, the plate is subjected in the analysis to simulated blast loading of varying duration. For this purpose, some basic information on the blast loading characteristics is summarized first. Characteristic responses, such as the time histories of normal stress and shear stress, are examined to illustrate the possible governing failure modes and the change of such modes under different explosive loading conditions. The numerical results also allow an evaluation of the energy input and transformation, based on which an estimate of the nominal debris dimension can be established, assuming that the dominant fragmentation is formed during the composite shock stage. The strain rate effects on the concrete behavior are taken into account.
2 Blast shock wave and energy formulation for fragmentation

A typical pressure-time curve for an explosive blast wave is shown in Figure 1. The negative pressure phase is not considered here, since most of the structural damage is due to the positive phase (Baker [1]).

Figure 1. Shock wave approximation by a straight line
As proposed by Brode [2], an empirical exponential form can be used to describe the positive phase of the explosive blast wave:

Δp(t) = Δp₀ (1 - t/τ) e^{-αt/τ}   (1)

in which α = 1/2 + Δp₀ for Δp₀ < 1 (kg/cm²), and α = 1/2 + Δp₀[1.1 - (0.13 + 0.20Δp₀)(t/τ)] for 1 ≤ Δp₀ ≤ 3, where Δp(t) is the instantaneous overpressure at time t, Δp₀ is the peak overpressure at t = 0, and τ is the overpressure duration. The peak overpressure and the overpressure duration can be found in Henrych [3]. According to the principle of conservation of energy, the total internal energy U_m absorbed by the system comprises the work of deformation U_d, the fracture energy U_c and the kinetic energy U_k; hence

U_c = U_m - U_d - U_k   (2)

The stress waves are responsible for the development of a damage zone and the subsequent fragment size distribution, while the explosion gases are important in the separation of the crack pattern that has already formed after the passage of the stress wave, and in the subsequent throw of the fragments. It can be assumed that the formation of the fragment sizes is complete by the end of the blast shock wave; by ignoring the kinetic energy at this time, an upper bound on the energy available for fragment formation can be obtained.
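Eq. (1), with the piecewise fit for α, is straightforward to evaluate. A small Python helper (the function name is ours; Δp₀ is in kg/cm², the unit in which the fit for α is stated):

```python
import math

def overpressure(t, dp0, tau):
    """Brode blast-wave overpressure, Eq. (1); dp0 in kg/cm^2, with the
    piecewise fit for alpha valid for dp0 up to about 3 kg/cm^2."""
    if dp0 < 1.0:
        alpha = 0.5 + dp0
    else:  # 1 <= dp0 <= 3
        alpha = 0.5 + dp0 * (1.1 - (0.13 + 0.20 * dp0) * (t / tau))
    return dp0 * (1.0 - t / tau) * math.exp(-alpha * t / tau)
```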
The cohesive fracture crack model can be expressed as:

σ_n = σ (1 - Δu_n / w),  with softening slope h = -σ/w   (3)
where σ and w are the critical stress and critical displacement, respectively, and Δu_n is the crack opening displacement. The area under the tensile cohesive law is the fracture energy. For a concrete plate, if a linear variation of cohesive stress with crack opening displacement is considered, the energy required for the formation of an opening crack surface can be expressed as:

U_c = ∫_S ζ dS = ∫₀^L t_w ∫₀^w σ(w′) dw′ dl = ½ t_w L σ w   (4)
where ζ is the energy change per unit area from cohesive crack to opened crack, L is the crack length and t_w is the plate thickness. Considering strain-rate effects, the static tensile strength should be replaced by the dynamic tensile strength. The maximum amount of energy that can be taken up as strain energy is limited by the dynamic strength of the material and the corresponding fracture strain. Assuming that the total effective input energy is eventually transformed into fracture strain and crack opening energy, the total crack length follows as

L = 2 U_c / (t_w σ w)   (5)
If the plate length is b and the width is a (a ≤ b), the nominal fragment dimensions are

l₁ = b / (n - 1),  l₂ = a / n   for [L - n(a + b)] < a   (6)

where l₁ and l₂ are the nominal fragment length and width, and n is the integer part of L/(a + b).
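Taken together, Eqs. (4)-(6) give a simple recipe for a nominal debris size. The sketch below chains them; it uses the relations as reconstructed above, and all parameter names are ours (the critical opening w, in particular, is not tabulated in this excerpt).

```python
def nominal_fragment_size(U_c, t_w, sigma_dyn, w, a, b):
    """Estimate nominal fragment dimensions from Eqs. (4)-(6).

    U_c: energy available for crack opening (J); t_w: plate thickness (m);
    sigma_dyn: dynamic tensile strength (Pa); w: critical crack opening (m);
    a, b: plate width and length (m), with a <= b.
    """
    L = 2.0 * U_c / (t_w * sigma_dyn * w)   # total crack length, Eq. (5)
    n = max(int(L // (a + b)), 2)           # guard against n < 2
    return b / (n - 1), a / n               # (l1, l2), Eq. (6)
```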
3 Numerical investigation and estimation of fragment size
Figure 2. Geometry of the RC plate under investigation (not to scale)
The computer program LS-DYNA is used to perform the numerical computation. The pressure loading from the explosive charge is assumed to be a triangular shock wave (Figure 1) uniformly distributed on the surface of the plate. The material properties used in the calculation are as follows: elastic Young's modulus E = 20 GPa; quasi-static tensile strength σ = 3.86 MPa; material density ρ = 2427.5 kg/m³; Poisson's ratio ν = 0.3. For comparison purposes, the total impulse is kept constant (equal to 5 MPa·ms) for the different shock durations; hence three shock wave loadings are produced, with maximum overpressures of 20 MPa, 10 MPa and 5 MPa and corresponding durations of 0.5 ms, 1 ms and 2 ms, respectively. This roughly represents a 100 kg explosive charge at a scaled distance of 0.5-1.0. Figure 2 shows the geometry of the plate under investigation; the width of the plate is assumed to be 1 m. In total, about 10,000 solid elements are used to mesh the plate in a 3D model. From Figure 3, the maximum normal stress occurs at the top layer and the maximum shear stress near the middle of the section. Clearly, the peak normal stress is much higher than the shear stress throughout the elastic response. As the tensile strength of concrete is smaller than its compressive strength, the actual failure would occur during the shock period, at which time both the normal stress and the shear stress are of similar magnitude; hence the fracture will actually result from combined tension and shear. From the elastic analysis results, the total energies absorbed by the plate at the end of the shock wave loading are calculated to be 139 kJ, 127 kJ and 111 kJ, respectively.
Figure 3. Maximum x-normal stress and xy-shear stress for the 20 MPa-0.5 ms loading (maximum shear stress at the middle thickness of the support; maximum normal stress at the top surface of the support)

The average loading strain rates are found to be 19.65 s⁻¹, 10.90 s⁻¹ and 5.84 s⁻¹ for the three shock loading cases, respectively. According to the dynamic strength model presented in Lu and Xu [4], the ultimate dynamic tensile strengths are found to be 16.87 MPa, 14.83 MPa and 13.01 MPa, respectively. Using Eqs. (2), (5) and (6), the dimensions of the fragments can be obtained. The equivalent diameters of the fragments are about 0.065 m, 0.062 m and 0.062 m for the three loading cases, respectively.
4 Discussion

The energy formulation and cohesive failure model are proposed for the prediction of fracture and fragmentation of RC plates subjected to blast shock loading. Numerical simulation is performed on a representative RC plate under simplified blast shock pressure. Characteristic responses, such as stress time histories, are examined to illustrate the failure mode and the possible changes of such characteristics with the variation of the loading duration. The order of the strain rate and the magnitude of the stresses in the plate response are evaluated. The significance of the strain-rate dependence of the material strength for the fracture of RC plates is discussed. The analysis procedure can be applied to predict the nominal debris size of RC plate structures.

References
1. Baker, W. E., Explosions in Air (University of Texas Press, London and Austin, 1973).
2. Brode, H. L., Blast wave from a spherical charge. The Physics of Fluids 2 (1959).
3. Henrych, J., The Dynamics of Explosion and Its Use (Elsevier Scientific Publishing Company, 1979).
4. Lu, Y. and Xu, K., Numerical characterization of RC plate response and fragmentation under blast loading. Project Report (Protective Technology Research Center, Nanyang Technological University, 2002).
5. Nurick, G. N. and Shave, G. C., The deformation and tearing of thin square plates subjected to impulsive loads: an experimental study. International Journal of Impact Engineering 18 (1996) pp. 99-116.
6. Olson, M. D., Nurick, G. N. and Fagnan, J. R., Deformation and rupture of blast loaded square plates: predictions and experiments. International Journal of Impact Engineering 13 (1993) pp. 279-291.
7. Rudrapatna, N. S., Vaziri, R. and Olson, M. D., Deformation and failure of blast-loaded square plates. International Journal of Impact Engineering 22 (1999) pp. 449-467.
8. Rudrapatna, N. S., Vaziri, R. and Olson, M. D., Deformation and failure of blast-loaded stiffened plates. International Journal of Impact Engineering 24 (2000) pp. 457-474.
DYNAMIC ANALYSIS OF BRICK-CONCRETE STRUCTURE BY USING THE WILSON-θ METHOD

D. M. HOU, Y. B. WANG AND M. YIN
School of Civil Engineering and Mechanics, Xi'an Jiaotong University, Xi'an, 710049, P. R. China
E-mail: [email protected]

X. Y. MA
Mathematics Division, Xi'an Electrical Power College, Xi'an, 710032, P. R. China
In this paper, the Wilson-θ integration method is studied thoroughly. Using the Wilson-θ integration method and a single-particle shear model, the dynamic response of a brick-concrete structure is obtained. The simulated dynamic response of the brick-concrete structure under a sine wave load is compared with analytical results available in the literature. This study demonstrates that the present approach, combining the Wilson-θ integration method with the single-particle shear model, is a useful technique for the dynamic and vibration analysis of brick-concrete structures. Furthermore, by optimizing and selecting suitable parameters of the Wilson-θ method, this approach can also be used for the seismic dynamic response analysis of brick-concrete structures.
1 Introduction
There are many dynamical problems in engineering applications that cannot be solved by analytical methods of mathematics, such as the response of a building under complex earthquake wave loading. In such cases a numerical method, such as step integration, is needed; as a consequence, the problem of precision and stability emerges. In this paper the precision and stability of the Wilson-θ method, used for the analysis of a building under a sine wave loading, are studied.

2 Wilson-θ Method

2.1 Hypotheses and Mathematical Model
The dynamical shear model with a single degree of freedom is illustrated in Fig 1, where m is the mass of the particle, k is the lateral stiffness of the rod, ẍ_g is the horizontal acceleration of the earthquake and x is the horizontal displacement of the particle. The dynamical equation of the particle can be expressed as Eq. 1 (no damping case):

m ẍ + k x = −m ẍ_g    (1)

Fig 1. Shear model
where ẍ and x are the horizontal acceleration and displacement of the particle relative to the earth. According to the hypothesis of the linear step integration method, ẍ changes linearly over the period from t to t + Δt. The acceleration of the particle at the moment t + τ (0 < τ ≤ Δt) is therefore

ẍ_{t+τ} = ẍ_t + τ (ẍ_{t+Δt} − ẍ_t) / Δt    (2)
In the 1970s Wilson improved this method, hypothesizing that ẍ changes linearly over the period from t to t + θΔt. The acceleration of the particle at the moment t + τ (0 < τ ≤ θΔt) is then

ẍ_{t+τ} = ẍ_t + τ (ẍ_{t+θΔt} − ẍ_t) / (θΔt)    (3)
It is proved that this method is absolutely stable when θ ≥ 1.37. Obviously it reduces to the linear step integration method in the case θ = 1. This method, called the Wilson-θ method, is commonly used in engineering with θ = 1.4.

2.2 Integration Equations
If the state of the particle is known at the moment t, the state of the particle at t + Δt can be described by Eq. 4:

ẍ_{t+Δt} = ẍ_t + (ẍ_{t+θΔt} − ẍ_t)/θ
ẋ_{t+Δt} = ẋ_t + ẍ_t Δt + (ẍ_{t+θΔt} − ẍ_t) Δt/(2θ)    (4)
x_{t+Δt} = x_t + ẋ_t Δt + ẍ_t Δt²/2 + (ẍ_{t+θΔt} − ẍ_t) Δt²/(6θ)

where the acceleration at t + θΔt follows from the equation of motion as

ẍ_{t+θΔt} = [F̄ − k(x_t + θΔt ẋ_t + θ²Δt² ẍ_t/3)] / (m + θ²Δt² k/6)    (5)

F̄ = −m ẍ_g(t) + [−m ẍ_g(t + Δt) + m ẍ_g(t)] θ    (6)

where Δt represents the increment of integration time for one step.

2.3 Analytical Solution
If the acceleration of the ground is a sine wave, the dynamical equation of the particle can be expressed as Eq. 7:

m ẍ + k x = −m ẍ_g = A sin ωt    (7)

where ω is the frequency and A is the amplitude of the sine wave. The state of the particle at time t (zero initial conditions) is then given by Eq. 8:

x_t = A/(k − mω²) [sin ωt − ω √(m/k) sin(√(k/m) t)]
ẋ_t = Aω/(k − mω²) [cos ωt − cos(√(k/m) t)]    (8)
ẍ_t = A/(k − mω²) [ω √(k/m) sin(√(k/m) t) − ω² sin ωt]
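Under the reconstruction of Eqs. (4)-(6) above, the Wilson-θ recursion can be sketched in a few lines of Python. The function and variable names are illustrative; the model parameters follow Section 3.

```python
import math

def wilson_theta(m, k, xg, dt, n_steps, theta=1.4):
    """Integrate m*x'' + k*x = -m*xg(t) with the Wilson-theta method (Eqs. 4-6)."""
    x = v = 0.0
    a = -xg(0.0)                       # initial acceleration from Eq. (1)
    out = []
    for i in range(n_steps):
        t = i * dt
        # Load extrapolated to t + theta*dt, Eq. (6)
        F = -m * xg(t) + theta * (-m * xg(t + dt) + m * xg(t))
        # Acceleration at t + theta*dt, Eq. (5)
        a_th = (F - k * (x + theta * dt * v + theta**2 * dt**2 * a / 3.0)) \
               / (m + theta**2 * dt**2 * k / 6.0)
        # State at t + dt, Eq. (4)
        a_new = a + (a_th - a) / theta
        v_new = v + a * dt + (a_th - a) * dt / (2.0 * theta)
        x_new = x + v * dt + a * dt**2 / 2.0 + (a_th - a) * dt**2 / (6.0 * theta)
        x, v, a = x_new, v_new, a_new
        out.append((t + dt, x, v, a))
    return out

# Parameters from Section 3: m = 20 kg, k = 1000 N/m, xg = B sin(wt)
B, f = 0.25, 0.2
w = 2.0 * math.pi * f
resp = wilson_theta(20.0, 1000.0, lambda t: B * math.sin(w * t), 0.01, 1500, theta=0.79)
```

Comparing `resp` against the closed-form solution (8) reproduces the kind of error study carried out below.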
3 Simulations

Numerical results for the single-degree-of-freedom system are shown in Fig 2, using the Wilson-θ method (Eq. 4) and the analytical solution (Eq. 8); the errors of the two methods are also compared. The parameters of the model were selected as: m = 20 kg, k = 1000 N/m, ẍ_g = B sin ωt, B = 0.25 m/s², ω = 2πf, f = 0.2 Hz.

Fig 2. Response with θ = 2.0 (acceleration vs. time, Wilson-θ and analytical solution)
3.1 Error due to θ Value

In order to examine the error caused by the choice of θ, calculations were carried out with θ varied from 0.5 to 2.0. All the calculations had a total integration time t = 15 s and an integration increment Δt = 0.01 s. The acceleration response of the particle with θ = 2.0 is illustrated in Fig 2. In the discussion, the following subscripts are used: 'w' for the Wilson-θ method, 'p' for the analytical method, 'i' for the number of the integration step and 'n' for the total number of integration steps.
|x_{w,i} − x_{p,i}| is the absolute error of displacement at step i, √(Σ_{i=1}^{n} (x_{w,i} − x_{p,i})²/n) is the mean square root (MSR) error of displacement, and MAX|x_{w,i} − x_{p,i}| is the absolute maximum (MAX) error of displacement. The statistical error of the simulation is illustrated in Fig 3.

Fig 3. Absolute error (displacement and acceleration)

Furthermore, |(x_{w,i} − x_{p,i})/x_{p,i}| × 100% represents the relative error of displacement at step i, and Σ_{i=1}^{n} |(x_{w,i} − x_{p,i})/x_{p,i}| / n represents the average relative error of displacement. Zero values exist in the analytical solution, so the percent error is only accumulated if ABS(x_p) > MAX(x_p) × 5%.

Fig 4. Average relative error (displacement, velocity and acceleration vs. θ)

The average relative error is shown in Fig 4. As shown in Fig 3 and Fig 4, the error is minimal when θ is about 0.79. To show this clearly, the error of acceleration alone is illustrated in Fig 5.

Fig 5. Error of acceleration (a: average relative error; b: absolute error)
3.2 Errors about the Increment

In order to examine the influence of the integration increment, calculations were made at θ = 1.4 with Δt changed from 0.001 to 0.05. Only the error of acceleration is considered below.
Fig 6. Absolute error of acceleration vs. increment

Fig 7. Average relative error vs. increment
The statistical absolute error of the simulation is illustrated in Fig 6 and the average relative error in Fig 7.

4 Stability of Wilson-θ

In most cases an earthquake persists for about 60 seconds, so the stability of the Wilson-θ method only needs to be checked over this period. As shown above, the optimized parameters of the Wilson-θ method are θ = 0.79 and Δt ≤ 0.004. The simulation result for θ = 0.79, Δt = 0.004 and a total integration time of 60 s is given in Table 1.

Table 1. θ = 0.79; Δt = 0.004; integration time = 60 s
              | MAX VALUE | MSR ERROR | MAX ERROR | AVERAGE RELATIVE ERROR
DISPLACEMENT  | 6.078E-3  | 3.780E-8  | 9.927E-8  | 0.0013%
VELOCITY      | 1.296E-2  | 2.611E-7  | 6.528E-7  | 0.0055%
ACCELERATION  | 5.400E-2  | 3.632E-6  | 7.135E-6  | 0.0120%

5 Conclusions
(1) When the Wilson-θ method is adopted, θ is best chosen at about 0.79, for which the simulation result is most accurate.
(2) The integration step of the Wilson-θ method should be less than 0.004 second.
(3) For seismic analysis, the Wilson-θ method is stable with θ = 0.79 and Δt ≤ 0.004.

References
1. Mostaghel, N. and Khodaverdian, M., Seismic response of structures supported on R-FBI system, Earthquake Eng. Struct. Dyn., 16 (1983) pp. 33-56.
2. Xing, L.P., Effective location of active control devices for building vibrations caused by periodic excitation acting on intermediate storey, Earthquake Eng. Struct. Dyn., 2 (2000) pp. 177-193.
3. Yu, M.H., Ma, G.W., Wang, Y.B., et al., Seismic analysis of the fundamental isolation of brick masonry building, Learned Journal of Construction, 4 (1996) pp. 52-59.
A NEW COMPUTATIONAL MATHEMATICAL MODEL OF HYDRAULIC DAMPER

Y. B. WANG AND D. M. HOU
School of Civil Engineering and Mechanics, Xi'an Jiaotong University, Xi'an, 710049, P. R. China
Email: [email protected]
This paper presents a new computational mathematical model of hydraulic dampers for cars. Through simulation and measurement of the dampers, the velocity-force curve of the damper was derived. According to the curves, a new four-linear damper model was established. As an example, the car was simulated using both this new four-linear damper model and an equivalent linear model. The simulation results show that the present damper model is more accurate than the equivalent linear damper model for simulating the dynamic characteristics of the car damper. As the four-linear model is asymmetric about velocity, it can more accurately represent the actual performance of the damper in the car system. This study shows that the new computational mathematical model of the damper can provide useful information for car damper design.
1. Introduction
The suspension system is one of the most important systems of a vehicle, and its dynamic characteristics are important for the riding comfort and running stability of the vehicle [4]. The system must have the capability to reduce and absorb vibration coming from the road, a mission carried out by the damper. Many studies of dampers have been made in recent years and many mathematical damping models have been presented, such as the equivalent linearization damping model (ELDM), the nonlinear hysteresis loop damping model, and the Wen and Bingham models of electro-rheological and magneto-rheological fluid dampers [1]. All these models are symmetrical, i.e. the damper force is symmetrical about positive and negative velocity. But in some cases an unsymmetrical damper is better than a symmetrical one. In this paper an unsymmetrical damping model, called the four-linear damping model (FLDM), is proposed. After numerical simulation, a comparison of the symmetrical and unsymmetrical damping models is given [2].

2. Dynamical Model of 1/2 Vehicle System
The dynamical model of the 1/2 vehicle suspension system is illustrated in Fig 1 [3]. m₁, k_w1, F_c1 and m₂, k_w2, F_c2 are the masses, stiffnesses and damper forces of wheel 1 and wheel 2, respectively. m₃ is the mass of the vehicle body and I is the moment of inertia about the center of the vehicle body.

Figure 1. 1/2 Vehicle Model
f₁ and f₂ are the road displacements input to the suspension system from wheel 1 and wheel 2, respectively; a and b are the horizontal distances of wheel 1 and wheel 2 to the mass center of the vehicle body. x₁, x₂ and x₃ are the vertical freedoms of the particles and θ is the slope of the vehicle body. From the model in Fig 1, the relative displacements and velocities of damper 1 and damper 2 are expressed as follows:

D_x1 = aθ + x₃ − x₁,   D_x2 = −bθ + x₃ − x₂
V_x1 = aθ̇ + ẋ₃ − ẋ₁,   V_x2 = −bθ̇ + ẋ₃ − ẋ₂
where D_x1, V_x1 and D_x2, V_x2 are the relative displacements and velocities of damper 1 and damper 2. The dynamical equations of the suspension system are expressed as follows:

m₁ẍ₁ = −(x₁ − f₁)k_w1 + D_x1 k₁ + F_c1
m₂ẍ₂ = −(x₂ − f₂)k_w2 + D_x2 k₂ + F_c2
m₃ẍ₃ = −D_x1 k₁ − F_c1 − D_x2 k₂ − F_c2
Iθ̈ = −a D_x1 k₁ − a F_c1 + b D_x2 k₂ + b F_c2

where ẍ, ẋ and x are the accelerations, velocities and displacements of the particles and θ̈ is the angular acceleration of the vehicle body. In matrix form:

M ẍ + C(ẋ) + K x = F

M = diag(m₁, m₂, m₃, I),   F = (k_w1 f₁, k_w2 f₂, 0, 0)ᵀ

K = | k_w1 + k₁   0           −k₁           −a k₁        |
    | 0           k_w2 + k₂   −k₂            b k₂        |
    | −k₁         −k₂          k₁ + k₂       a k₁ − b k₂ |
    | −a k₁       b k₂         a k₁ − b k₂   a²k₁ + b²k₂ |
C(ẋ) is the vector of damper forces.

3. Mathematical Model of Damper

3.1 Experimentation of hydraulic damper

Hydraulic dampers used in the front and the back of a car (called the front damper and back damper) were investigated on an MTS machine. Fig 2 shows the experimental results for sine-wave input displacements. It is obvious that the force is unsymmetrical about velocity: the maximum positive damping force is about 1 kN while the minimum negative damping force is about −0.2 kN. (In Fig 2a the amplitudes and frequencies are 4 mm and 5 Hz, and 40 mm and 0.5 Hz; in Fig 2b, 20 mm and 1 Hz, and 4 mm and 0.5 Hz, respectively.)
3.2 Four-Linear Damping Model

The results of Fig 2 indicate that the damping force is nonlinear in velocity and, in particular, unsymmetrical about positive and negative velocity, so the ELDM is not suitable for this case. Based on this, the FLDM is presented in Fig 3, where DB and AC are horizontal lines and OB and OA are diagonal lines. Points A and B are the saturated points of positive and negative velocity; they can be determined from the experimental curves of the damper. A saturated point means that the damping force does not increase, or increases very little, as the velocity increases beyond this point. Points A and B are determined by the positive and negative saturated velocities V₁ and V₂ and the maximum and minimum damping forces F_cr1 and F_cr2.

Figure 2. Velocity and Damping Force ((a) front damper; (b) back damper)

Figure 3. Four Linear Model

In order to compare the ELDM with the FLDM, the ELDM is assumed to have equivalent viscous damping coefficients c₁ and c₂ for damper 1 and damper 2, with damper forces F_c1 = V_x1 × c₁ and F_c2 = V_x2 × c₂. For the FLDM, the damper forces F_c1 and F_c2 are, respectively:

F_c1 = F_cr2                  (V_x1 ≤ V₂)
F_c1 = V_x1 × F_cr2 / V₂      (V₂ < V_x1 ≤ 0)
F_c1 = V_x1 × F_cr1 / V₁      (0 < V_x1 ≤ V₁)      (*)
F_c1 = F_cr1                  (V₁ < V_x1)

and similarly F_c2 in terms of V_x2.
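The piecewise law (*) is easy to state in code. The sketch below uses the saturation values quoted later in Section 4.1 (V₁ = 0.05 m/s, F_cr1 = 1000 N, V₂ = −0.05 m/s, F_cr2 = −200 N); the function name and defaults are illustrative only.

```python
def fldm_force(v, v1=0.05, v2=-0.05, f_cr1=1000.0, f_cr2=-200.0):
    """Four-linear damping model, Eq. (*): linear up to the saturated
    points A (v1, f_cr1) and B (v2, f_cr2), constant beyond them."""
    if v <= v2:
        return f_cr2                  # negative saturation (segment DB)
    if v <= 0.0:
        return v * f_cr2 / v2         # diagonal segment OB
    if v <= v1:
        return v * f_cr1 / v1         # diagonal segment OA
    return f_cr1                      # positive saturation (segment AC)

# The asymmetry about v = 0 is what the ELDM (F = c*v) cannot capture:
print(fldm_force(0.02), fldm_force(-0.02))   # -> 400.0, -80.0
```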
4. Numerical Simulation of Suspension System

According to the 1/2 vehicle dynamical model (Fig 1), two simulation systems (ELDM and FLDM) were established in MATLAB SIMULINK. Two kinds of road displacement functions (RDF) were input to the system: a sine wave and a pulse wave. The parameters of the vehicle suspension system are shown in Table 1.
Table 1. Parameters of the Vehicle Suspension System

m₁/kg | m₂/kg | m₃/kg | I/kg·m² | k_w1/N·m⁻¹ | k_w2/N·m⁻¹ | k₁/N·m⁻¹ | k₂/N·m⁻¹ | a/m | b/m
40    | 40    | 730   | 1230    | 175500     | 175500     | 17500    | 17500    | 1.0 | 1.0
The maximum acceleration (MA) and mean-square-root acceleration (MSRA) of the vehicle body are indices of the comfort and stability of a vehicle suspension system [5]. A good vehicle suspension system reduces the MA and MSRA of the vehicle body, so the study focuses on these quantities.

4.1 Results of Sine Wave

When the RDF input to the simulation system is a sine wave, the functions for wheel 1 and wheel 2 are:

f₁ = 0.02 sin(2πΩt),   f₂ = 0.02 sin(2πΩt)

where Ω is the frequency, which changes from 1 Hz to 9 Hz, and the amplitude is 0.02 m. For the ELDM, the equivalent viscous damping coefficients are c₁ = c₂ = 1290 N·s·m⁻¹ and the damper forces are F_c1 = V_x1 × c₁, F_c2 = V_x2 × c₂. For the FLDM, the saturated points A and B are defined by V₁ = 0.05 m·s⁻¹, F_cr1 = 1000 N, and V₂ = −0.05 m·s⁻¹, F_cr2 = −200 N; the damper forces are determined by Eq. (*). For an input frequency of 8 Hz, the acceleration of the vehicle body is shown in Fig 4, and the MA and MSRA in Fig 5. From the simulation results, the FLDM is better than the ELDM; the FLDM is also not sensitive to the frequency of the RDF and can be used over a wider frequency band of the RDF.

Figure 4. Response of acceleration

4.2 Results of Pulse Wave

The RDF is a pulse wave. With a pulse amplitude of 0.08 m, the acceleration of the vehicle body is shown in Fig 6, and the MA and MSRA in Fig 7.
Figure 5. Response of vehicle body (MA and MSRA vs. frequency)
Figure 6. Acceleration of vehicle body at amplitude = 0.08 m (ELDM vs. FLDM)

Figure 7. Response of vehicle body (MA and MSRA vs. pulse amplitude)
Obviously the FLDM's MA and MSRA of the vehicle body are smaller than the ELDM's, especially as the amplitude of the RDF increases.
Conclusions

• Adopting the FLDM in the suspension system of a vehicle is effective over a wide frequency band of road displacement, and the FLDM is more stable than the ELDM with respect to the frequency of road displacement.
• The new FLDM can markedly reduce the MA and MSRA of the vehicle body.
• The new FLDM can provide useful information for car damper design.
References
1. Choi, S.B., Choi, Y.T. and Park, D.W., A sliding mode control of a full-car electrorheological suspension system via hardware-in-the-loop simulation, J. of Dynamic Systems, Measurement and Control, 122 (2000) pp. 114-121.
2. Kim, K. and Jeon, D., Vibration suppression in an MR fluid damper suspension system, J. Intell. Mater. Systems and Struct. 10 (1999) pp. 779-786.
3. Lei, Y.C., Dynamics and Simulation of Vehicle Systems (National Defence Industry Press, Beijing, 1997).
4. Weng, J.S., The Semi-Active Control of Vehicle Suspension Systems Based on Magnetorheological Damper (PhD Thesis, Nanjing University of Aeronautics and Astronautics, 2001).
5. Yu, Z.S., Theory of Vehicle (Mechanical Industry Press, Beijing, 1985).
BROADBAND ECHOES FROM UNDERWATER TARGETS

HENRY LEW
School of Electrical and Electronic Engineering, Nanyang Technological University, Block S2, Nanyang Avenue, Singapore, 639798. E-mail: [email protected]

BINH NGUYEN
Defence Science and Technology Organisation, PO Box 1500, Edinburgh, SA 5111, Australia. E-mail: [email protected]
Keywords: scattering, underwater acoustics, broadband, active sonar

Models of acoustic scattering that predict the echo time history of targets are essential tools for the development of signal processing algorithms in underwater active detection systems. Realistic simulation of echoes requires a multi-frequency evaluation of acoustic scattering. This can be achieved by numerically modelling the surface of the object of interest as a collection of facets and calculating the scattered field using the Helmholtz-Kirchhoff approximation. Time and frequency domain analyses of a simple object (e.g. a sphere) and a complex structure using this technique in monostatic and bistatic configurations are given as examples.
1 Introduction
The development and evaluation of signal processing algorithms for underwater active detection systems can be greatly enhanced in terms of robustness and accuracy if realistic signal/target models are used. In the past, very simple models were used for algorithm development because realistic models were computationally costly and of little benefit to low-resolution systems operating in benign environments. For example, the target echo of an echolocation system was modeled as an attenuated, time-delayed and Doppler-shifted replica of the transmitted signal. However, measurements of actual target echoes have shown this to be a first-order approximation at best, even for very simple targets such as spheres. With recent advances in computing software and hardware, more realistic modeling has become feasible and cost effective. In this paper we show some results of high fidelity modeling of echoes from underwater targets that are broadband in nature, to match what is possible in actual systems. Many previous results in this area have concentrated on single frequencies, narrowband approximations, or averaged/integrated quantities such as Target Strength [1], rather than the echo time series. The rest of the paper is organized as follows. We first review the modeling methodology, and then present the case of a sphere
followed by some results from the scattering of a target complex. The results are presented in both the time and frequency domains. Finally, the paper concludes with some comments and observations.

2 Model of Acoustic Scattering
The model for acoustic scattering [2] is based on the evaluation of the Helmholtz-Kirchhoff integral over the surface of the object under consideration. The surface of the object is modelled by a mesh of triangular facets. The Helmholtz-Kirchhoff integral can be evaluated analytically for each triangular facet, which helps reduce the amount of computation needed. The total scattered field from the target of interest is then obtained by a coherent summation of the scattered field from all the individual facets. The material properties of the target are encapsulated in the local reflection and transmission coefficients of the target. Additional computational complications such as multiple transmission layers, scattering from several layers, hidden surface removal, and first-order multiple scattering can be included, if necessary. Note that, at lower frequencies, diffraction of sound around components of the object can be significant, and is only crudely approximated. However, the diffraction that gives rise to the forward scattering lobe for bistatic calculations is fairly well approximated. The model has been well tested against known results for the monostatic scattering of basic shapes. However, no rigorous tests of a target with a high level of complexity are available. Currently, for large complex targets, the model is expected to give less accurate results at the lower frequencies, where structural resonance and diffractive effects become important. At the other end of the spectrum, there is no high-frequency limit of validity since the technique is intrinsically a high-frequency one. In practice, however, the upper frequency limit is determined by the need to accurately represent the target surface by plane facets to some fractional-wavelength accuracy. Therefore, surfaces with a large degree of curvature need more facets to achieve a given accuracy, and hence a longer computation time is required. Target scattering over a band of frequencies can be characterized by the target's impulse response or transfer function. The two different, but equivalent, representations are related as follows:
h_T(t) = ∫ H_T(f) e^{i2πft} df   and   H_T(f) = p_s(f) / p_i(f)    (1)

where p_i(f) and p_s(f) are the Fourier transforms of the incident and scattered pressures over the frequency band of interest, respectively¹. In the following, examples of both representations of the target response will be shown.
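In discrete form, Eq. (1) is a ratio of FFTs followed by an inverse FFT. The sketch below (illustrative names, numpy assumed) estimates h_T(t) from simulated incident and scattered pressure time series.

```python
import numpy as np

def impulse_response(p_inc, p_scat):
    """Estimate the target impulse response h_T(t) from sampled incident and
    scattered pressures, following Eq. (1)."""
    P_i = np.fft.rfft(p_inc)                  # Fourier transform of incident pressure
    P_s = np.fft.rfft(p_scat)                 # Fourier transform of scattered pressure
    H_T = P_s / P_i                           # transfer function H_T(f) = p_s(f)/p_i(f)
    return np.fft.irfft(H_T, n=len(p_inc))    # h_T(t) by inverse FFT

# Toy check: a delay-and-attenuate "target" gives a single shifted spike.
fs, delay, gain = 10_000.0, 0.01, 0.3
t = np.arange(2048) / fs
p_inc = np.sin(2 * np.pi * 500 * t) * np.exp(-((t - 0.05) / 0.01) ** 2)
p_scat = gain * np.roll(p_inc, int(delay * fs))
h = impulse_response(p_inc, p_scat)
print(np.argmax(np.abs(h)) / fs)              # ~0.01 s, the assumed delay
```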
3 Rigid Sphere
The sphere is very useful for verifying the accuracy of numerical models because it is one of the few shapes that have analytical (closed-form) solutions [3] for wave scattering. In order to illustrate the accuracy and the limitations of the numerical model, the scattering from a rigid sphere of radius a was calculated and compared to the analytical result. A mesh of over 84,000 triangular facets represented the surface of the sphere. In this calculation, the source and the receiver were placed at a large distance from the sphere to achieve far-field conditions. The receiver was stepped in 1-degree increments counter-clockwise around the target. The sphere was assumed to be in seawater with a sound speed of 1500 m/s and density of 1026 kg/m³. The scattered field was then computed as a function of bistatic angle over a band of frequencies such that the wavenumber product² ka varies from 17 to 42. The results, given in Fig. 1, show that the numerical model is fairly accurate except when the bistatic angle approaches the forward direction (θ = 180°), where diffraction effects become important.
Figure 1. The bistatic target strength and the magnitudes of the transfer function and the impulse response of a rigid sphere calculated using numerical and analytical methods.
¹ Note that the transfer function and the impulse response are also functions of the distance between the target and the receiver measuring its scattered field.
² k is the acoustic wavenumber, k = 2π/λ.
4 Target Complex
The main reason for the development and use of numerical models is that analytical solutions do not exist for complex structures. By the same token, this makes it difficult to directly verify the correctness and accuracy of these models. However, the following examples will show how confidence in the results can be gained by considering both the time and frequency domain representations of the target response. The target under investigation is a submarine-like structure built from a combination of simple shapes (e.g., cylindrical hull, spherical end, conical tail and airfoil sail/fins). For simplicity, the structure is assumed to be rigid. The geometry of the target complex is shown in Fig. 2. The major dimensions of this structure are the length, L, and the width, 2a. Over 24,000 facets were used to model all the parts of the structure.
Figure 2. Different views of the target complex.
The transfer function and the impulse response for monostatic (backscattering) and bistatic scattering, as a function of target aspect and bistatic angle, are shown in Figs. 3 and 4, respectively. Note that the transfer function confirms that the hull specular at broadside is independent of frequency. The impulse response, on the other hand, reveals the highlight structure of various components that make up the target complex. For sufficiently high frequencies, the relative time delays of these highlights are found to be consistent with what is expected from simple geometrical considerations. All these are indications that the model is doing what it should, and thereby providing the user with some confidence in the model.
Figure 3. Monostatic frequency and time domain responses of the target complex.
Figure 4. Bistatic frequency and time domain responses of the target complex.
5 Conclusions
The transfer function or impulse response contains all the information that can be revealed by a scattering experiment over a frequency band of interest. Under certain conditions, the features of the impulse response can be related to the physical attributes of the target, such as its size and construction. Once either of these functions is known, it is relatively straightforward (at least in principle) to calculate the target response for any arbitrary input waveform. This is particularly useful when time series simulation of the scattered field is required. Note that even though the transfer function and the impulse response contain the same information, certain features of the target response are sometimes better revealed by one representation than the other. This paper has shown an example of a target complex in which the time domain target highlights gave a more physically intuitive interpretation of the results.

References
1. Urick, R.J., Principles of Underwater Sound, 3rd Ed., Peninsula, 1983.
2. MacGillivray, I.R., Model (V.3) Software, DSTO.
3. Anderson, V.C., Sound Scattering from a Fluid Sphere, J. Acoust. Soc. Am. Vol. 22, No. 4, pp. 426-431 (1950).
Competing Risks For Reliability Analysis Using Cox's Model

F. A. M. Elfaki, I. Daud, N. A. Ibrahim, M. Y. Abdullah and I. Lukman
Department of Mathematics, Faculty of Science and Environmental Studies, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia
Abstract Weibull distribution as the basis of reliability function is generalized by introducing an additional shape parameter. The use of an algorithm based on Cox's proportional hazard specifically developed for this model, is illustrated. The usefulness and flexibility of the distributions are also illustrated by analyzing the multiple stress data sets from Crowder et al (1991). In addition simulation data are generated to further illustrate the idea. The parametric Cox's model with Weibull distribution shows similar results as Cox's with exponential distribution, especially for a sample size greater than 40 based on EM algorithm. The modification of the model of both distributions is considered.
1. Introduction

The theory of competing risks is applied in the analysis of reliability and survival data involving several different failure types or risks. In industry, for instance, one might distinguish between mechanical device failures attributable to a failed component and those due to unrelated causes; these constitute the different risks under consideration. Typically, the data include the time of failure or censoring of each individual, as well as an indicator of the type of failure. To assess the effects of covariates on cause-specific hazards, one can fit a parametric Cox proportional hazards model, treating failure types which are not of interest as censored observations ([5], [2]). A general model is adopted in this paper which incorporates most of the widely used life stresses. The model can be used for single or multiple stresses. Under this formulation, the model can be solved either as a Proportional Hazards Weibull model (PHW) or as a Proportional Hazards Exponential model (PHE).

2. Methodology

The proportional hazards (PH) regression model is commonly used in the analysis of survival data and, recently, there has been increasing interest in its application in reliability engineering. Following [1], we focus on the model

h(t|z) = h₀(t) exp(zβ)    (1)

where β = (β₁,...,β_p)ᵀ is a vector of regression coefficients, t is a continuous random variable representing an individual's lifetime, and z = (z₁,...,z_p) is the vector of regressor variables associated with the individual. Model (1) is flexible enough for many purposes. The modification of model (1) can be represented as

h(t|z) = h₀(t · T_s exp(zβ))    (2)
where T_s is the censored time to failure. Model (2) is not limited to nonnegative β's or categorical covariates and has a very interesting contrast with model (1). The full likelihood based on the data (tᵢ, δᵢ, zᵢ), i = 1, 2, ..., n, is given by [6] and [4] as follows:

L(θ) = ∏_{i=1}^{n} f(tᵢ; θ, zᵢ)^{δᵢ} R(tᵢ; θ, zᵢ)^{1−δᵢ}    (3)
where the δᵢ are the event indicator variables (δᵢ = 1 if the ith subject fails; δᵢ = 0 if the ith subject is censored), θ is a parameter that indexes the density function and zᵢ is the vector of covariates for the ith subject.

2.1 The PH Weibull Model

The Weibull distribution is commonly used for analyzing lifetime data. In other words, it is assumed that the baseline failure rate in equation (1) is parametric and given by the Weibull distribution. In this case, the baseline failure rate is given by:

h₀(t) = η α⁻¹ (t/α)^{η−1} exp[−(t/α)^η]    (4)

where α is the scale parameter, depending on z, and η is the shape parameter. The reliability function can be derived as

R(t) = exp[−(t/α)^η]    (5)

The log-likelihood function for the PHW model then follows as:
ℓ = Σ_{i=1}^{n} { δᵢ [ln η − η ln α + (η − 1) ln tᵢ + zᵢβ] − (tᵢ/α)^η exp(zᵢβ) }    (6)
\{i) = a 'exp
^ie'z'
•t
(7)
a Log-likelihood function for PHE model can be written as: f
l = Yjn
( (
-2L
V\
m
exp
exp V*=o
(8)
-T:e>'° k=\
620
Note that, if we substitute (3 = 1 in the likelihood function for the Proportional hazards Weibull (PHW), it will become similar to the likelihood function for the Proportional hazards Exponential (PHE) model. Table 1: Results from simulations study comparing model (1) and (2) with Weibull distribution, based on EM algorithm
Sample Size 15
Cen %
25
Method
Parameter
Mean
Bias
RMSE
(2)
4 e2 fi
-26.000 -5.8060 1.1336 -25.922 -25.585 6.8047 -32.052 -0.2858 1.0029 -32.052 -0.2858 1.0008
-25.922 -5.8059 0.1336 -25.923 -25.585 5.8047 -32.053 -0.2858 0.1119 -32.053 -0.2858 2.1011
31.591 30.854 0.5867 31.591 31.369 6.98096 3.2683 0.3006 0.1132 3.2683 0.3006 7.2416
%
15
25
(1)
100
25
(2)
100
25
(1)
e2 fi 4 e2 fi 4 &2
fi
3. Simulation Data

The objective of this simulation study is to compare the mean, bias, and root mean square error (RMSE) obtained from fitting models (1) and (2) based on the EM algorithm. The simulation data are generated from the Kevlar 49 failure data [3] with two covariates (stress and spool). All data generation is carried out with a SAS program. The generated data are run 1000 times for every sample size and the corresponding percentages of censoring. The results of this study are shown in Tables 1 and 2 for the PHW and PHE, respectively. As mentioned in Section 2.2, to obtain the PHE we substitute η = 1 into the PHW likelihood (equation (6)). From the simulation study we conclude that both models give similar results for sample sizes greater than 40, as can be clearly seen in Table 1.

Table 2: Results from the simulation study comparing models (1) and (2) with the exponential distribution, based on the EM algorithm
Sample Size | Cen % | Method | Parameter | Mean    | Bias    | RMSE
15          | 25    | (2)    | β̂         | -4.8554 | -5.7554 | 1.8423
            |       |        | θ̂₂        | -10.000 | -10.000 | 8.6101
15          | 25    | (1)    | β̂         | -4.8554 | -5.7554 | 5.8259
            |       |        | θ̂₂        | -10.000 | -10.000 | 8.7101
100         | 25    | (2)    | β̂         | -32.052 | -32.053 | 3.2683
            |       |        | θ̂₂        | -0.2858 | -0.2858 | 0.3006
100         | 25    | (1)    | β̂         | -32.052 | -32.053 | 3.2683
            |       |        | θ̂₂        | -0.2858 | -0.2858 | 0.3006
4. Conclusions

Two lifetime distributions for the competing risks model via Cox's model, namely the Weibull and the exponential, with censored data, are presented. The modification of the models for both distributions is considered. The EM algorithm is used to estimate the parameters. It is observed that the Weibull distribution describes the nature of the model well compared to the exponential distribution. It is also observed that the EM algorithm behaves reasonably well in the estimation of the parameters concerned and provides consistent estimates for both formulations. For sample sizes greater than 40, both PHW and PHE give similar results. However, when the sample size is less than 40, we cannot draw the same conclusion. More work is needed to determine the efficiency of both models for smaller sample sizes.

References
[1] Cox, D. R., Regression models and life tables (with discussion), J. R. Statist. Soc. B 34 (1972) pp. 187-220.
[2] Cox, D. R. and Oakes, D., Analysis of Survival Data, London: Chapman and Hall (1984).
[3] Crowder, M. J., Kimber, A. C., Smith, R. L. and Sweeting, T. J., Statistical Analysis of Reliability Data, London: Chapman and Hall (1991).
[4] Kalbfleisch, J. D. and Lawless, J. F., Estimation of reliability in field-performance studies, Technometrics, 30 (1988) pp. 365-388.
[5] Kalbfleisch, J. D. and Prentice, R. L., The Statistical Analysis of Failure Time Data, New York: Wiley (1980).
[6] Lawless, J. F., Statistical methods in reliability, Technometrics, 25 (1983) pp. 305-335.
[7] Mann, N. R., Schafer, R. E. and Singpurwalla, N. D., Methods for Statistical Analysis of Reliability and Life Data, John Wiley and Sons, New York (1974).
PARALLEL MULTIBODY DYNAMICS USING THE MESSAGE PASSING INTERFACE

B. FOX AND F. J. WELNA
Parallel Computing Research Group, School of Electrical, Electronic and Computer Engineering, The University of Western Australia, Crawley WA 6907, Australia. Email: {budfox, welna-fj}@ee.uwa.edu.au

D. J. LILJA
Department of Electrical and Computer Engineering, Institute of Technology, University of Minnesota - Twin Cities Campus, 4-174 EE/CSci Building, 200 Union Street S.E., Minneapolis, MN 55455, USA. Email: [email protected]

L. S. JENNINGS
School of Mathematics and Statistics, The University of Western Australia, Crawley WA 6907, Australia. Email: [email protected]

The multibody modelling and computation of an arbitrary-length pendulum system is investigated. The equations of motion are cast as either Differential Algebraic Equations (DAEs) or the underlying Ordinary Differential Equations (ODEs) with augmented constraint equations, and are computed using the Differential Algebraic Equation System Package (DASPK) [1] and the Livermore Solver of Ordinary Differential Equations with Automatic method switching and Root finding (LSODAR) [2], respectively. Coarse-grain parallelism is implemented through the use of the Message Passing Interface (MPI) [3] library and two different architecture types are compared.
1 Equations of Motion
From Kibble's [4] treatment of variational calculus, one may recall variational changes of the function f(t, qᵢ, q̇ᵢ) of the independent variable t and the generalized coordinates qᵢ, for i = 1,...,n. The stationary integral of this function leads to the Euler-Lagrange equations

d/dt (∂f/∂q̇ᵢ) − ∂f/∂qᵢ = 0    (1)

and if the kinetic energy Tᵢ = ½ m q̇(t)² of an arbitrary body is substituted for f(t, qᵢ, q̇ᵢ) and the stationary integral of this function is sought, one obtains the Lagrange equations in terms of kinetic energy, that is,

d/dt (∂T/∂q̇ᵢ) − ∂T/∂qᵢ = Qᵢ    (2)

In the study of planar multibody dynamics, an expression for the kinetic energy of an arbitrary body may be written, according to Shabana [5], as

Tᵢ = ½ Ṙᵢᵀ m_RR,i Ṙᵢ + ½ m_θθ,i θ̇ᵢ²    (3)
# « M , = j piIdVt = mt I, wee,- = Jp,«, 7 M, dVt , mt, p, and Vt
are the body mass, density, and volume respectively, and ^ is a local position vector on body i. On substitution of (3) in (2) yields Mq\
mRR,i 0
0
Qe.
"88 i
(4)
and augmenting the constraints C(q,t) = 0 and the constraint forces 2c -~CTqX to (4) yields q=v Mv + C^=Qe.
(5)
C{q,t) = 0 This is regarded by Ascher [6] as an index-3 DAE, since three differentiations (two differentiations of C[q,t) = 0 and the replacement of jl = A) are required to allow the DAE (5) to be written as the ODE q=v T
Mv + C qii = Qe CT,v =
(6) -[Ctq)iq-2Cq,q-Ctt
The DAE (5) may be computed directly using DASPK, which solves a semi-explicit system of the form f(t, y, ẏ) = 0 [1], where y = (q, v, λ)ᵀ. The ODE (6) may be computed by LSODAR, which solves the explicit system ẏ = f(t, y), where y = (q, v, μ)ᵀ. The results of the following section concern the use of LSODAR to compute (6), and preliminary investigations are made using DASPK on (5), with the option of allowing DASPK to solve the initialization problem: given the differential variables Y_d = (q, v), calculate their derivatives Y'_d = (q̇, v̇) and the algebraic variables Y_a = λ, the Lagrange multipliers [1].

2 Implementation
Both sequential and parallel computation of the system equations was performed using LSODAR; however, due to the inherently sequential nature of numerical integration, little coarse-grain parallelism between time steps was exploited, although non-blocking MPI_Send() and blocking MPI_Recv() function calls were employed through MPI to allow for potential concurrency between the time steps of integration. DASPK does in fact allow for the use of multiple processes in the integration process [7], but this was not employed here due to convergence failures in single-process computation using DASPK.
It is of particular interest to identify, through profiling, which routines perform at least 75% of the computation. The linear algebra package LAPACK [1], in particular the routine dgesv_() used in the Gaussian elimination of the system of equations, consumed most of the execution time. The user-provided routines used for the construction of the system of equations were therefore each allocated a separate processor in a parallel master-slave computational approach, as shown in Figure [1].

Figure 1. MPI master/slave computational flow structure: at each time step the master (process 0) receives, via MPI_Recv, the blocks M, C_q, Q_e and Q_d constructed by slave processes 1-4 from x = [q, v, μ] and t.
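A hedged sketch of the Figure 1 pattern using mpi4py follows (the paper's implementation uses the C/Fortran MPI bindings with non-blocking sends; the block constructor below is a stand-in, not the authors' routine).

```python
# Illustrative mpi4py rendering of the master/slave structure in Figure 1.
# Run with, e.g.: mpiexec -n 5 python master_slave.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
n = 4                                   # number of pendulum bodies (assumed)

def build_block(tag, x, t):
    """Stand-in for the user routines that construct M, C_q, Q_e or Q_d."""
    return np.full((n, n), tag * t) + x[:n]

x, t, dt = np.zeros(3 * n), 0.0, 1e-3
for step in range(10):
    if rank == 0:                       # master: gather blocks, advance state
        M, Cq, Qe, Qd = [comm.recv(source=src, tag=step) for src in (1, 2, 3, 4)]
        # ... assemble and hand the system to the integrator here ...
    else:                               # slaves 1-4: each builds one block
        comm.send(build_block(rank, x, t), dest=0, tag=step)
    x = comm.bcast(x, root=0)           # share the updated state with all ranks
    t += dt
```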
Figure [2] shows that the parallel implementation of the code takes longer to compute the system of equations than the sequential implementation, for both the dual Intel 1GHz Pentium III and the SGI 38000, 500 MHz R14000 Origin machines [8]: this is due to a high communication-overhead/computation ratio.
3 Conclusions and Future Research
The numerical integration of a classical n-bodied pendulum was performed using LSODAR and DASPK on a distributed network of two different architecture types. The parallel implementation suffers from a high communication-overhead/computation ratio; however, for a greater work-load between integration steps, the parallel run-times are expected to be faster. For example, in a planetary system containing n bodies, where each body has a gravitational effect on the others, n(n−1)/2 gravitational force computations are required. Although there is a quadratic relationship between the number of bodies and the number of force computations, the increase in communication overhead is expected to be linear; preliminary investigations indicate that n would need to be greater than 1500 bodies.

Figure 2. Sequential (S)/Parallel (P) implementations for n < 300 bodies (SGI 38000 Origin and Intel Pentium III run times)
4 Acknowledgements

This work was supported in part by the Minnesota Supercomputing Institute: http://www.msi.umn.edu/, under the supervision of Prof. D. J. Lilja.

References
1. http://www.netlib.org/ - DASPK2.0, LAPACK.
2. Petzold, L. R. and Hindmarsh, A. C., "Livermore Solver of Ordinary Differential Equations with Automatic Method Switching and Rootfinding", Computing and Mathematics Research Division, 1-316 Lawrence Livermore National Laboratory, Livermore CA 94550, (1987).
3. Gropp, W., Lusk, E. and Skjellum, A., Using MPI: Portable Parallel Programming with the Message-Passing Interface, The MIT Press, Cambridge, Mass., (1994).
4. Kibble, T. W. B., Classical Mechanics, 3rd Ed., Longman Inc., New York, (1985).
5. Shabana, A. A., Computational Dynamics, John Wiley & Sons, New York, (1994).
6. Ascher, U. M. and Petzold, L. R., Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations, SIAM, Philadelphia, (1998).
7. http://www.engineering.ucsb.edu/~cse - DASPK3.0
8. http://www.msi.umn.edu/
SOME COMPUTATION ASPECTS IN MODEL-ORDER REDUCTION OF FLEXIBLE STRUCTURES

ROBERD SARAGIH
Department of Mathematics, Institut Teknologi Bandung, Jln. Ganesha No. 10, Bandung, 40132. Telp. 062-22-2502545, Fax. 062-22-2506450. Email: [email protected]

Model reduction is part of the dynamic analysis of flexible structures. Typically, a model with a large number of degrees of freedom, such as one developed for the static analysis of structures, causes numerical difficulties in the dynamic analysis, to say nothing of the high computational cost. Additionally, if one takes into account that the complexity of a controller depends on the plant order, it is not difficult to see that a full-order controller for a high-order plant is hardly implementable. Thus the reduction of the system order solves the problem, provided that the reduced model acquires the essential properties of the full-order model. This paper is concerned with the computational problem of reducing a high-order model of flexible structures to a low-order one without significant errors. Several methods are discussed and compared computationally: modal truncation, balanced truncation and the singular perturbation approach.
1 Introduction
A major difficulty in the control of flexible structures, or any other large-scale system, is, in the words of Bellman, the curse of dimensionality. A flexible structure is by nature a distributed-parameter system and hence has infinitely many degrees of freedom. Even approximate structural models obtained by discretization are generally still too large for use in control design applications. Moreover, many controller design methods, such as H∞ or μ-synthesis, yield a controller of order at least equal to the plant order. Such high-order controllers are designed to optimize performance objectives, but often cannot be used in practical applications. As mentioned in [1], a controller with a large number of degrees of freedom can cause numerical difficulties, uncertainties, and high computational cost. Thus it is desirable to have methods available for designing low-order controllers that guarantee closed-loop stability and performance. One approach to obtaining a low-order controller is to first reduce the order of the plant and then design the low-order controller for the reduced-order plant. This paper is concerned with the computational problem of reducing a high-order model of flexible structures to a low-order one without significant errors. Several methods are discussed and compared computationally: modal truncation, balanced truncation and the singular perturbation approach. Firstly, the method based on modal truncation is presented; it is conceptually simple and computationally cheap.
In frequency-domain terms, where a stable transfer function matrix is in partial fraction form, the low-order system is obtained by discarding the terms with the smallest maximum imaginary magnitude. Secondly, the balanced truncation is reviewed. The balanced truncation method tends to have smaller errors at high frequencies and larger errors at low frequencies, which is undesirable in some applications. In contrast, the singular perturbation approach displays the opposite character: the stable state variables of the system are divided into slow and fast modes, and the low-order model is obtained by setting the velocity of the fast modes equal to zero.

2 Model of Structure
The structure has four stories and is tower-like in shape. To simplify the modeling process, some assumptions are made. Each story is modeled such that it has a single degree of freedom in the transverse direction (the same direction as the excitation) and one more degree of freedom in the angle of torsion around the centroid of the story, so that the whole structure has 8 degrees of freedom. The structure has long and short spans symmetric with respect to the central axis, but has a deviation on the right, long side of the third story due to an auxiliary mass, which thereby creates a coupling between the transverse and torsional vibrations. The mass distribution of each story is homogeneous and the stiffnesses of the four columns are assumed to be the same in the direction of the excitation at all stories. Under this condition, the distance from the centroid to the spring on the right side of the i-th story equals the distance from the centroid to the spring on the left side, and all the cross terms vanish. On the third story, however, there is a lumped load at the right side; therefore the cross terms have certain values and the structure possesses transverse-torsional coupled vibration modes. Using the Lagrange equation, we can obtain the dynamic model of the structure as a second-order differential equation, i.e.
M_p ẍ(t) + C_p ẋ(t) + K_p x(t) + d_p w(t) + b_p u(t) = 0

For model analysis, the model of the structure is transformed into state-space form and can be written as:

ẋ(t) = A x(t) + B u(t)
y(t) = C x(t)
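To make the pipeline concrete, here is a hedged Python sketch: the second-order structural model is first put in state-space form and then reduced by plain (unweighted) balanced truncation, i.e. with W_i = W_o = I; the weighted variants of Section 3 replace the controllability Gramian with the solution of the augmented Lyapunov equation. All matrix and function names are illustrative, and A is assumed stable with P positive definite.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, cholesky, svd

def to_state_space(Mp, Cp, Kp, bp):
    """Second-order model  Mp x'' + Cp x' + Kp x + bp u = 0  ->  x' = A x + B u."""
    nd = Mp.shape[0]
    Minv = np.linalg.inv(Mp)
    A = np.block([[np.zeros((nd, nd)), np.eye(nd)],
                  [-Minv @ Kp, -Minv @ Cp]])
    B = np.vstack([np.zeros((nd, 1)), -Minv @ bp])
    return A, B

def balanced_truncation(A, B, C, r):
    """Keep the r states with the largest Hankel singular values."""
    P = solve_continuous_lyapunov(A, -B @ B.T)    # controllability Gramian
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)  # observability Gramian
    S = cholesky(P, lower=True)                   # P = S S^T
    U, s2, _ = svd(S.T @ Q @ S)                   # s2 = (Hankel SVs)**2
    T = S @ U * s2 ** -0.25                       # balancing transformation
    Ti = np.linalg.inv(T)
    Ab, Bb, Cb = Ti @ A @ T, Ti @ B, C @ T
    return Ab[:r, :r], Bb[:r, :], Cb[:, :r], np.sqrt(s2)
```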
3 Reduced-order Model
The model reduction problem has quite a long history, and many reduction techniques have been published. In this paper we use modal truncation, weighted balanced truncation, and weighted balanced singular perturbation (modified singular perturbation).

3.1 Modal Truncation

The truncation of modal realizations is common in engineering practice, because it is often the case that high-frequency modes may be neglected on physical grounds, or because the phenomena resulting in such modes only play a secondary role in determining the model's essential characteristics. The truncation method of model-order reduction seeks to remove, or truncate, unimportant states of the model. If the A-matrix of a state-space model is in Jordan canonical form, state-space truncation amounts to classical modal truncation. The modal truncation is conceptually simple and computationally cheap. In this method, the model is transformed into modal coordinates; the contribution of each eigenvalue is identified and the low-order model is obtained by truncating the eigenvalues having the smallest contribution.

3.2 Weighted Balanced Realization Truncation

The recent control literature shows that balanced realization truncation techniques are widely used in model reduction procedures. Some model reduction methods are also based on approximation via balanced realization and the closely related Hankel-norm optimal approximation procedure. Moore (1981) [4] first introduced the internally balanced realization and showed its application to the model reduction problem. The controllability and observability Gramians are used to define measures of controllability and observability in certain directions of the state space. The Gramians are not invariant under coordinate transformations, and there exists a coordinate system in which the Gramians are equal and diagonal; the corresponding system representation is called balanced. A low-order model can be obtained from the balanced representation by deleting the least controllable and observable part. In this paper we adopt the model reduction developed by Enns [2]. The standard optimal model reduction problem is expressed as follows. Consider the nth-order model P(s) = C(sI − A)⁻¹B. Find an rth-order (r < n) model P_r(s) = C_r(sI − A_r)⁻¹B_r which minimizes

J = ‖ W_i(s) [P(s) − P_r(s)] W_o(s) ‖_∞

where W_i(s) and W_o(s) are input and output weighting matrices, respectively. Let W_i(s) = H_i(sI − F_i)⁻¹G_i + D_i be an asymptotically stable frequency weighting used as an input weight to the asymptotically stable system P(s). Define the associated system matrices by

A_new = [ A  B H_i ; 0  F_i ],   B_new = [ B D_i ; G_i ]

Suppose that μ = [ μ₁₁ μ₁₂ ; μ₁₂ᵀ μ₂₂ ] is the nonnegative definite solution of the Lyapunov equation A_new μ + μ A_newᵀ + B_new B_newᵀ = 0. Define Y as the positive definite solution of the Lyapunov equation Y A + Aᵀ Y + Cᵀ C = 0. Consider a transformation of the realization (A, B, C, 0) which makes μ₁₁ = Y = Σ = diag{σ₁, σ₂, ..., σ_n} with σᵢ ≥ σᵢ₊₁. The frequency-weighted approximation is then achieved by eliminating the rows and columns of the new realization (A, B, C, 0) of P(s) corresponding to the smallest singular values, so that the low-order model is (A₁₁, B₁, C₁, 0), where A₁₁ is the top-left r × r block of the new A. In case σ_r > σ_{r+1}, the approximation is guaranteed to be stable.

3.3 Weighted Singular Perturbation Approach

The weighted balanced truncation method tends to have smaller errors at high frequencies and larger errors at low frequencies, which is undesirable in some applications. In contrast, the weighted singular perturbation approach displays the opposite character. The concept of the weighted singular perturbation is that the stable state variables of the weighted balanced system are divided into slow and fast modes [3]. The low-order system is approximated by setting the velocity of the fast modes equal to zero. In this paper we modify the singular perturbation by using the weighting function. Consider the weighted balanced realization (A, B, C, 0) with the state variable divided into slow and fast modes:

[ ẋ₁(t) ; ẋ₂(t) ] = [ A₁₁ A₁₂ ; A₂₁ A₂₂ ] [ x₁(t) ; x₂(t) ] + [ B₁ ; B₂ ] u(t)
y(t) = [ C₁ C₂ ] [ x₁(t) ; x₂(t) ]

If x₂ is defined as the fast mode, the low-order system (A_r, B_r, C_r, 0) is given by

A_r = A₁₁ − A₁₂ A₂₂⁻¹ A₂₁,   B_r = B₁ − A₁₂ A₂₂⁻¹ B₂,   C_r = C₁ − C₂ A₂₂⁻¹ A₂₁

4 Simulation Results
Freq uency Response 40
•
i
i
i
20
0 CQ
2, CO
"o -20 'E O) CO
5
-40
'—'—T~^~^^
/
-60
.Rn
I
10
I
I
20 30 Frequency [Hz]
I
40
50
Figure 1. Frequency response of the full order and reduced order model
References
1. Anderson, B. D. O., Controller Design: Moving from Theory to Practice, IEEE Control Systems 13 (1992) pp. 16-25.
2. Enns, D. F., Model Reduction with Balanced Realizations: An Error Bound and a Frequency Weighted Generalization, Proc. 23rd IEEE Conference on Decision and Control (1984) pp. 127-132.
3. Liu, Y. and Anderson, B. D. O., Singular Perturbation Approximation of Balanced Systems, International Journal of Control, Vol. 50, No. 4 (1989) pp. 1379-1405.
4. Moore, B. C., Principal Component Analysis in Linear Systems: Controllability, Observability, and Model Reduction, IEEE Transactions on Automatic Control, Vol. AC-26, No. 1 (1981) pp. 17-31.
MESHLESS ANALYSIS OF THE OBSTACLE PROBLEM FOR TIMOSHENKO BEAMS BASED ON A LOCKING-FREE FORMULATION

J. R. XIAO
Department of Mechanical and Aeronautical Engineering, University of Limerick, Ireland. E-mail: [email protected]

F. WANG, Q. H. CHENG
Division of Computational Mechanics, IHPC, Singapore 118261. E-mail: [email protected], [email protected]

A meshless method is developed, based on the meshless local Petrov-Galerkin (MLPG) approach and the local point interpolation method (LPIM), along with a locking-free formulation, for the obstacle problem for thick beams by means of variational inequalities and the corresponding linear complementarity equation. The meshless method is based only on a number of randomly located nodes. No global background integration mesh is needed, no element matrix assembly is required and no special treatment is needed to impose the essential boundary conditions. An obstacle problem for a thick beam is analysed by the proposed method and the numerical results are compared with analytical solutions.
1. Introduction

In the present study, the solution of an obstacle problem for a Timoshenko beam is investigated by means of variational inequalities and a local Petrov-Galerkin approximation along with the locking-free formulation [1]. The LPIM [2] is employed for constructing both trial and test functions, and the test function is deliberately selected to simplify the construction of the global stiffness matrix by eliminating the need for element matrix assembly [3]. Implementation details and a numerical example are presented.

2. Problem Formulations
In this paper, M and κ represent the moment and curvature, Q and γ represent the shear force and shear strain, EI is the bending stiffness, kGA is the shear rigidity, and w is the displacement. It was shown in [1] that the shear-locking phenomenon in the thin beam limit can be removed by simply changing the dependent variables in the governing equations: the transverse displacement w and the transverse shear strain γ are used as dependent variables, instead of the total rotation φ and the displacement w, as long as the γ field is one order lower than that of w. The corresponding locking-free governing equations can be written in terms of w and γ as follows:

EI(w′ − γ)‴ − q = 0,    EI(w′ − γ)″ + kGA γ = 0    (1)

The essential and natural boundary conditions are written using w and γ as

w = w̄ on Γ_w,  w′ − γ = 0 on Γ_θ;  M = M̄ on Γ_M;  Q = Q̄ on Γ_Q.
In a contact problem, the non-penetration condition leads to a boundary condition in the form of inequalities:
$g(x) \ge 0, \qquad g(x)F(x) = 0, \qquad F(x) \le 0$  (2)
where F(x) is the normal contact force density on the interface and g(x) is the gap function along the contact interface $\Gamma_c$. Consider a thick beam unilaterally supported by a frictionless rigid body, with an initial gap $\delta_0(x)$ between the beam and the rigid body. The equilibrium equations are given by

$Q_{,x} = q + F, \qquad M_{,x} - Q = 0$  (3)

In this case, the solution spaces are defined as

$C(\Omega) = \{ V \in H^2(\Omega) \mid V = 0 \text{ on } \Gamma_w,\ g(x) = \delta_0(x) - V \ge 0 \text{ on } \Gamma_c \}$  (4a)

$G(\Omega) = \{ \gamma_a \in H^1(\Omega) \mid V' - \gamma_a = 0 \text{ on } \Gamma_\theta,\ V \in C(\Omega) \}$  (4b)

Introduce the following continuous forms:
$a(w,\gamma;\, V,\gamma_a) = \int_\Omega M(w,\gamma)\,\kappa(V,\gamma_a)\,dx, \qquad b(\gamma,\gamma_a) = \int_\Omega Q(\gamma)\,\gamma_a\,dx$  (5a)

$(q, V) = \int_\Omega q\,V\,dx$  (5b)
The following variational inequality can be obtained: find $(w, \gamma) \in C(\Omega) \times G(\Omega)$ such that

$a(w,\gamma;\, V - w,\, \gamma_a - \gamma) + b(\gamma,\, \gamma_a - \gamma) \ge (q,\, V - w) \qquad \forall\, (V, \gamma_a) \in C(\Omega) \times G(\Omega)$  (6)
The variational formulation (6) can be approximated by either the finite element technique or a meshless method.

3. Local Point Interpolation Approximation
The LPIM [2] interpolates w(x) and the slope θ for the thin beam from the nodes surrounding a point $x_Q$ using polynomials:

$w(x, x_Q) = \sum_{i=1}^{2n} p_i(x)\,a_i(x_Q) = \mathbf{P}^T(x)\,\mathbf{a}(x_Q)$  (7)

$\theta(x, x_Q) = \frac{dw(x, x_Q)}{dx} = \sum_{i=1}^{2n} p_{i,x}(x)\,a_i(x_Q) = \mathbf{P}_{,x}^T(x)\,\mathbf{a}(x_Q)$  (8)
where $\mathbf{P}(x)$ is a complete monomial basis of order 2n, n is the number of nodes in the neighbourhood of $x_Q$, and $a_i(x_Q)$ are the coefficients. The γ field is one order lower than the w field; in this case no derivative of the variable is needed, and the basis size is taken as the number n of nodes in the influence domain:

$\gamma(x, x_Q) = \sum_{i=1}^{n} p_i(x)\,\tilde{a}_i(x_Q) = \mathbf{P}^T(x)\,\tilde{\mathbf{a}}(x_Q)$  (9)

The LPIM determines the coefficients by enforcing Eqs. (7)-(9) to be satisfied at the n nodes surrounding the point $x_Q$ and writing the result in terms of (w and θ) and γ:

$w(x) = \mathbf{\Phi}^T(x)\,\mathbf{w}_e$  (10)

$\gamma(x) = \mathbf{\Phi}_\gamma^T(x)\,\boldsymbol{\gamma}_e$  (11)
where $\mathbf{w}_e^T = [w_1, \theta_1, w_2, \theta_2, \ldots, w_n, \theta_n]$ and $\boldsymbol{\gamma}_e^T = [\gamma_1, \gamma_2, \ldots, \gamma_n]$; here $w_i$, $\theta_i$ and $\gamma_i$ are the nodal values of w, θ and γ at $x = x_i$, respectively. The shape functions in Eqs. (10) and (11) possess the delta function property, so the essential boundary conditions can be easily imposed.
4. MLPG and LPIM Discretisation
We define subdomains $\Omega_s$ with boundary $\Gamma_s$, each assumed to be the support of the nodal test function $v_i$ centred at the nodal point $x_i$. In the present study, the Petrov-Galerkin approximation procedure is adopted. The variational formulation (6) can be rewritten in the following local weak form, where the test functions $(V - w)$ and $(\gamma_a - \gamma)$ are represented by $v_w$ and $v_\gamma$:
$EI \int_{\Omega_s} (v_w'' - v_\gamma')(w'' - \gamma')\,dx + kGA \int_{\Omega_s} v_\gamma\,\gamma\,dx \ge \int_{\Omega_s} v_w\,q\,dx$  (12)
The test functions are approximated by linear combinations of the nodal shape functions for nodal point $x_i$, obtained from the procedure in section 3:

$v_{wi}(x) = \psi_{wi}(x)\,\alpha_i + \psi_{\theta i}(x)\,\beta_i$  (no summation)  (13a)

$v_{\gamma i}(x) = \psi_{\gamma i}(x)\,\xi_i$  (no summation)  (13b)
where $\alpha_i$, $\beta_i$ and $\xi_i$ are the fictitious nodal displacement, slope and shear strain, respectively. The test functions are constructed using Eqs. (10)-(11) based on three points, i.e. the two boundary points of the sub-domain of the node $x_i$ and the node $x_i$ itself. Only the nodal shape function for the nodal point $x_i$ is used (no summation). Substituting Eqs. (10), (11) and (13) into the local weak form (12) leads to the following discrete equation:

$\mathbf{K}_i\,\mathbf{w}_e \ge \mathbf{f}_i$  (14)

Because relation (14) should hold for every local sub-domain $\Omega_s^i$, we can finally obtain the following matrix equation for the whole discrete system by collecting the equations obtained from each local sub-domain $\Omega_s^i$, without any element assembly:

$\mathbf{K}\,\mathbf{w}_e \ge \mathbf{f}$  (15)
Numerical integration is needed to evaluate Eq. (14); Gauss quadrature is employed in each local sub-domain. For each Gauss quadrature point $x_Q$, point interpolation is performed to obtain the integrand. Therefore, for a node $x_i$ there are two local domains: the test function domain $\Omega_s$ where $v_i \ne 0$ (size $r_s$) and the interpolation domain $\Omega_i$ for $x_Q$ (size $r_i$). These two domains are independent and are defined as $r_s = \alpha_s d_i$ and $r_i = \alpha_i d_i$, respectively, where $\alpha_s$ and $\alpha_i$ are coefficients and $d_i$ is the distance from node i to its closest neighbouring node. It should be noted that it is sufficient to integrate in each local sub-domain by a conventional numerical integration scheme without any numerical difficulties. In this study, the variational inequality problem is transformed into a linear complementary problem following a procedure similar to that given in [3]. The corresponding linear complementary equation is then solved using mathematical programming solvers.
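As one hedged illustration of what such a solver can look like, the following Python sketch applies projected SOR to a generic linear complementarity system; the nonnegativity form of the constraints and the variable names are assumptions, not the paper's exact formulation, which is detailed in [3]:

```python
import numpy as np

def psor_lcp(K, f, omega=1.2, tol=1e-10, max_iter=5000):
    """Projected SOR for the LCP: w >= 0, K w - f >= 0, w^T (K w - f) = 0.
    K is assumed to have a positive diagonal."""
    n = len(f)
    w = np.zeros(n)
    for _ in range(max_iter):
        w_old = w.copy()
        for i in range(n):
            # Residual with the contribution of w[i] removed
            r = f[i] - K[i] @ w + K[i, i] * w[i]
            w[i] = max(0.0, (1 - omega) * w[i] + omega * r / K[i, i])
        if np.linalg.norm(w - w_old, np.inf) < tol:
            break
    return w
```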
5. Numerical examples

A cantilever beam gradually coming into contact with a rigid cylindrical supporting surface of constant curvature 1/R is analysed. This problem has been studied using thin beam theory in [3]. For a certain load the beam begins to come into contact with the cylindrical supporting surface over a contact region AD of length $l_c$, shown as the dashed line in Fig. 1. In thin beam theory [5] there is no contact reaction force within the contact region AD; instead, there is a concentrated contact force at the transition point D. This is not true in the thick beam limit [6]. Taking EI = 1, kGA = 1, L = 3, R = 5 and P = 0.1, the analysis is performed using 81 uniform nodes. The calculated contact region is 2.025, which agrees well with the exact value of 2.0348, an error of 0.48%. Fig. 2 gives the calculated reaction force along the contact interface for an intermediately thick beam with kGA = 100, which shows good agreement with the analytical solution. Finally, the calculated contact regions under different values of kGA are given in Fig. 3 and compared with the analytical results. All the results in the graph show excellent agreement between the numerical and analytical results, and indicate that the proposed method gives high accuracy and no shear locking in the thin beam limit.
Figure 1. Geometry of cantilever beam and cylindrical supporting surface.
6. Conclusions

A meshless method based on the meshless local Petrov-Galerkin (MLPG) approach and the local point interpolation method (LPIM) has been presented to solve fourth-order boundary problems of thick beams involving unilateral contact conditions, based on a locking-free formulation. In this meshless method, polynomial interpolation functions with the delta function property were constructed by the LPIM technique. The problem of beams involving unilateral contact conditions is described by a variational inequality. The corresponding linear complementary equation for this highly non-linear problem was derived by using the developed meshless method and solved by mathematical programming. A contact problem for beams was examined to verify the presented approach. The present method is completely locking-free in the thin beam limit.

References
1. Cho J. Y. and Atluri S. N., Analysis of shear flexible beams, using the meshless local Petrov-Galerkin method, based on a locking-free formulation, Eng. Comput. 18 (2001) pp. 215-240.
2. Gu Y. T. and Liu G. R., A local point interpolation method for static and dynamic analysis of thin beams, Comput. Meth. Appl. Mech. Eng. 190 (2001) pp. 5515-5528.
3. Xiao J. R., McCarthy M. A. and Liu G. R., Local form of variational inequality and meshless analysis of a beam involving unilateral contact conditions, submitted to Comput. Model. Eng. Sci. (2002).
4. Timoshenko S., Strength of Materials, Part II: Advanced Theory and Problems, 3rd edition (Robert E. Krieger, New York, 1983).
5. Hu H. C., Variational Principles of Elasticity and Their Applications (Science Press, Beijing, 1981). (In Chinese)
Figure 2. Contact force along the contact region (kGA = 100): exact solution and numerical result.
Figure 3. Contact regions with different values of kGA (from 1.E+00 to 1.E+10): exact solution and this study.
EFFICIENT PARALLEL ALGORITHM FOR LARGE-SCALE MOLECULAR DYNAMICS SIMULATION IN MICROSCALE THERMOPHYSICS

BING WANG, JIWU SHU, WEIMIN ZHENG
Department of Computer Science and Technology, Tsinghua University, Beijing, China, 100084
E-mail: [email protected]

JINZHAO WANG
Department of Engineering Mechanics, Tsinghua University, Beijing, China, 100084
Molecular dynamics (MD) simulation is an important research method in thermophysics, but it is difficult to carry out with traditional serial algorithms because of the complexity of the numerical calculation. In this paper, we propose algorithms based on a new force decomposition approach called Half Force-Block Decomposition (HFBD). The HFBD approach greatly reduces the memory usage and the communication cost, making it easier to simulate a large-scale particle system. Furthermore, we propose two new strategies to maintain load balance, which is the main problem when parallel algorithms based on force decomposition are applied to short-range MD simulation. The first, the Random Redistribution strategy (RRD), randomly permutes the particle ordering when the load is imbalanced; the other, the Optimal Redistribution strategy (ORD), makes a simple load-balance calculation based on the computing times of all processors and achieves the optimal particle ordering. The parallel algorithm based on the above approaches was implemented on an SMP cluster, tested on a system of 4,000,000 particles, and achieved an efficiency of 67.2% on 120 processors. The numerical results show that the proposed parallel algorithm can efficiently simulate thermophysical systems with more particles than before.
1 Introduction

Molecular Dynamics (MD) is a numerical simulation method for studying the dynamic behavior of multi-particle systems and is widely used in microscale thermophysics. MD simulation involves a great amount of computation, due to the numerous particles and simulation time steps, so it cannot be handled satisfactorily with serial algorithms. The availability of high performance computing resources provides a new way to solve the multi-particle molecular dynamics simulation. Computational scientists have developed three types of parallel algorithms: the atom decomposition algorithm (AD), the force decomposition algorithm (FD) and the spatial decomposition algorithm (SD). In the AD algorithm, particles are randomly distributed among processors irrespective of their spatial positions [1]. In the FD algorithm, particle-pairs are evenly assigned to each processor [2, 3]. In the SD algorithm, the simulation domain is divided into sub-domains and each sub-domain is assigned to a processor [4]. Of the three, the AD algorithm scales poorly and cannot be applied efficiently on more than 10 processors, while the SD algorithm suffers from a load imbalance problem. The FD algorithm is therefore the most widely used in microscale thermophysics. Taylor proposed an efficient force decomposition algorithm in [3]. In this paper, a new force decomposition technique called Half Force-Block Decomposition (HFBD) is proposed, offering a new decomposition strategy for the force matrix. Our new algorithm and Taylor's algorithm were both implemented on a cluster system with 144 processors, and the numerical results show that the former
can give a more efficient solution to the parallel MD simulation. We also propose two strategies to maintain load balance, namely the Random Redistribution strategy (RRD) and the Optimal Redistribution strategy (ORD). When used to simulate a system of 4,000,000 particles, the new decomposition method, together with the new load balance strategy, achieves an efficiency of 67.2%. The rest of this paper is organized as follows. Sections 2 and 3 describe the HFBD method and the load balance strategies, respectively. The benchmark and numerical results are given in section 4.
2 New parallel algorithm

2.1 Force decomposition

Figure 1. Taylor's algorithm
Figure 2. HFBD algorithm
A force matrix is often used to describe the FD algorithm, as shown in Figure 1, which illustrates Taylor's algorithm. The element (i, j) of the force matrix stands for the force of particle j on particle i. In Taylor's method, the force matrix is divided into P blocks, so that each processor is assigned one block of the force matrix. The position vector x is divided into $\sqrt{P}$ sub-vectors, each containing $N/\sqrt{P}$ particles. We use $P_{ij}$ to denote the processor corresponding to sub-block (i, j), which calculates all of the forces between particles from sub-vector i and sub-vector j. By Newton's third law, $\mathbf{f}_{ij}$ and $\mathbf{f}_{ji}$ are equal in magnitude and opposite in direction. That is,

$\mathbf{f}_{ji} = -\mathbf{f}_{ij}$  (1)

So the tasks of $P_{ij}$ and $P_{ji}$ in fact duplicate each other, and Taylor distributed the task between $P_{ij}$ and $P_{ji}$. For example, with N = 16 and P = 16 (N is the number of particles and P the number of processors), as illustrated in Figure 1, $P_{12}$ is responsible for the force calculation between particles (1,2,3,4) and particles (5,6), while $P_{21}$ is responsible for the force calculation between particles (1,2,3,4) and particles (7,8). When both $P_{12}$ and $P_{21}$ complete the calculation, the two processors exchange the force results, so that each processor has all of the forces between particles (1,2,3,4) and particles (5,6,7,8). Taylor's algorithm contains four types of communication: (1) the exchange between $P_{ij}$ and $P_{ji}$ described above, with a cost of $3N/(2\sqrt{P})$; (2) the gathering of forces on sub-vector i among processors in the ith row, with a cost of $N/\sqrt{P}$; (3) the scatter of position information of sub-vector i to all of the processors in the ith row, with a cost of $N/\sqrt{P}$; (4) the scatter of position information of sub-vector j to all of the processors in the jth column, with a cost of $N/\sqrt{P}$. So the total communication cost of Taylor's algorithm is

$9N/(2\sqrt{P})$  (2)

2.2 HFBD force decomposition algorithm

This section presents an algorithm based on a new force matrix decomposition technique, called the Half Force-Block Decomposition algorithm. From equation (1) we can conclude that the force matrix is skew-symmetric:

$\mathbf{F}^T = -\mathbf{F}$  (3)
The forces in the upper (or lower) part of the force matrix can be obtained easily once the forces in the lower (or upper) part are available, so only the lower (or upper) part of the force matrix needs to be calculated. Our decomposition technique deals only with the lower force matrix, as shown in Figure 2. $P_1$ is responsible for block (1,1); $P_2$ and $P_3$ are responsible for blocks (2,1) and (2,2), respectively. In general, blocks (i,1) to (i,i) are assigned to $P_{i(i-1)/2+1}$ through $P_{i(i+1)/2}$, and the processor responsible for block (i, j) is $P_{i(i-1)/2+j}$, which calculates the forces between particles in sub-vector i and those in sub-vector j. Suppose the lower part of a 16 x 16 force matrix has been divided into 10 blocks, each assigned to one of 10 processors. Then blocks (4,1), (4,2), (4,3) and (4,4) are assigned to $P_7$, $P_8$, $P_9$ and $P_{10}$, respectively. $P_5$ is responsible not only for the force calculation of particles (5,6,7,8) on particles (9,10,11,12) but also for the forces of particles (9,10,11,12) on particles (5,6,7,8). Compared to Taylor's algorithm, the communication cost of the HFBD is reduced. Only two kinds of communication are involved in the new algorithm: (1) the gathering of forces on sub-vector i among processors in the ith row, with a cost of $N/\sqrt{P'}$; (2) the scatter of position information of sub-vector i to all of the processors in the ith row, with a cost of $N/\sqrt{P'}$. The first and fourth communication types in Taylor's algorithm are not necessary in HFBD. So the total communication cost is only
$2N/\sqrt{P'}$, where $P'$ is the number of processors Taylor's algorithm would need for the force matrix to be divided into blocks of the same size as in the HFBD. We have

$\sqrt{P'} = (\sqrt{8P + 1} - 1)/2$  (4)

For example, when P in Figure 2 is 10, the corresponding $P'$ in Figure 1 is 16. So the total communication cost of HFBD is

$2N/\sqrt{P'} = 4N/(\sqrt{8P + 1} - 1) \approx \sqrt{2}\,N/\sqrt{P}$  (5)
which is less than one third of the cost of Taylor's algorithm. The HFBD reduces the communication cost of traditional FD algorithms, so it is expected to offer higher efficiency and better scalability. The comparison between HFBD and Taylor's algorithm is given in section 4.
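As a small illustration of the block-to-processor bookkeeping described above, the following Python sketch (ours; the function names are not from the paper) computes the owner of a lower-triangle block and the number of sub-vectors of Eq. (4):

```python
import math

def hfbd_owner(i, j):
    """Processor index (1-based) owning lower-triangle block (i, j), j <= i.
    Blocks (i,1)..(i,i) go to P_{i(i-1)/2+1}..P_{i(i+1)/2}."""
    return i * (i - 1) // 2 + j

def subvector_count(P):
    """Number of sub-vectors M for P processors, M(M+1)/2 = P, as in eq. (4)."""
    return (math.isqrt(8 * P + 1) - 1) // 2

# Example from the paper: 10 processors cover the lower half of a force
# matrix split into M = 4 sub-vectors; block (4, 3) belongs to P9.
assert subvector_count(10) == 4
assert hfbd_owner(4, 3) == 9
```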
3 Load balance strategy
Murty in [5] proposed a static load balance (LB) strategy, which assigns an equal number of particle-pairs to each processor before the simulation begins. This kind of strategy is useful in long-range MD simulation, in which each particle-pair stands for a unit of force calculation; because processors have equal numbers of particle-pairs, they also have equal amounts of calculation. In short-range MD simulation, however, only pairs of particles close enough to each other interact, so even processors with equal numbers of particle-pairs generally carry different calculation loads. Generally speaking, Murty's load balance strategy cannot be successfully applied to short-range MD simulation. We present two dynamic strategies to maintain run-time load balance. The first is the Random Redistribution strategy, which randomly permutes the particle ordering when load imbalance occurs, so that a uniformly sparse force matrix is expected. But this strategy has an obvious shortcoming: random redistribution only leads to a random result. Often we get a satisfactory result, but in some cases we get a bad one. We therefore present a second strategy, the Optimal Redistribution strategy, to solve this problem. This strategy also permutes the particle ordering when load imbalance occurs, but the permutation is based on the spatial distribution of particles rather than being random. The ORD strategy consists of two steps: first, divide the simulated domain into cubes as Link-Cell methods do, order the cubes in a spatial index, determine which cube each particle belongs to, and build an index of particles in which particles from the same cube have successive index numbers; second, reorder the particles as

$1, M+1, 2M+1, \ldots;\ 2, M+2, 2M+2, \ldots;\ \ldots;\ M, 2M, \ldots$

where

$M = \sqrt{P'} = (\sqrt{8P + 1} - 1)/2$  (6)
After the above two steps, the particles in the same cube are distributed as evenly as possible among all processors, so the force matrix is uniformly sparse and the loads of the processors are equal. In the next section, numerical results are presented to compare the load balance strategies.
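A hedged Python sketch of the second (reordering) step, assuming the particles have already been re-indexed so that particles from the same cube are contiguous, which is the paper's first step:

```python
def ord_permutation(n_particles, M):
    """ORD reordering: deal particles out round-robin across M sub-vectors,
    i.e. 1, M+1, 2M+1, ...; 2, M+2, ...; ...; M, 2M, ... (1-based in the
    paper; 0-based here)."""
    order = []
    for start in range(M):
        order.extend(range(start, n_particles, M))
    return order

# With 8 particles and M = 4 sub-vectors this yields [0, 4, 1, 5, 2, 6, 3, 7],
# i.e. 1, 5; 2, 6; 3, 7; 4, 8 in the paper's 1-based notation.
print(ord_permutation(8, 4))
```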
4 Numerical results
Firstly, as illustrated in Figure 3, we compare the speedups of the HFBD and Taylor's algorithms. The newly proposed HFBD algorithm is more efficient than Taylor's at almost all processor counts (with N = 108,000). Secondly, the comparison of load balance strategies is shown in Figure 4. From that figure we can see that both the RRD and ORD strategies are more effective than Murty's static strategy. Furthermore, the ORD has somewhat higher efficiency than the RRD, in agreement with our theoretical analysis. Thirdly, we used the HFBD method to simulate a multi-particle system with 4,000,000 particles and achieved an efficiency of 67.2% on 120 processors. The result shows that the HFBD algorithm has high scalability.

Figure 3. Speedup of algorithms
Figure 4. Comparison of LB strategies (Murty's static strategy, RRD strategy, ORD strategy)

5 Conclusions
In this paper, a new force decomposition algorithm called Half Force-Block Decomposition is presented for molecular dynamics simulation in microscale thermophysics. This decomposition technique divides the lower part of the force matrix into blocks and reduces the communication cost of traditional FD algorithms. We also propose two load balance strategies, namely the Random Redistribution strategy and the Optimal Redistribution strategy, which maintain load balance at run time.
References
[1] W. Smith, A replicated data molecular dynamics strategy for the parallel Ewald sum, Comp. Phys. Comm. 62 (3) (1992) 392-406.
[2] S. Plimpton, Fast parallel algorithms for short-range molecular dynamics, J. Comput. Phys. 117 (1) (1995) 1-19.
[3] V. E. Taylor, R. L. Stevens and K. E. Arnold, Parallel molecular dynamics: Communication requirements for massively parallel machines, in: Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (1994) 156-163.
[4] D. Brown, J. H. R. Clarke, M. Okuda and T. Yamazaki, A domain decomposition strategy for molecular dynamics simulations on distributed memory machines, Comp. Phys. Comm. 74 (1993) 67-80.
[5] R. Murty and D. Okunbor, Efficient parallel algorithms for molecular dynamics simulations, Parallel Computing 25 (1999) 217-230.
IMPROVING THE CELL MAPPING METHOD AND DETERMINING DOMAINS OF ATTRACTION OF A NONLINEAR STRUCTURAL SYSTEM

Q. DING
Department of Mechanics, Tianjin University, Tianjin, P. R. China, 300072
E-mail: [email protected]

Z. S. LIU
Computational Mechanics Division, Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528
E-mail: [email protected]

J. J. LI
Department of Mechanics, Tongji University, Shanghai, P. R. China, 200092
E-mail: [email protected]

A process defined as "mapping trajectory pursuit" (MTP) is introduced into cell mapping techniques based on spatial Poincare sections. This improvement brings the exact determination of the properties of all cells in the analysed sequences and a further reduction in memory and computational time. For the prediction of the stability boundary as a function of initial conditions (domains of attraction), an initial condition region is defined. The proposed method is then applied to analyze the aeroelastic behavior of a system with bilinear structural nonlinearity. Different types of periodic motions are determined through the presentation of domains of attraction.
1 Introduction

Nonlinear dynamic systems can have several distinct steady-state solutions depending on the particular initial conditions. However, determining domains of attraction using direct numerical integration is often extremely time-consuming. In 1980 the "simple cell mapping" (SCM) method was proposed by Hsu [1, 2] as an advanced computational technique for global analysis. Based on this concept, "generalized cell mapping" (GCM) [3] and "interpolated cell mapping" (ICM) [4] were developed thereafter. These methods nevertheless remain time- and memory-consuming when applied to high-order systems. A further development to reduce the number of cells in the calculation led to the "Poincare-like simple cell mapping" (PLSCM) and "Poincare linear interpolated cell mapping" (PLICM) [5, 6], which combine the use of spatial Poincare sections with SCM and ICM, respectively. Because either the mid-points of cells or interpolated points of cell vertices, rather than the actual mapped positions, are used as the initial values of each iteration, the solutions obtained are unavoidably approximate, and the resulting restriction on cell sizes limits the application of these methods to high-order systems. Besides, for the interpolated-type methods, cells generated in a procedure may wrongly be determined as "sink cells" even if the trajectory leaves the domain of interest only temporarily. In this paper, a process defined as "mapping trajectory pursuit" (MTP) is introduced into cell mapping techniques based on spatial Poincare sections. The initial condition region is also defined for the special purpose of predicting the domains of attraction. Using the improved method, the complicated flutter of a binary aeroelastic system with bilinear structural nonlinearity in torsion is analyzed.
2 Improvement on cell mapping method

In SCM and ICM, an N-dimensional dynamical system is transformed into a point-to-point (p-p) mapping by numerical integration over a time interval τ such that

$x(j+1) = P(x(j)), \qquad P: R^N \to R^N$  (1)

which means that x(j), a point in state space, is mapped by P after a period of time τ into the point x(j+1). Cells in the state space are then defined, according to the procedures described in [1, 2, 3], on the basis of a series of points obtained by (1). Instead of time sections, we obtain the p-p mapping (1) on a spatial Poincare section Σ, an (N-1)-dimensional hyperplane in the $R^N$ state space transversal to the trajectories of the system. Such a procedure results in P: Σ → Σ and reduces the dimension of the analysis space by keeping one coordinate constant. The cell mapping unravelling is then applied to the intersection points obtained on Σ. In addition, we record $x^j$ (j ≥ 1) as the representing point of cell $z^j$, denoted $R(z^j)$, and use it to determine the state of the trajectory. We also define an initial condition region Ω, a subspace of $R^N$ of dimension one to N, which covers all initial conditions to be investigated. Ω is different from the domain of interest S ⊂ Σ [5], both in size and/or order of dimension. There are four cases one may encounter at each step while constructing a processing sequence:
1. The newly generated cell $z^j$ is virgin. In this case, $x^j$ is recorded as $R(z^j)$ and the integration of the present sequence is continued.
2. $z^j$ has appeared before in the present sequence. A new periodic motion is found only when the distance between the newly obtained point $x^j$ and the representing point of the cell, $|R(z^j) - x^j|$, is less than a given small value $d_1$.
3. $z^j$ has appeared in one of the previous sequences. The current processing sequence is deemed to be attracted to an attractor only when $|R(z^j) - x^j|$ is less than a given small value $d_2$ (usually $d_2$ can reasonably be set much larger than $d_1$).
4. A cell is mapped outside S. We continue the numerical integration until the mapped points either return into S or are confirmed to be divergent.

The process in which the actual positions of mapped points on Σ are recorded to represent the cells and are followed until the final determination (even while they leave S) is defined as "mapping trajectory pursuit" (MTP). With MTP, the size of S can be much smaller, because it need not contain the whole steady-state orbits in the $R^N$ state space, but only part of their intersection points on Σ. Conversely, the cells can reasonably be larger, because the criterion applied in exact numerical integration procedures is used to determine whether a newly mapped point is the representing point of the cell. These two aspects lead to an extensive reduction in the number of cells required in the calculation; consequently, the computing time is also reduced considerably. The proposed approach is therefore more appropriate for the global study of high-order systems.
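The four cases translate almost directly into code. The following Python sketch is our illustration, not the authors' implementation: the `poincare_map`, `cell_of` and `in_S` callbacks and the dictionary `rep` of representing points (accumulated over previous sequences) are assumed interfaces.

```python
import numpy as np

def mtp_sequence(x0, poincare_map, cell_of, in_S, rep,
                 d1=1e-6, d2=1e-3, max_steps=10_000):
    """One MTP processing sequence starting from the initial point x0."""
    seen = set()                    # cells hit in the present sequence
    x = np.asarray(x0, dtype=float)
    for _ in range(max_steps):
        x = poincare_map(x)         # next intersection with the section
        if not in_S(x):             # case 4: pursue the trajectory outside S
            continue
        z = cell_of(x)
        if z not in rep:            # case 1: virgin cell
            rep[z] = x              # record the actual mapped point as R(z)
            seen.add(z)
        elif z in seen:             # case 2: cell repeated in this sequence
            if np.linalg.norm(rep[z] - x) < d1:
                return "new periodic motion"
        else:                       # case 3: cell from a previous sequence
            if np.linalg.norm(rep[z] - x) < d2:
                return "attracted to a known attractor"
    return "divergent or undecided"
```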
3 Analysis of a nonlinear aeroelastic system

Consider a rigid wing of constant chord, pivoted at its root in bending and torsion such that there is no stiffness coupling between the motions. The equations of motion of the system are derived using quasi-steady aerodynamics [7] in dimensionless form as

$\mathbf{A}\ddot{\mathbf{q}} + (\rho V \mathbf{B} + \mathbf{D})\dot{\mathbf{q}} + (\rho V^2 \mathbf{C} + \mathbf{E})\mathbf{q} = 0$  (2)

where $\mathbf{q} = (\gamma, \theta)^T$, γ is the bending angle, θ the torsional angle, ρ the air density, and V the air speed. A, B, D, C and E are the mass, aerodynamic damping, structural damping, aerodynamic stiffness and structural stiffness matrices, respectively. A bilinear stiffness in the torsional direction is considered, as shown in Figure 1.

Figure 1. Bilinear stiffness in the torsional direction

In the following analysis we take θ = 0 as Σ, to cope with the stable equilibrium points as well as the limit cycles. The one-sided intersections of the trajectory with Σ from negative θ to positive θ are taken as the p-p maps (1). For simplicity, Ω includes only initial conditions in the θ direction, with γ(0) = 0 and $\dot{\gamma}(0) = 0$. Letting $k = K'_\theta/K_\theta = 0.1$, the domains of attraction determined using the proposed CM method are shown in Figure 2, which presents the occurrences of different motions as functions of V-θ(0), V-$\dot{\theta}(0)$ and θ(0)-$\dot{\theta}(0)$. Motions are classified as: damped stable motion (to the trivial equilibrium position), limit cycle oscillation (LCO), complicated periodic motion (with period > 2), chaotic motion and divergent flutter. The results demonstrate that small initial conditions, say |θ(0)| < 1.5 or $|\dot{\theta}(0)|$ < 0.6, result in damped motions for V < 26.2 m/s and unsymmetrical LCOs over the velocity range above 26.2 m/s.
4 Conclusion

With the introduction of the MTP technique into the cell mapping method based on spatial Poincare sections, both the number of cells and the computation time can be greatly reduced. The global dynamic properties of all cells in the analysed sequences can be determined exactly. The definition of the initial condition analysis region makes the method especially appropriate for predicting the stability boundary as a function of initial conditions.
The proposed CM method has proven to be efficient in revealing the global behaviors of high-order nonlinear dynamic systems.
Figure 2. Domains of attraction for V = 15 m/s and V = 45 m/s: '+' damped stable motion; 'x' LCO; '*' period-2 motion; '•' periodic motion with period greater than 2; 'V' chaotic motion; blank: divergent motion.
References
1. Hsu C. S., A theory of cell-to-cell mapping dynamical systems, Journal of Applied Mechanics 47 (1980) pp. 931-939.
2. Hsu C. S. and Guttalu R. S., An unravelling algorithm for global analysis of dynamical systems: an application of cell-to-cell mappings, Journal of Applied Mechanics 47 (1980) pp. 940-948.
3. Hsu C. S., Cell-to-Cell Mapping: A Method of Global Analysis for Nonlinear Systems (Springer-Verlag, New York, 1987).
4. Tongue B. H. and Gu K., Interpolated cell mapping of dynamical systems, Journal of Applied Mechanics 55 (1988) pp. 461-466.
5. Levitas J., Weller T. and Singer J., Poincare-like simple cell mapping for non-linear dynamical systems, Journal of Sound and Vibration 176 (1994) pp. 641-662.
6. Levitas J. and Weller T., Poincare linear interpolated cell mapping: method for global analysis of oscillating systems, Journal of Applied Mechanics 62 (1995) pp. 489-495.
7. Hancock G. J., Wright J. R. and Simpson S., On the teaching of the principles of wing flexure-torsion flutter, Aeronautical Journal 89 (1985) pp. 285-305.
HIGH RATE DYNAMIC RESPONSE OF STRUCTURE USING SPH METHOD

Z. S. LIU
Computational Mechanics Division, Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528
E-mail: [email protected]

S. SWADDIWUDHIPONG AND C. G. KOH
Department of Civil Engineering, The National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260
E-mails: [email protected], [email protected]

The dynamic responses of structures under high rate loading are studied using the Smooth Particle Hydrodynamics (SPH) approach. The SPH equations governing the elastic and elasto-plastic large deformation dynamic response of solid structures are derived. Additional stress points are introduced in the formulation to mitigate the tensile instability inherent in the SPH approach. An incremental rate approach is introduced and the solution algorithm is developed. Examples of high velocity normal impact of solids are presented, and the results from the proposed SPH approach are compared with finite element solutions, illustrating that high rate dynamic response problems such as high velocity impact can be effectively solved by the proposed SPH approach.
1 Introduction

In many engineering problems, the transient response of solids involving high rate deformation and loading is often encountered. If the rate of onset of the load is high compared to the time needed to reach the steady state, wave propagation phenomena have to be considered. The dynamic response due to high velocity impact is a special case, in which inertial effects must be included in the governing equations and stress wave propagation plays an important role in the analysis; for this reason, the high rate dynamic response becomes quite complex. With the development of high performance computing, the most popular and cost-effective approaches for solving high rate problems (such as high velocity impact) are discretization methods, such as finite element or meshless methods. In the past few decades the finite element method has been developed to simulate the dynamic response of structures subjected to high rate loading (e.g. high velocity impact) and has been widely used. However, one of the main drawbacks of a mesh-based method like the FEM for treating high velocity impact is the need to remesh when severe element distortion occurs, especially when the solid continuum undergoes high rate large deformation. Unfortunately, the remeshing procedure introduces projection errors and reduces the accuracy of the numerical solutions. In order to
remove this inaccurate remeshing procedure, meshless (or particle) methods such as Smooth Particle Hydrodynamics (SPH) have been developed to solve large deformation and high rate dynamic problems in solid mechanics. SPH is a meshless Lagrangian method that offers considerable promise as a numerical tool for modelling problems involving large deformations and large distortions, whereby the motion of a discrete number of particles of a solid is followed in time. SPH was first introduced and developed for treating astrophysics problems, and was applied successfully to high velocity impact problems [4]. Since then, the credibility of the SPH method for modelling solid media has been verified against numerous experimental impact results. As SPH uses a Lagrangian formulation of the equations of motion, it does not involve a distortion-limited grid and is therefore very attractive for high velocity impact simulation. In this paper, the dynamic responses of structures under high rate loading are studied using the SPH approach. Additional stress points are introduced in the formulation to mitigate the tensile instability inherent in the SPH approach. The incremental rate approach is introduced and the solution algorithm is developed and implemented.

2 Governing equations of SPH method for solid mechanics
The foundation of SPH is interpolation theory. In solid mechanics, the SPH form of the conservation equations can be expressed as [4, 5]

$\frac{d\rho_i}{dt} = \rho_i \sum_j \frac{m_j}{\rho_j}\,(v_i^\beta - v_j^\beta)\,\frac{\partial W_{ij}}{\partial x_i^\beta}$

$\frac{dv_i^\alpha}{dt} = \sum_j m_j \left(\frac{\sigma_i^{\alpha\beta}}{\rho_i^2} + \frac{\sigma_j^{\alpha\beta}}{\rho_j^2}\right)\frac{\partial W_{ij}}{\partial x_i^\beta}$

$\frac{dE_i}{dt} = \frac{\sigma_i^{\alpha\beta}}{\rho_i^2} \sum_j m_j\,(v_i^\alpha - v_j^\alpha)\,\frac{\partial W_{ij}}{\partial x_i^\beta}$  (1)

where i and j are particle numbers; $m_j$ and $\rho_j$ are the mass and density of particle j; $\sigma_j^{\alpha\beta}$ and $v_j^\alpha$ are the stress tensor and velocity of particle j, respectively; $E_i$ is the energy of particle i; and $W_{ij}$ is a kernel function satisfying some special properties. Although several kernel functions are possible, the most widely used cubic B-spline kernel is adopted in this study.
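For reference, a minimal Python sketch of the cubic B-spline kernel mentioned above; the 1-D normalisation constant is shown (the 2-D and 3-D constants differ), and a support radius of 2h is the common convention:

```python
def cubic_bspline_w(r, h):
    """Cubic B-spline SPH kernel W(r, h) in 1-D."""
    q = r / h
    sigma = 2.0 / (3.0 * h)                        # 1-D normalisation
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q**2 + 0.75 * q**3)
    elif q < 2.0:
        return sigma * 0.25 * (2.0 - q)**3
    return 0.0                                     # compact support: W = 0 beyond 2h
```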
3 Constitutive equation

In classical plasticity, the hydrostatic pressure p is usually calculated using linear Hooke's law when p is small. For severe hydrostatic pressure, the pressure should be evaluated with an Equation of State (EOS) of the functional form $p = p(\rho, E)$. The EOS employed in this study is the well-known Mie-Gruneisen EOS for solids [2]. In the elastic regime, the deviatoric stress rate can be determined through Hooke's law, $\dot{S} = 2G\dot{\varepsilon}'$. For finite rotation, the deviatoric stress should be determined through incremental plasticity theory. To account for the large rotation effect, the elastic deviatoric stress rate $\dot{S}^{\alpha\beta}$ is computed using the Jaumann rate definition:

$\dot{S}^{\alpha\beta} = 2G\left(\dot{\varepsilon}^{\alpha\beta} - \tfrac{1}{3}\delta^{\alpha\beta}\dot{\varepsilon}^{\gamma\gamma}\right) + S^{\alpha\gamma}\Omega^{\beta\gamma} + S^{\gamma\beta}\Omega^{\alpha\gamma}$  (2)
where $\dot{\varepsilon}^{\alpha\beta}$ and $\Omega^{\alpha\beta}$ are the strain rate and rotation rate tensors, respectively. The SPH forms for evaluating the strain and rotation rates are

$\dot{\varepsilon}_i^{\alpha\beta} = \frac{1}{2}\sum_j \frac{m_j}{\rho_j}\left[(v_j^\alpha - v_i^\alpha)W_{ij,\beta} + (v_j^\beta - v_i^\beta)W_{ij,\alpha}\right]$

$\Omega_i^{\alpha\beta} = \frac{1}{2}\sum_j \frac{m_j}{\rho_j}\left[(v_j^\alpha - v_i^\alpha)W_{ij,\beta} - (v_j^\beta - v_i^\beta)W_{ij,\alpha}\right]$  (3)
As large deformation elasto-plastic transient dynamic analysis is path-dependent, an incremental procedure is adopted in the present study.

3.1 Elastic case

The incremental stresses can be expressed in terms of incremental strains. In the hydrodynamics analysis, the stress is split into the hydrostatic pressure and the traceless symmetric deviatoric stress:

${}^{t+\Delta t}\sigma_i^{\alpha\beta} = {}^{t+\Delta t}S_i^{\alpha\beta} - {}^{t+\Delta t}p\,\delta^{\alpha\beta}, \qquad p = -\tfrac{1}{3}\,\sigma^{\gamma\gamma}$  (4)

in which ${}^{t+\Delta t}\sigma_i^{\alpha\beta}$ is the stress of particle i accumulated incrementally from time t to t + Δt. The incremental deviatoric stress ${}^{t+\Delta t}S_i^{\alpha\beta}$ is computed using the Jaumann rate definition as stated in equation (2).

3.2 Elasto-plastic case

For the elasto-plastic case, the incremental stress can be expressed as a function of the incremental strain in an average sense [1]. If ${}^{t+\Delta t}C^{\alpha\beta\gamma\delta}$ are the elastic-plastic stiffness coefficients during the time interval (t, t + Δt), the constitutive equation can be given by

${}^{t+\Delta t}\sigma^{\alpha\beta} = {}^{t+\Delta t}C^{\alpha\beta\gamma\delta}\,{}^{t+\Delta t}\varepsilon_{\gamma\delta}$  (5)
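As a hedged illustration of the rate form in Eq. (2), the following Python sketch integrates the Jaumann deviatoric stress rate with a simple forward Euler step; the explicit time integrator and the 3x3 array interface are our assumptions, not details given in the paper:

```python
import numpy as np

def jaumann_deviatoric_update(S, eps_dot, omega, G, dt):
    """One explicit step of the Jaumann deviatoric stress update, eq. (2).
    S, eps_dot, omega: 3x3 stress, strain-rate and rotation-rate tensors."""
    dev = eps_dot - np.trace(eps_dot) / 3.0 * np.eye(3)   # deviatoric strain rate
    # S^{ag} Omega^{bg} = (S Omega^T)^{ab};  S^{gb} Omega^{ag} = (Omega S)^{ab}
    S_dot = 2.0 * G * dev + S @ omega.T + omega @ S
    return S + dt * S_dot
```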
4 Tension instability treatment

Standard SPH methods have been plagued by a serious problem referred to as tension instability. In 1-D problems, tension instability causes a simple elastic bar to break apart in tension; in 2-D and 3-D problems it produces a clustering of particles which may lead to premature fracture. Dyka et al. [3] proposed a stress-point method to treat this problem. The basic principle is to calculate the values of stress at points other than the SPH centroids in order to remove the instability. This approach completely eliminates tension instability for a 1-D bar, producing accurate solutions for several SPH formulations. The concept is adopted here and expanded to cover 2-D and 3-D problems. The stress-point method for SPH is analogous to full integration in the FEM; in standard SPH, the stress components are calculated at the centroid of the SPH particle, analogous to a reduced integration form of the FEM. In this approach, stress, internal energy and density are calculated and tracked at the stress points, while displacement, velocity and acceleration are calculated and monitored at the centroid of each particle [2]. As the stress tensors at the stress points are included in the linear momentum equation of the same particle, the tension instability is eliminated.
5 Numerical examples
In order to validate the performance of the proposed SPH approach, two examples are presented. The first analyzes the dynamic response of two aluminium bars in high speed impact, as shown in Figure 1; an FE simulation with the commercial software ABAQUS was also conducted. Figure 2 shows the stress profiles along the two impacting aluminium bars at time 0.35 micro-sec. Comparison of the results shows that the effective stress from SPH agrees well with the FE solutions. In order to overcome the numerical instability caused by the shock wave, artificial viscosity is adopted in the SPH momentum equation. The second example is a square aluminium plate subjected to a high velocity impact by a steel cylinder, as shown in Figure 3. The deformed particle position contour at time 2.0 micro-sec is shown in Figure 4.
Figure 1. Problem description and SPH model
Figure 2. Stress profiles along the two impacting aluminium bars
Figure 3. SPH model for impact of plate and cylinder
Figure 4. Deformed particle position contour

6 Concluding remarks
In this paper, the dynamic responses of structures under high rate loading using the SPH approach are presented. The SPH equations governing the elastic large deformation dynamic response of solid structures are derived. Two examples of high velocity impact of structures are presented. The results illustrate that the high velocity impact problem can be effectively solved by the proposed approach and that SPH is a reliable method for dealing with high rate dynamic response.

References
1. Chen J. K., Beraun J. E. and Jih C. J., A corrective smoothed particle method for elastoplastic dynamics, Computational Mechanics 27 (2001) pp. 177-187.
2. Drumheller D. S., Introduction to Wave Propagation in Nonlinear Fluids and Solids (Cambridge University Press, Cambridge, 1998).
3. Dyka C. T., Randles P. W. and Ingel R. P., Stress points for tension instability in SPH, Int. J. Numer. Meth. Engrg. 40 (1997) pp. 2325-2341.
4. Libersky L. D., Petschek A. G., Carney T. C., Hipp J. R. and Allahdadi F. A., High strain Lagrangian hydrodynamics, a three dimensional SPH code for dynamic material response, J. Comp. Phys. 109 (1993) pp. 67-75.
5. Liu Z. S., Swaddiwudhipong S. and Koh C. G., Stress wave propagation in 1-D and 2-D media using the Smooth Particle Hydrodynamics method, Structural Engineering and Mechanics 14 (2002) pp. 455-472.
THE GENERALIZED DIFFERENTIAL QUADRATURE RULE

T. Y. WU AND Y. Y. WANG
Computational Mechanics Division, Institute of High Performance Computing, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528
E-mail: [email protected], [email protected]

G. R. LIU
Department of Mechanical Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260
E-mail: [email protected]

The basic idea of the differential quadrature (DQ) method is to approximate a derivative of a function at a point as a weighted linear sum of the function values at all the discrete points. The present authors have advanced the generalized differential quadrature rule (GDQR), which expresses the DQ as a weighted linear sum of both the function values at all the discrete points and the function derivatives at points wherever necessary. The conventional DQ method is usually applied to solve differential equations constrained by one condition at one point. The GDQR aims at solving high order differential equations which may have more than one boundary/initial condition at a discrete point; it enforces the same number of independent variables as the number of constraint conditions at any discrete point. Since the GDQR reduces to the DQM when the number of conditions at any discrete point equals one, the GDQR is naturally a generalization of the DQM. The authors have extended the DQ technique to cases where the DQM has never been used, and the conventional 5-point technique proposed by Bert and associates has been completely eliminated. The GDQR is a general method to solve differential equations in a global form, as opposed to the finite difference (FD) method in a local form. This paper reviews its recent applications and points out some restrictions and further potential applications.
1 Differential quadrature method (DQM)

Bellman and Casti [1] proposed the DQM in 1971 to solve nonlinear partial differential equations. The DQM has since been applied to diverse areas and has gradually established itself as a numerical method for solving initial and boundary value problems [2, 3]. The DQM approximates the rth-order derivative of a function ψ(x) at a discrete point $x_i$ (i = 1, 2, ..., N) as

$\frac{d^r\psi(x)}{dx^r}\bigg|_{x_i} = \sum_{j=1}^{N} A_{ij}^{(r)}\,\psi_j \qquad (i = 1, 2, \ldots, N;\ r \ge 1)$

where $\psi_j = \psi(x_j)$ and $A_{ij}^{(r)}$ are the weighting coefficients for the rth-order derivative at point $x_i$, and N is the total number of discrete sampling points in the domain. The review papers [2, 3] presented both the state of the art of the DQM and a survey of its application fields. It should be emphasized that the conventional DQM can only cope with differential equations that have one condition at one point, since the DQM chooses only one independent variable (the function value) at each point. As reviewed in [2], the DQM is also called the generalized collocation method. In order to apply the DQM to high-order differential equations with multiple conditions at a point, a 5-point technique [3] was proposed and applied to structural beams, plates and shells in the last decade. The 5-point technique forces an adjacent domain point to act as a boundary point, so that one condition still corresponds to one point.
An apparent and natural choice of trial functions for the DQM is the Lagrange interpolation polynomials, and their general weighting coefficients have been said to be first found by Shu and Richards [11], under the name generalized differential quadrature (GDQ). As pointed out in [3], the Lagrange interpolation polynomials are only one choice of trial functions. Shu later referred to the coefficients as the "Shu method" in his monograph [10]. In fact, Michelsen and Villadsen [9] derived these coefficients under the name of the collocation method as early as 1972, at about the same time the DQM was proposed. Shu and Richards [11] found an alternative way to obtain only the diagonal terms in the differentiation matrices.
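To make the Lagrange-polynomial route concrete, here is a short Python sketch (ours, not from the paper) of the classical explicit formulas for the first-order DQ weighting coefficients; higher-order weighting matrices can then be obtained, for example, by matrix products or recurrence:

```python
import numpy as np

def dq_weights_first_order(x):
    """First-order DQ weighting coefficients A_ij = L'_j(x_i) from
    Lagrange polynomials on an arbitrary 1-D grid x."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # M1[i] = product over k != i of (x_i - x_k)
    M1 = np.array([np.prod(np.delete(x[i] - x, i)) for i in range(N)])
    A = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i != j:
                A[i, j] = M1[i] / ((x[i] - x[j]) * M1[j])
        A[i, i] = -A[i].sum()   # each row of the matrix sums to zero
    return A
```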
2 Generalized differential quadrature rule (GDQR)

The GDQR aims at solving high order differential equations without using the 5-point technique. As opposed to the DQM, the GDQR considers a more general situation, where the field function ψ(x) is governed by a differential equation and constrained by a set of given conditions at any point. The solution domain is divided into points $x_i$ (i = 1, 2, ..., N) that include all the points with given conditions. If the function ψ(x) has to satisfy $n_i$ conditions (equations) at $x_i$, the GDQR expresses its differential quadrature as [12, 15-18]

$\frac{d^r\psi(x_i)}{dx^r} = \sum_{j=1}^{N}\sum_{l=0}^{n_j - 1} E_{ijl}^{(r)}\,\psi_j^{(l)} = \sum_{k=1}^{M} E_{ik}^{(r)}\,G_k$

where $E_{ik}^{(r)}$ (a convenient expression of $E_{ijl}^{(r)}$) are the GDQR's weighting coefficients and M is the total number of independent variables $G_k$:

$\{G_1, G_2, \ldots, G_k, \ldots, G_M\} = \{\psi_1, \psi_1^{(1)}, \ldots, \psi_1^{(n_1-1)}, \ldots, \psi_N, \psi_N^{(1)}, \ldots, \psi_N^{(n_N-1)}\}$

where $\psi_i^{(k)} = \psi^{(k)}(x_i)$ (k = 0, 1, 2, ..., $n_i$-1) are the kth-order derivatives of ψ at $x_i$. The GDQR thus enforces the same number of independent variables $\psi^{(k)}(x_i)$ (k = 0, 1, 2, ..., $n_i$-1) as the number of equations at a point, and its independent variables are chosen as the function value and its derivatives of the lowest possible orders wherever necessary. One of the most important parts of the DQ technique is the determination of the weighting coefficients. The authors have derived the GDQR's explicit coefficients using the Hermite interpolation functions for third-, fourth-, sixth- and eighth-order boundary-value problems and for initial-value differential equations of second to fourth orders [12-19]. It is apparent that the GDQR's applications to high-dimensional problems are quite different from the corresponding DQM applications [18, 22]. The notation for Hermite functions should be distinguished clearly: the Hermite orthogonal function has the domain [-∞, +∞], while the often-discussed Hermite interpolation functions define only the function values and their first derivatives at all the discrete points. The Hermite interpolation functions can be generalized to use function values and the corresponding lowest order derivatives at any discrete point. In interpolation theory,
the Hermite interpolation functions with various lowest order derivatives at any discrete point are also called generalized Lagrange interpolation functions or Hermite-Fejer interpolation functions. In their differentiation forms, it is clear that the GDQR is a generalization of the DQM. The present authors have not only generalized the DQ method itself but also obtained various explicit weighting coefficients.
3 GDQR's applications and discussions

The following applications show that the GDQR is a general method for solving high order differential equations.
1. Third-order boundary-value ordinary differential equations (ODEs): Blasius and Falkner-Skan equations [5].
2. Fourth-order boundary-value ODEs: beam, circular plate, and shells of revolution equations [12, 13, 14, 18, 19, 20, 21].
3. Sixth-order boundary-value ODEs: Onsager equations in fluid mechanics and circular arch equations in solid mechanics [5, 6, 16].
4. Eighth-order boundary-value ODEs: cylindrical barrel roof equations [8, 17].
5. Second- to fourth-order initial-value ODEs: Duffing equations [4, 12, 15].
6. Domain decomposition applications for structural beams, circular plates and circular arches [6, 13, 14, 19, 21].
7. Partial differential equations: rectangular plate and beam vibration problems [18, 22].

As compared with the 5-point technique, the GDQR offers a straightforward treatment of multiple conditions. The FEM and FDM are suitable for geometrically complex and discontinuous problems due to their locality, while the DQ methods have corresponding difficulties. In essence, this means that the DQ methods may primarily be a complementary approach, to be used efficiently for nonlinear problems with simple geometry and high smoothness, rather than a real alternative to the FEM or FDM.
References
1. Bellman R. and Casti J., Differential quadrature and long term integration. Journal of Mathematical Analysis and Applications 34 (1971) pp. 235-238.
2. Bellomo N., Nonlinear models and problems in applied sciences from differential quadrature to generalized collocation methods. Mathematical and Computer Modelling 26 (1997) pp. 13-34.
3. Bert C. W. and Malik M., Differential quadrature method in computational mechanics: a review. Applied Mechanics Review 49 (1996) pp. 1-27.
4. Liu G. R. and Wu T. Y., Numerical solution for differential equations of Duffing-type non-linearity using the generalized differential quadrature rule. Journal of Sound and Vibration 237 (2000) pp. 805-817.
5. Liu G. R. and Wu T. Y., Application of generalized differential quadrature rule in Blasius and Onsager equations. International Journal for Numerical Methods in Engineering 52 (2001) pp. 1013-1027.
6. Liu G. R. and Wu T. Y., In-plane vibration analyses of circular arches by the generalized differential quadrature rule. International Journal of Mechanical Sciences 43 (2001) pp. 2597-2611.
7. Liu G. R. and Wu T. Y., Multipoint boundary value problems by differential quadrature method. Mathematical and Computer Modelling 35 (2002) pp. 215-227.
8. Liu G. R. and Wu T. Y., Differential quadrature solutions of eighth-order boundary-value differential equations. Journal of Computational and Applied Mathematics 145 (2002) pp. 223-235.
9. Michelsen M. L. and Villadsen J., A convenient computational procedure for collocation constants. The Chemical Engineering Journal 4 (1972) pp. 64-68.
10. Shu C., Differential Quadrature and its Application in Engineering (Springer-Verlag, London, 2000).
11. Shu C. and Richards B. E., Application of generalized differential quadrature to solve two-dimensional incompressible Navier-Stokes equations. International Journal for Numerical Methods in Fluids 15 (1992) pp. 791-798.
12. Wu T. Y. and Liu G. R., A differential quadrature as a numerical method to solve differential equations. Computational Mechanics 24 (1999) pp. 197-205.
13. Wu T. Y. and Liu G. R., Axisymmetric bending solution of shells of revolution by the generalized differential quadrature rule. International Journal of Pressure Vessels and Piping 77 (2000) pp. 149-157.
14. Wu T. Y. and Liu G. R., A generalized differential quadrature rule for analysis of thin cylindrical shells. In Computational Mechanics for the Next Millennium, Vol. 1 (Proc. of the Fourth Asia-Pacific Conference on Computational Mechanics, Singapore, December 1999), ed. by C. M. Wang, K. H. Lee and K. K. Ang (Elsevier Science, The Netherlands, 1999) pp. 223-228.
15. Wu T. Y. and Liu G. R., The generalized differential quadrature rule for initial-value differential equations. Journal of Sound and Vibration 233 (2000) pp. 195-213.
16. Wu T. Y. and Liu G. R., Application of generalized differential quadrature rule to sixth-order differential equations. Communications in Numerical Methods in Engineering 16 (2000) pp. 777-784.
17. Wu T. Y. and Liu G. R., Application of the generalized differential quadrature rule to eighth-order differential equations. Communications in Numerical Methods in Engineering 17 (2001) pp. 355-364.
18. Wu T. Y. and Liu G. R., The generalized differential quadrature rule for fourth-order differential equations. International Journal for Numerical Methods in Engineering 50 (2001) pp. 1907-1929.
19. Wu T. Y. and Liu G. R., Vibration analysis of beams using the generalized differential quadrature rule and domain decomposition. Journal of Sound and Vibration 246 (2001) pp. 461-481.
20. Wu T. Y. and Liu G. R., Free vibration analysis of circular plates with variable thickness by the generalized differential quadrature rule. International Journal of Solids and Structures 38 (2001) pp. 7967-7980.
21. Wu T. Y., Wang Y. Y. and Liu G. R., Free vibration analysis of circular plates using generalized differential quadrature rule. Computer Methods in Applied Mechanics and Engineering 191 (2002) pp. 5365-5380.
22. Wu T. Y. and Liu G. R., Application of the generalized differential quadrature rule to initial-boundary-value problems. Journal of Sound and Vibration (in press).
RECOVERY BASED SUBMODELING FINITE ELEMENT ANALYSIS

HAI GU AND ZHI ZONG
Institute of High Performance Computing, 1 Science Park Road, #01-01, The Capricorn, Singapore Science Park II, Singapore 117528
E-mail: [email protected], [email protected]

Submodeling analysis is a technique to achieve efficiency when detailed analysis is required in a local region of a large structure. Traditionally, results of the global finite element analysis, internal nodal forces or displacements, are directly applied as submodeling boundary conditions. In the present paper a new approach is developed, which uses the stresses obtained by the superconvergent patch recovery procedure to determine the forces on the submodel boundary. The proposed method is convenient to implement because recovered stresses are represented by polynomial expansions; it also has higher accuracy because recovered stresses are generally more accurate than raw finite element results.
1 Introduction

Submodeling finite element analysis is a technique intended to obtain accurate information effectively in local critical regions of a large structure: the local region of interest is broken out as a submodel after the initial global analysis and analyzed separately using refined meshes, with boundary conditions derived from the initial global results. Obviously, the accuracy of the analysis depends on the quality of the boundary conditions. Traditionally, the raw results of the initial global analysis, displacements or internal nodal forces, are directly applied. Several techniques are available in the literature to enhance the accuracy of displacement boundary conditions [3], but few can be found for forces, although using forces as boundary conditions is more likely to yield accurate results. In the present paper, recovered stresses, which are generally more accurate than raw finite element results, are employed to determine the boundary forces. By this recovery based submodeling procedure, both accuracy and applicability can be remarkably improved. Two-dimensional large deformation problems using bilinear elements are mainly considered, but the concept of the procedure can be extended to more general cases.
2 Recovery based submodeling finite element analysis

2.1 Stress recovery along the submodel boundary by the Superconvergent Patch Recovery method
After the initial global analysis, the stress along the submodel boundary is recovered. An effective recovery procedure discussed in [2], a modified version of the Superconvergent Patch Recovery (SPR) procedure [1], is adopted here. Its outline is as follows. After the finite element analysis, a patch is defined for each vertex node inside the domain by the union of the elements sharing the node. Over the patch, a continuous field represented by a polynomial expansion $\sigma_j^*$ is assumed for each stress component $\sigma_j$ as

$\sigma_j^* = \mathbf{P}\,\mathbf{a}_j = [1\ \ x\ \ y\ \ xy\ \ x^2\ \ y^2]\,[a_j^1\ \ a_j^2\ \ a_j^3\ \ a_j^4\ \ a_j^5\ \ a_j^6]^T$  (1)
The unknown parameters $\mathbf{a}_j$ are determined by solving the least squares problem

$\min_{\mathbf{a}_j} F(\mathbf{a}_j)$  (2)

with

$F(\mathbf{a}_j) = \sum_{m=1}^{n_s}\left[\sigma_j^h(x_m, y_m) - \sigma_j^*(x_m, y_m)\right]^2 = \sum_{m=1}^{n_s}\left[\sigma_j^h(x_m, y_m) - \mathbf{P}(x_m, y_m)\,\mathbf{a}_j\right]^2$  (3)

where $n_s$ is the number of integration points inside the patch, $(x_m, y_m)$ denotes their coordinates in the deformed configuration, and $\sigma_j^h(x_m, y_m)$ represents the raw FEA result. Then the stresses at the assembly vertex node, the central points of the elements and the mid-points of the element edges sharing the assembly node are computed by substituting their coordinates into Eq. (1). For some of those points, different recovered values may be computed from the patches overlapping at them; in this case, the final recovered value is determined by a weighted average scheme, that is,

$\sigma_j(x_{in}) = \sum_{k} w_k\,\sigma_{j,k}^*(x_{in})$  (4)

where $\sigma_j(x_{in})$ is the final recovered value of stress component j at node in, the sum runs over the patches overlapping at the node, and $w_k$ are the averaging weights. The recovered stress field is then interpolated as

$\sigma_j(\mathbf{x}) = \sum_{in=1}^{nn} N_{in}(\mathbf{x})\,\sigma_j(x_{in})$  (5)

where $N_{in}(\mathbf{x})$ denotes the value of the shape function related to the in-th node. It is worth noting that not only the element nodes but also the central point of the element and the mid-points of the element edges are recovered in the previous stage, so the number of nodes is nn = 9.
Evaluate forces at boundary nodes using recovered stresses
The recovered stresses are used to compute nodal forces on submodel boundary. Precisely, the recovered, continuous stresses, 6P, are substituted into Cauchy equation, Eq.(6), to compute tractions f on submodel boundary. f = 6"n
(6)
where, n is the outward unit normal vector of the boundary. After that, equivalent nodal forces f on submodel boundary r are evaluated in a standard way of FEM formulations as follows. f = J r N r frrfr
(7)
where, t is thickness in deformed configuration and N the shape function matrix for displacements interpolation. Because recovered stresses are represented explicitly by polynomials, it is easy for the proposed procedure to compute forces at any nodes newly introduced by mesh
657
refinement which is the primary difficulty for traditional method. Better accuracy can also be expected by the new procedure since recovered stresses are generally more accurate than raw finite element results. 3
Numerical investigations
A classical model of hyperelasitic large deformation problem as shown in Fig. 1(a) is studied for numerical investigation. Mooney Rivlin hyperelastic material is adopted which is defined by the strain energy potential function U =Cm(ll-3)+Cm(i2-3) with material parameters clo = 0.l863A/pa and c01 = 0.00979Mpa . /, and l2 are the first and the second strain invariants respectively. Four cases are analyzed. Their precise definitions are shown in Table 1, for instance, case 1 is a plane strain problem, mesh Sub-1 is used in submodeling analysis and its accuracy is evaluated by comparing it with global analysis using mesh G-l. In plane stress cases, original thickness is 2mm. L
82.5 mm
j
n n o n o n o r w
(a): Model of the problem
(b): Original global mesh: driving mesh
D "C (c): Refined global mesh: G-l
D C (e): Refined global mesh: G-2
(d): Mesh used in submodeling anlysis: Sub-1
(f): Mesh used in submodeling anlysis: Sub-2
Figure 1. Model of example and meshes used in global and submodeling analysis
Global analysis is run with mesh shown in Fig. 1(b). This initial global mesh is not fine enough to capture details at adjacency of the hole, precisely region FGHDE (12,), where stress concentration occurs. Therefore submodeling techniques are applied. Submodeling analysis is run with two refined meshes Sub-1 (Fig. 1(d)) and Sub-2 (Fig. 1(f)) obtained by halving and quartering edges of elements of the initial mesh respectively. An error factor T) defined in Eq.(8) is used to indicate the accuracy of submodeling results by comparing them with appropriate global solutions, that is, submodeling solutions using Sub-1 and Sub-2 are respectively compared with global solutions of G-l (Fig. 1(c)) and G-2 (Fig. 1(e)).
658 Aie'ng
"n
l\ni°£g'Ak/ng
(8)
xlOO%
In Eq.(8), n is the region for which the error factor is computed; ne is the number of elements in the region and ng the number of integration points of each element; Ajt is the area of element before deformation; a"f4g denotes the Mises stress at integration points of reference solution. a'eJs indicates the difference in Mises stress between submodeling solution and correlative reference solution. Both the proposed method (RSM) and the traditional method using displacement boundary conditions (DM) are applied. Error factor r\ is computed for both £ls and n , . These results are listed in Table 1. From this table, first of all, it can be seen that both the two methods give satisfied solution with error less than 5%. Moreover the proposed RSM is obviously much better than DM in accuracy. Due to the effect of boundary conditions, the error for ils is greater than the error for a,. This is the reason for requiring that the submodel boundary should be far away enough from the area of interest. A tendency is observed from Table 1 that the superiority of RSM is more significant at area of interest (£i,) which is far from submodel boundary. In plane strain cases, error of RSM is half of that of DM for as, almost one third for ii,, while in plane stress cases error of RSM is about 38% of that of DM for n s , but less than one third for £2,. This indicates that the RSM is more capable to capture the information at area of most interest if the submodel boundary is located at proper distance from that area. Table 1: Case definition and error factors
Case 1: Plane strain G-1 Sub-1 DM RSM 1.64 0.67
va, Vcis
4
1.97
4.05
Case 2: Plane strain Sub-2 G-2 DM RSM 0.56 0.19 0.63
1.22
Case 3: Plane stress Sub-1 G-1 RSM DM 0.78 0.22 0.51
1.38
Case 4: Plane stress G-2 Sub-2 DM RSM 0.25 0.08 0.16
0.42
Conclusion
A new submodeling procedure is developed, by which the drawback of traditional procedure using forces boundary conditions is overcome and the result accuracy is remarkably improved. References 1. O.C. Zienkiewicz, J.Z. Zhu. The superconvergence patch recovery and a posteriori error estimates, part I: the recovery techniques, Int. J. Numer. Meth. Engng., 33(1992) pp. 1331-1364. 2. H. Gu, M. Kitamura. A modified recovery procedure to improve the accuracy of stress at central area of bilinear quadrilateral element, J. of The Society of Naval Architects of Japan, 188(2000) pp. 489-496. 3. N.G. Cormier, B.S. Smallwood, G.B. Sinclair, G. Meda. Aggressive submodelling of stress concentrations, Int. J. Numer. Meth. Engng. , 46(1999) pp. 889-909.
A HIERARCHICAL APPROACH TO SURFACE PARTITION OF POLYGONAL MESHES J. SHEN AND D. YOON Dept. of Computer & Information Science, University of Michigan, Dearborn, MI 48128, USA E-mail: [email protected] Given a surface polygonal mesh in three dimensions, an algorithm is proposed to find a partition of the mesh into k subregions on the basis of discontinuity of surface normal and curvature. The algorithm consists of three main steps in a hierarchical manner. First, an input polygonal mesh is decomposed w.r.t. discontinuity of surface normal. Secondly, flat regions are identified on the results of step 1. In the third step, the polygon mesh is further decomposed w.r.t. discontinuity of surface curvature. The resulting surface partition can be used in shape optimization or other surface manipulations based on geometric characteristics of the mesh. The execution time of the algorithm is linear, but pre-computation of some data structures takes 0(nlog n)» where n = m ax(N N ) > N
and N
are
the numbers of elements and nodes in the mesh, respectively. Numerical
experiments have been conducted to show the effectiveness of the algorithm.
1
Introduction
Surface partition of unstructured meshes is important to many problems in engineering and science. The partitioning problem may arise from the requirement of computation on multiprocessor architectures. If a surface mesh is involved in a computation, mapping such a surface onto a multiprocessor machine generally requires partitioning the surface mesh into a number of subregions and assigning these subregions to different processors. Existing algorithms include 1) simulated annealing [1] motivated by physics, 2) schemes based on geometry like straight coordinate bisection [2], bisection direction by principal axes of inertia [3], and stereoscopic projection [4], and 3) graph based schemes such as graph bisection methods [5], spectral partitioning methods [6] and min-max method [7]. In addition, the surface partition may come from the requirement of shape optimization or surface manipulations based on geometric characteristics of polygonal meshes. In contrast to the case of parallel computation, the objective of the decomposition herein is not to generate equally-sized subregions with minimized boundaries. Instead, the decomposition is controlled by the geometric characteristics such as discontinuity of surface normal and curvature. In this paper, we focus solely on this type of decomposition, and limit our attention on polygonal meshes with discontinuity of surface normal or curvature. As to a mesh without any discontinuity of surface normal or curvature, there is no need to break it down into several subregions from the perspective of shape optimization or surface manipulations. If there is a requirement from other perspectives, existing algorithms for parallel computation can be used to handle this special case. The main contribution of this paper is to propose a new algorithm for surface partition of polygon meshes w.r.t. geometric characteristics, i.e., the discontinuities of surface normal and curvature. The outline of the algorithm is introduced in Section 2, and numerical experiments are presented in Section 3.
659
660
2
Methods
Since the surface partition may be dependent upon several factors such as surface normal and curvature, in order to make things simple, we propose to divide the entire task into three main steps in a hierarchical manner as follows: Step 1: surface partition by G discontinuity Step 2: identification of flat regions Step 3: surface partition by discontinuity of curvatures Step 2 is conducted on the result of Step 1, while Step 3 is carried out on the result of Step 2. To find out G discontinuity, the angle formed by the surface normal of adjacent elements is used as an index that is compared with a predetermined threshold. Whenever the index is greater than the threshold, we consider that the edge between these two adjacent elements is a sharp edge or feature edge. In order to find each surface partition enclosed by this kind of sharp edges, a breath-first search is proposed to traverse over the polygonal mesh as follows: (1) set up element neighbor list for each surface element (2) calculate normal angle change between adjacent elements (3) perform a breath-first search (3.1) initiate from an arbitrary surface element (3.2)
(3.3)
propagate over the surface until a G discontinuity line is encountered, which is identified by a condition: normal angle change > a user-specified angular threshold. go back to (3.1) and repeat this breath-first search over unprocessed regions until all surface elements are covered by a partition.
In order to simplify things, we propose an idea of identifying flat regions as early as possible such that the task of surface partition by curvature in Step 3 could be reduced. To identify a flat region, the angle formed by the surface normal of adjacent elements is used. If this angle is smaller than a very small angular threshold, we consider that these two elements form a small portion of a flat region. Similar to the idea of finding sharp edges, another pass of breath-first search over the polygonal mesh is conducted as follows: (1) loop over each surface patch generated by Step 1 (1.1) perform a breath-first search (1.1.1) initiate from a surface element that has, w.r.t. each neighboring element, a normal angle change < a user-specified angular threshold for flat planes (1.1.2) propagate over the surface until a boundary line of a flat plane is encountered, which is identified by a condition: normal angle change > a user-specified angular threshold for flat planes. (1.1.3) go back to (1.1.1) and repeat this breath-first search over unprocessed regions until all surface elements are processed. (1.2) group the remain elements that do not belong to flat regions into one or more different surface patches by means of their connectivity.
661 With the surface partition produced by Steps 1 and 2, a third pass of partition on the basis of curvature discontinuity is conducted. This pass is extremely important to curved surfaces, especially when we want to separate features like fillets from others. Our basic strategy is to let the breath-first search find out one subregion with low curvature and group the remaining elements into one or more subregions. The search is controlled to start from an element with all its nodal curvatures smaller than the nodal curvature threshold. In the search propagation, if normal curvature w.r.t. an element edge is smaller than the nodal curvature threshold, the propagation continues. Otherwise, it terminates at that element edge. Overall, the following procedures are proposed to conduct surface partition by discontinuity of curvatures: (1) loop over each surface patch generated by Steps 1 and 2, which is not a flat region (1.1) calculate nodal curvatures (1.2) calculate average nodal curvature (1.3) perform breath-first searches (1.3.1) initiate only from an element with a curvature < average curvature (1.3.2) propagate over the surface until a termination condition is satisfied: curvature > curvature threshold. (1.3.3) If there are still some elements unprocessed, go back to (1.3.1) to initiate another breath-first search. (1.4) group the remaining elements, which do not belong to regions formed by breath-first searches, into one or more different surface patches by means of their connectivity. 3
Numerical Experiments
The algorithms introduced in this paper are implemented in VC++ and tested on a Pentium III HP PC. Table 1 shows the execution time and rate on different test meshes. Since the major part of the proposed algorithm is three passes of breath-first searches over all elements of a mesh, its time complexity is 0{n), where n = ma x(N , Nv) > Ne and Nv are the numbers of elements and vertices in the mesh, respectively. However, element neighbor relationship needs to be set up as a pre-computation, which takes O(nlogn) time. Thus, overall time cost of the proposed approach is O(nlogn)- Figure 1 gives a surface partition of a typical mesh model. Table 1: Execution time and rate on different test mesh models. Vertex Element Time Model name (second) bumper 473 432 0.17 bracket 236 186 0.07 deck lid 8807 8624 3.0 curverl 143 120 0.04 56 block 46 0.01 12640 25328 6.13 base 156 ellipsoid 308 0.06 1521 3042 torus 0.67
Rate (element/sec) 2541 2657 2871 3000 4600 4132 5133 4533
Figure 1. Surface partition of a bracket model by discontinuity of surface normal and curvature. 4
Acknowledgements
This work is supported in part by University of Michigan - Dearborn, Campus Research Grant and University of Michigan OVPR research grant. We thank Frank Massey for his advice in differential geometry. References 1. Williams, R. D., Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations. Concurrency: Practice and Experience, 3 (1991) pp. 457-481. 2. Simon, H. D., Partitioning of Unstructured Problems for Parallel Processing. Computing Systems in Engineering, 2(1991), pp. 135-148. 3. Farhat, C. and Lesoinne, M., Automatic Partitioning of Unstructured Meshes for Parallel Solution of Problems in Computational Mechanics. International Journal for Numerical Methods in Engineering, 36(1993) pp. 745-764. 4. Teng, S. H., "Points, Spheres, and Separators, A Unified Geometric Approach to Graph Partitioning." Ph.D. School of Computer Science, Carnegie Mellon University, 1991. 5. Vaughan, C , Structural Analysis on Massively Parallel Computers. Computing Systems in Engineering, 2 (1991), pp. 261-267. 6. Barnard, S. T. and Simon, H. D., A Fast Multilevel Implementation of Recursive Spectral Bisection for Partitioning Unstructured Problems. Concurrency: Practice and Experience, 1994. 7. Kiwi, M., Spielman, D., and Teng, S. H., Min-Max-Boundary Domain Decomposition. Theoretical Computer Science, 261 (2001), pp. 253-266.
A COMBINED MESHFREE METHOD AND MOLECULAR DYNAMICS IN THE MULTISCALE LENGTH SIMULATION Q. X. WANG, T. Y. NG, K. Y. LAM, HUA LI AND X. J. FAN Institute of High Performance Computing,1 Science Park Road, #01-01, The Capricorn, Singapore Science Park II,Singapore 117528 E-mail: [email protected] Multiscale simulation technique for material modeling has been gaining much attention in many research realms, and it has emerged as a promising approach for addressing the challenges in efficient and accurate simulation method development. Thus, a new methodology, the combined meshfree method and molecular dynamics, is developed to simulate the multiscale length coupling between the continuum and the atomistic region. Numerical examples are presented to verify the developed methodology.
1
Introduction
In recent years, multiscale simulation techniques have received much attention in the science and engineering for the description of a wide range of physical phenomena. Traditional mono-scale approaches are obviously inadequate for the analysis of certain physical problems where the studied characteristics at different orders of length-scales, for example, the turbulence problem [1] and the crack propagation problem [2]. In coupling the continuum and the atomistics, much work has been done using the traditional finite element method (FEM) and molecular dynamics (MD). In this paper, element-free Galerkin (EFG) method [3] replaces FEM for continuum analysis and is combined with molecular dynamics (MD) simulation via the development of an appropriate handshaking region. A source code is developed to simulate example problems. The results present this methodology is efficient, and also possesses the additional advantage of simplicity of implementation.
2
Methods
2.1 Molecular
dynamics
Molecular dynamics (MD) simulation involves the classical trajectories of atomic nuclei by integrating Newton's second law of motion of a system. For the MD region in the present work, the two-dimensional Lennard-Jones (LJ) "12:6" potential [4] is used. The LJ potential model is given mathematically as f
0(rff) = 4e
\
<7 r
ij
r.. < r. v
(!)
where r.. = iy , r(- = r ; — r . , £ is a parameter characterizing the interaction strength, and a defines a molecular length scale. rc is the cutoff distance, namely (j>(rr) ~ 0, if rtj >rc. The force corresponding to the potential 0 ( r r ) is computed as
f = -V0(r,) 663
(2 )
664
and the force that atomy exerts on atom i can be expressed mathematically as
48e
( a V4
r a \8
r < r l]
r
—
(3)
'c
\« J Applying Newton's second law of motion, the equation of motion of the system can be obtained. And then integrating the motion equation by the leapfrog method [5], the velocity and coordinate of the atom i can be obtained. The corresponding internal stress tensor is given by [4] N„-\
~ W„
-
, s
l
Na
I>,v,.v,. - £ £r,7f,y 1
•
(4)
7='+'
where V is the area in two-dimensional systems or volume in three-dimensional systems, for the simulation cells. Na is the total number of all atoms, and m, and v, are the mass and velocity of atom (', respectively. 2.2 Element Free Galerkin (EFG) Method The element-free Galerkin (EFG) method is used here to describe the far-field region of the simulation. It employs the moving least-square (MLS) interpolate uh(\) to construct approximation of function u(x). For a given domain Q, the moving least-squares (MLS) approximation uh(x) of a function «(x) is given as = pT(x)a(x)
u*(x) = ]T pj(x)a,(x) ;=i
(5)
where p(x) is a complete polynomial of order m in the space co-ordinates x T = [x,y ] . The coefficients rt/x) in Eq.(5) are functions of x and a(x) at any point x are obtained by minimizing a weighted discrete least-squares norm / as follows 7 ( a ) = £ w ( x - x , . ) [ p T ( x , ) a ( x ) - u,]2
(6)
where n is the number of points in the neighbourhood of x for which the weight function w(x — X •) ^ 0 , and w, is the nodal value of u at x=x;. The neighbourhood of x is called the influence domain of x. The minimum of the weighted discrete least-squares norm 7(a) in Eq. (6) with respect to a(x) leads to the following relation between a(x) and u a(x) = A _ 1 ( x ) B ( x ) u
(J)
where A(x) and B(x) are the matrices and obtained by A(x) = £ w,(x)p T (x,.)p(x ; ) = X w ( x - x , ) p T ( x , . ) p ( x , . ) i=l
(8)
i=l
B(x) = [w,(x)p(x,), w 2 (x)p(x 2 ),..., wn (x)p(x„)] uT
(9) (10)
=[ul,u2,...,un]
Hence, we have
'(x) = ££/7 7 .(x)((A- 1 (x)A(x)) jl .« I .=£y I -(x)" i i=\ j=\
i=l
(11)
665
where y/;(x) is termed the MLS shape function and defined as ^,(X) = XPJ(X)(A-1(X)B(X));,.
(12)
More details of the moving least-square (MLS) interpolate and EFG can be found in the paper by Belytschko et al. [3]. 3
Implementation of the Multiscale Simulation Technique
As shown in Figure 1, we consider a problem domain consisting of 3 sub-regions, the atomistic region QMD (lattice region), the continuum region £2EFG (far-field region) and the handshaking region QHs (transition region). The EFG method is for QEFG and the MD formulation for QMD- The atomistic region and the continuum regions are joined by the handshaking region QHs- The compatibility conditions in QHs, namely the displacement and stress compatibilities, play a critical role in the performance of the present multiscale simulation technique. The detail description of this compatibility technology can be found inKohlhoffetal. [6]. In the simulation of multiscale problems, the generation of computational data sets is a very important task. A source code is developed here to generate the computational data points for the present multiscale simulation technique. The data points used in MD region is generated automatically. For the continuum EFG region, however, the domain is discretized by scattering irregularly distributed nodes, where the nodal density is adjusted according to the nature of the problem.
Figure 1. Problem Domain — QEFG (continuum), QMD (atomistics) and £2Hs (handshaking).
4
Numerical Results
The presently developed coupled EFG/MD multiscale simulation technique is applied for a kind of face-centered crystal (FCC) silver (Ag) plate, in the (001) plane. A uniaxial tension is applied at the two ends of the plane. Using the above coupled EFG/MD multiscale technique. The deformation of the plate and the stress distribution are computed. Figure 2 shows the distribution of the stress o xx along the plate symmetrical
666
line in x (tension) direction through the entire computational region. This figure shows that the MD result possesses a large stress oscillation at the initial time step. This is probably due to the application of the simple LJ pair potential for the MD simulation. However, once it reaches the equilibrium state, the results agree well with the analytical solutions as well as those of Kohlhoff et al. [6], using coupled FEM/MD. The numerical example demonstrates the viability and efficiency of the presently developed coupled EFG/MD method. The results are very encouraging, showing distinct advantages, such as acceptable accuracy and ease of implementation.
Analytical result
/
EFG result MD result 0.0 -20
-15
-10
-5
0
5
10
15
20
x(k) Figure 2. Stress distribution a „ along the plate symmetrical line in x direction. References 1. Hou T. Y., Wu X. H., Chen S. Y., and Zhou Y., Effect of finite computational domain on turbulence scaling law in both physical and spectral spaces. Physical Review E, 58(5) (1998), pp. 5841-5844. 2. Abraham F. F., Broughton J. Q., Bernstein N., and Kaxiras E., Spanning the continuum to quantum length scales in a dynamic simulation of brittle fracture. Europhysics Letters, 44(6) (1998), pp. 783-787. 3. Belytschko T., Lu Y. Y., and Gu L., Element-free Galerkin methods. International Journal for Numerical Methods in Engineering, 37 (1994), pp. 229-256. 4. Blonski B., Brostow W., and Kubat J., Molecular-dynamics simulations of stress relaxation in metals and polymers. Physical Review B, 49(10) (1994), pp. 6494-6500. 5. Rapaport D. C , The Art of Molecular Dynamics Simulation. Cambridge University Press (1995). 6. Kohlhoff S., Gumbsch P., and Fischmeister H. F., Crack propagation in b.c.c. crystals studied with a combined finite-element and atomistic model. Philosophical Magazine A, 64(4) (1991), pp. 851-878.
SELF-SIMILAR PROBLEMS IN MULTIDIMENSIONAL CONSERVATION LAWS
SUNCICA CANIC Department
of Mathematics,
University E-mail:
Department
of Mathematics,
University E-mail:
Department
of Mathematics,
California
of Houston, Houston, [email protected]
Texas 77204-3008,
USA
Texas 77204-3008,
USA
B A R B A R A LEE K E Y F I T Z of Houston, Houston, blkQmath.uh.edu
EUN HEUI KIM
E-mail:
State University, Long Beach, CA USA [email protected]
90840-1001,
We report on an approach to analysing hyperbolic conservation laws in several space variables by examining two-dimensional Riemann problems. Use of selfsimilar coordinates reduces the problem to a system of conservation laws in two variables; however, the system now changes type, and a complete analysis requires solving unusual boundary-value problems for degenerate elliptic and degenerate hyperbolic equations, as well as free-boundary problems for such equations. Recent work has resolved some of these difficulties. The talk illustrates this by solving some problems related to weak shock reflection in prototype equations.
1
M u l t i d i m e n s i o n a l C o n s e r v a t i o n Laws
Modeling by conservation principles is fundamental to fluid mechanics, and the importance of multidimensional systems is widely acknowledged. However, there are no general existence theorems for weak solutions of systems of conservation laws in more than one space dimension, as the tools which form the basis of a theory for hyperbolic conservation laws in a single space dimension do not extend to higher dimensions. To be specific, the principal method of analysis is through solution of the Riemann problem; this constitutes a nonlinear version of the method of characteristics. The role of characteristics in propagation of solutions of hyperbolic equations is complicated in several space dimensions, even for linear and semilinear problems, and a nonlinear formulation has not yet been found. Recently, we have started to analyse two-dimensional Riemann problems. One goal of the research is to learn what sorts of singularities appear generically — that is, what are the two-dimensional analogues of shock discontinuities. Related to this, we hope to establish a priori bounds on weak solutions. In addition, a number of self-similar problems are of interest in themselves. For example, the so-called "von Neumann paradox" in weak shock reflection focuses on the failure of shock polar analysis to explain the nature of shock reflection when the waves are weak enough that the nonlinear acoustic waves dominate the linear entropy and vorticity waves. This problem can be studied in prototype equations which are simpler than the full equations of gas dynamics. We have examined the unsteady transonic small disturbance (UTSD) equation and the nonlinear wave system (NLWS).
667
668 2
Self-Similar R e d u c t i o n
A working definition of a Riemann problem (not the only definition possible), is one for which the data depend only on x/y and hence self-similar solutions in x/t and y/t are expected. A system of conservation laws in two space dimensions and time, Ut + F{U)X + G(U)y = Ut + A(U)UX + B(U)Uy = 0, where U(x, y, t) e K n and F and G are smooth maps on R n , becomes a system in two variables £ = x/t, r\ = y/t, which can also be written in conservation form: F( + Gv = (F - £U)e + (G-
r,U)v =
-nU.
A typical system of conservation laws, for example the equations of isentropic or polytropic compressible gas dynamics, is hyperbolic in space and time, with a pair of nonlinear acoustic wave speeds and a number of linear, degenerate characteristics corresponding to entropy or vorticity waves. The reduced system is hyperbolic only far from the origin and changes type at the sonic line, corresponding to the acoustic wave cone 1 ; there is a bounded set {{£,,v) £ ^ } m which the system is elliptic (if n = 2) or of mixed type (if n > 2). The reduced system is often called 'quasisteady', and there is a close analogy with the equations of steady transonic flow, which are also much used in applications but for which there is not a complete theory. In the prototype systems we have studied, the UTSD equation and NLWS, the elliptic part can be written as a second-order equation which appears to be tractable. The Euler system is more complicated. In any case, fl is not known a priori, but depends on the solution U; typically the boundary of fl is at most Lipschitz. In the hyperbolic region, solutions of the reduced system may be relatively simple. For example, for Riemann data which is piecewise constant in sectors, the far-field solution can be found by the elementary construction of solving one-dimensional Riemann problems. Interactions in the hyperbolic region of these one-dimensional waves can be analysed for small data (as a consequence of one-dimensional theory), and in some case have simple selfsimilar solutions by elementary constructions 1 . At least two types of behavior at dQ have been identified. If U is continuous at d£l then the elliptic equation is degenerate at dfi. This is the case even for linear equations such as the two-dimensional wave equation, whose fundamental solution has a square-root singularity at the wave cone. When U is also constant at dfi, the nonlinear equation possesses a nonlinear version of the same anisotropic degeneracy, which is of a type first analysed in work of Keldysh 2 ; it is different from the Tricomi singularity, which appears when the steady transonic potential equation is written in hodograph variables. This nonlinear equation had not been previously studied. Canic and Keyfitz 3,4 , and Canic and Kim 5 found solutions in weighted Sobolev spaces and in Holder spaces (see also related work of Zheng 6 ), and found that nonlinear Keldysh equations, as distinct from linear equations, may in addition have solutions which are continuously differentiable up to the degenerate boundary. Both singular and regular behavior occur, often in the same problem, on different parts of the boundary 7 .
669
The segments of d£l at which U is continuous and constant correspond to spacelike surfaces; that is, the problem of posing Dirichlet data on d£l is well-posed. However, there are configurations in which locally well-posed solutions outside of fl are not constant on dfl and do not extend to a solution in all of R 2 . Thus, even when a solution which is continuous across the sonic line is expected (from the absence of compression waves in the data, for example), it is not always possible to predict the location of 0. based on the supersonic solution alone. A second type of behavior occurs when transonic shocks appear in the solution. In this case, the solution is discontinuous across dfi. The equation may be strictly hyperbolic on one side and the elliptic part of the operator strictly elliptic on the other; however, the boundary itself is now unknown a priori. This leads, then, to a free boundary problem in which the position of the shock and the subsonic flow are coupled by means of the Rankine-Hugoniot equations, a system of nonlinear equations relating the shock slope, the (known) state outside the shock and the unknown state inside ft. In simple cases, the equation governing the subsonic flow is strictly elliptic, the shock may change continuously from supersonic to transonic, crossing a degenerate part of <9fi as it does so. Even without this additional complication, the free boundary problem is not of a standard type, as the underlying elliptic equation is quasilinear and the coupling between the shock slope and the states is highly nonlinear. This has turned out to be the principal challenge of the project up to this point.
3
Oblique D e r i v a t i v e Free B o u n d a r y P r o b l e m s
In work with Lieberman 8 which proves a stability result for steady transonic flow, and which we have extended to establish weak 9 and strong 1 0 regular reflection patterns in the UTSD equation, at least in a neighborhood of the interaction point, we have found a method to prove existence of the free boundary and the corresponding subsonic solution. The method is classical, but seems well-adapted to quasilinear equations and nonlinear boundary conditions. It is based on formulating the elliptic equation as a second-order equation Q(u) = 0, whose coefficients do not involve the derivatives of u (here u is one state variable); and on casting the Rankine-Hugoniot as an evolution equation for the shock position and an oblique derivative boundary condition, (3 • Vu = 0, on the free portion of the boundary. Taking an approximate position for the shock in an appropriate Holder space /C of curves, a mapping on K. is defined by solving the quasilinear fixed boundary problem for u and then solving the evolution equation to define a new curve. The key is is a gain of regularity in this mapping, due principally to estimates one can obtain in the oblique derivative problem; we can show that the mapping is compact and has a fixed point. Kim has shown that in some cases the solution is unique 1 1 . The lack of regularity of dil requires the use of weighted Holder norms. The lack of uniform ellipticity in the case of a shock adjacent to a continuous sonic boundary is handled by elliptic regularization. We have solved two prototype problems for the UTSD equation 9 , 1 0 , but we expect the method to work quite generally. Up to this point we have assumed that the oblique derivative boundary condition is uniformly oblique. This is true in cases
670 where the shock itself is oblique and never normal. However, in many interesting problems, such as the formation of a Mach stem, the shock is normal at one point (the foot or symmetry point), and such appears to be, in fact, the generic situation for transonic shocks. For example, a uniform planar shock spanning a subsonic region has this property at its mid-point. Our current work focuses on adapting the compactness estimates to include this degeneracy. Acknowledgments Research of the first author (SC) supported by the National Science Foundation (NSF), grant DMS-9970310 and by the Texas Advanced Research Program (TARP) grant 003652-0112-2001. Research of the second author (BLK) supported by the Department of Energy, grant DE-FG-03-94-ER25222 and TARP grant 003652-00762001. Research of the third author (EHK) supported by NSF grant DMS-0103823. References 1. S. Canic and B. L. Keyfitz. Quasi-one-dimensional Riemann problems and their role in self-similar two-dimensional problems. Archive for Rational Mechanics and Analysis, 144:233-258, 1998. 2. M. V. Keldysh. On some cases of degenerate elliptic equations on the boundary of a domain. Doklady Acad. Nauk USSR, 77:181-183, 1951. 3. S. Canic and B. L. Keyfitz. An elliptic problem arising from the unsteady transonic small disturbance equation. Journal of Differential Equations, 125:548574, 1996. 4. S. Canic and B. L. Keyfitz. A smooth solution for a Keldysh type equation. Communications in Partial Differential Equations, 21:319-340, 1996. 5. S. Canic and Eun Heui Kim. A class of quasilinear degenerate elliptic equations. Journal of Differential Equations, to appear. 6. Yuxi Zheng. Existence of solutions to the transonic pressure-gradient equations of the compressible Euler equations in elliptic regions. Communications in Partial Differential Equations, 22:1849-1868, 1997. 7. S. Canic, B. L. Keyfitz, and E. H. Kim. Mixed hyperbolic-elliptic systems in self-similar flows. Boletim da Sociedade Brasileira de Matemdtica, 32:1-23, 2002. 8. S. Canic, B. L. Keyfitz, and G. M. Lieberman. A proof of existence of perturbed steady transonic shocks via a free boundary problem. Communications on Pure and Applied Mathematics, LIIL1-28, 2000. 9. S. Canic, B. L. Keyfitz, and E. H. Kim. A free boundary problem for a quasilinear degenerate elliptic equation: Regular reflection of weak shocks. Communications on Pure and Applied Mathematics, LVt71-92, 2002. 10. S. Canic, B. L. Keyfitz, and E. H. Kim. Free boundary problems for the unsteady transonic small disturbance equation: Transonic regular reflection. Methods and Applications of Analysis, 7:313-336, 2000. 11. Eun Heui Kim. Boundary behaviors and uniqueness of solutions for a class of quasilinear degenerate elliptic equations. Submitted, 2002.
V A R I A N C E R E D U C T I O N OF M O N T E C A R L O M E T H O D S F O R OPTION P R I C I N G U N D E R STOCHASTIC VOLATILITY MODELS X. Q. LIU, Y. Y. WONG Department of Mathematics, National University of Singapore 2, Science Drive 2, Singapore 117543 E-mail: [email protected], [email protected] The Clark-Funke-Shevlyakov-Haussmann-Davis-Ocone (CFSHDO) formula is used to construct perfect control variates for vanilla and exotic option prices under stochastic volatility (SV) models and Longstaff and Schwartz? least squares Monte Carlo method is employed to estimate the conditional expectations in the CFSHDO formula. The resulting variance reduction effect is very significant and well worth the additional efforts to compute the control variates. Our results shed light on the success of a systematic approach, against various existing ad hoc techniques, to constructing control variates for Monte Carlo valuation of exotic options under complex models.
1
T h e C F S H O formula and its applications
Based on the Clark's [2] representation theorem on functionals of a Brownian motion, the stochastic integral representation of functionals of the solution to a stochastic differential equation (SDE) was derived by Funke and Shevlyakov [4], Haussmann [5], Divis [3], and Ocone [8] by different approaches. This formula results in an explicit representation of optimal portfolios for utility maximization in Ocone and Karatzas [9]. It is also used in [7] for developing variance reduction methods for simulated diffusions. An extension of the formula is found in Aase et al [1], where the extended formula is applied to the explicit calculation of the hedging function for a European call option. This paper aims to adapt the variance reduced Monte Carlo (VRMC) method of Newton to option pricing under stochastic volatility models. To approximate the conditional expectation in the representation formula, we employ the least-squares Monte Carlo estimation proposed in Longstaff and Schwartz [6] for American option pricing. Numerical experiments show that the specification of a linear form for the regressions is always adequate to provide a powerful control variate. 2
Heston's model and the C F S H D O representation of option payoffs
Hestons model is developed to capture the volatility smiles. It allows for a correlation between the asset return and the volatility as follows: dSt =
rStdt+y/VtStdWt1,
dVt = {UJ- evt) dt + £VtdW2, d[W\W2]t
= pdt.
The correlated Brownian motions W£ and W2 can be expressed by two independent Brownian motions B\ and B2, namely, dW} = dB\ and dW2 = pdB\ + 2
VT^dB .
671
672 The VRMC method for vanilla and exotic options under Heston's model requires regular MC simulations of the underlying asset prices St subject to the simulated volatilities Vj and the computation of the control variates. As a matter of fact, the payoff functional is not differentiable when the option happens to be at the money at maturity. However, this does not prevent us from applying the representation theorem to the pricing of the options because such events are of probability zero. The validity can be justified alternatively by approximating the payoff functionals using a sequence of differentiable functionals. 3
T h e V R M C algorithms for options under Heston's SV models
3.1
Algorithm for vanilla options
To reduce computing work on the control variate, the regressions are performed only on a partition with a coarser space relative to the partition for the MC simulations. The VRMC algorithm for vanilla options is detailed as follows: (1) Regular MC simulations of the underlying asset price: "m+i
=
bm + rhbm + yVmom
[Bm_^i — Bmj ,
•Vm+1 = Vm + (co- 9Vm) h + £Vm [ p (Bi+1
+ V l - P 2 {B2m+1 - B2m)] •
- Bl)
(2) A backward recursion on a k—times coarser partition $n+l = I A " )++ ( ^ "n\kh
oil
\o
+
nfc 2 \ / V.nk ^Tik
o
yB(n+i)k ~
iP
B
nkj
(S(2»+D'= - s ' 0 '
+ o i^r^)
/ 1 if ST > K _ , \0iiST
PN-I
(3) Least-squares regressions for the conditional expectations: O-Unk 0>12nk
&nk
0>13nk
0-2lnk 0-22nk
vnk
0-23nk
(4) The integration for the control variate: AT-l
Plain: I =
0-2\nk 0-22nk M-l
Interpolation: J =
\ m=0
ai2n \ 021r 022n
,
Ollnfc C-12nk
^
a
13n
fl23n
Vnk
(0,\3nk
o-nk [Bln+1)k - B\kj ,
V a23nk
r dllm
a\2m
fll3n
0-2\m 0,22m
G23n
= Interpolation ((°Unk
ai2nk
)
\ \ 0.21nk 0,22nk )
°~m {Bm+i , (ai3nk \ d23nk
~ Bm) ,N
673 (5) The computation of the VRMC price:
1=1i
3.2
The algorithm for vanilla options
The algorithm for Asian option varies from that for vanilla options in that, (i) it requires the computation of the average price St of the underlying asset on the coarse partition; (ii) the recursive scheme for sample values of the regression changes due to a different form of the Riesz characterization of the differential of the option payoff functional; (iii) the regressions are tri-variable: (2') A backward recursion: PN = 0, A l n
_ J kh\iSN> " \0
K
HSN
'
A
_ ,
.
"-^m,uj,
Pn = Pn+l3>n+l + A n . (3') Regressions: Snk = — — T (nSn-i
( 4
O-Unk Ul2nk
+ S„k) , ai3nib \
0-2\nk 0-22nk Q23nk
I
l31nk
1
0,32nk d33nk
Numerical implementations and conclusions
The algorithms in the last section are implemented to price vanilla and Asian options under Heston's model.
aw am Numbardpathi
3500
woo
Figure 1: Price vs Number of Simulations
1ffl>
200
300
wo
Nuflbor ol Samples lot Rao;i*M]n
Figure 2: VR Effect vs Size of Regressions
674 Figure 1 shows the different trends in the price of a particular Asian option produced by the MC and VRMC methods respectively as the number of simulations increases. The ratios between the variances of the MC and VRMC methods respectively are plotted against the size of the regression in Figure 2. Substantial numerical results show that the VRMC method always reduces the variance of MC method dramatically with limited increase in computing time. The computational work on the control variate can be diminished since the effect of variance reduction is insensitive to the size of regressions. Refinement of the coarse discretization and interpolation in the numerical integration for the control variate is found to enhance the reduction of variance significantly. References 1. Aase, K.; 0ksendal, B.; Privault, N.; Ub0e, J. White noise generalizations of the Clark-Haussmann-Ocone theorem with application to mathematical finance. Finance Stoch. 4 (2000), no. 4, 465-496. 2. Clark, J. M. C. The representation of functionals of Brownian motion by stochastic integrals. Ann. Math. Statist. 4 1 (1970) 1282-1295. 3. Davis, M. H. A. Functionals of diffusion processes as stochastic integrals. Math. Proc. Cambridge Philos. Soc. 87 (1980), no. 1, 157-166. 4. Funke, R.; Shevljakov, A. Ju. A generalization of Clark's formula. (Russian) Theory of random processes, No. 5 (Russian), pp. 93-96, 114. Izdat. "Naukova Dumka", Kiev, (1977). 5. Haussmann, U. G. On the integral representation of functionals of ltd processes. Stochastics 3 (1979), no. 1, 17-27. 6. Longstaff, F. A.; Schwartz E. S. 2001. Valuing American options by simulation: a simple least-squares approach. Review of Financial Studies 14 (2001), no. 1, 113-147. 7. Newton, N. J. Variance reduction for simulated diffusions. SIAM J. Appl. Math. 54 (1994), no. 6, 1780-1805. 8. Ocone, Daniel Malliavin's calculus and stochastic integral representations of functionals of diffusion processes. Stochastics 12 (1984), no. 3-4, 161-185. 9. Ocone, D. L.; Karatzas, I. A generalized Clark representation formula, with application to optimal portfolios. Stochastics Stochastics Rep. 34 (1991), no. 3-4, 187-220.
A SUPERLINEARLY CONVERGENT ALGORITHM FOR LARGE SCALE MULTI-STAGE STOCHASTIC NONLINEAR PROGRAMMING* FANW E N M E N G , R O G E R TAN A N D G O N G Y U N ZHAO Centre for Industrial Mathematics, National University of Singapore, 2 Science Drive 2, Singapore 117543 E-mail: [email protected] and [email protected] This paper presents an algorithm for solving a class of large scale nonlinear programming which is originally derived from the multi-stage stochastic nonlinear programming. With the Lagrangian dual method and the Moreau-Yosida regularization, the primal problem is transformed into a smooth convex problem. By introducing self-concordant barriers, an approximate Newton method is designed. The algorithm is shown to be of superlinear convergence. At last, preliminary numerical results are provided.
1
Introduction
In this paper, we consider the following large scale nonlinear programming: min{/(a;) | Ax = a, U{x) <0,ie
I = {1, 2, • • -,0},x
G Rn}.
(1)
where f,fi,i&J, are smooth, convex on Rn and A G fj™*n with rank(yl) = m < n. It is known that a lot of practical problems can be formulated into (1). In particular, the multi-stage stochastic convex nonlinear programming (MSSCNP), which has been intensively studied in the past few years ([5,10,11]), can be written in the abbreviated form as (1) (see, [11]). Some linearly convergent algorithms have been developed for solving (MSSCNP). However, at present there are not faster algorithms for (MSSCNP) in the literature. Thus, it is a very interesting and meaningful work to investigate some rapidly convergent algorithm for problem (1). Let T = {x e Rn | fi(x) <0,i e I}. It is known that in (MSSCNP), / and T are separable into scenarios, while the nonanticipativity constraint Ax = a is not separable. Thus at first we seek to relax the constraint Ax = a by using the Lagrangian dual of problem (1) as follows: mm{C(u)\ueRm},
(2)
where C(u) = max{-f{x)
+ uT{Ax
- a) | x e F}.
(3)
A substantial obstacle in solving problem (2) is that £ is nondifferentiable. To overcome this, we convert (2) into another convex problem by using the so-called Moreau-Yosida regularization of £: mm{ri(u)\u€Rm},
(4)
•THIS RESEARCH IS SUPPORTED BY THE GRANT FROM IHPC-CIM RESEARCH PROJECT: R-146-000-036-592.
675
676
where V(u)=
mm{av)+-\\v-u\\2M},
vERTn
(5)
Z
where M is an n x n symmetric positive definite matrix and | | W | | M := wTMw for any w e Rm. In this paper, we take M = jl, A > 0, for simplicity in discussion. Problems (2) and (4) are equivalent in the sense that the solution sets of this two problems coincide with each other. It is well known that r\ is smooth with VTJ(U) := g{u) = (1/A)(u — p(u)), where p(u) is the unique solution of (5). In general, there are a number of iterative methods that can be used as a procedure to obtain the approximate solution of subproblem (5), such as [1,3]. However, these methods tend to spend more inner iterations as the outer iteration proceeds. Hence, how to improve the efficiency of this subproblem is a key question for the whole problem. In this paper, we will investigate a method to solve subproblem (5) effectively. Roughly speaking, we add a self-concordant barrier function b to the objective function of problem (3). Thus, we obtain a new function £(//, u) = max{— f(x) — pb(x) + vT(Ax and its corresponding function r,(p,u)=
— a) | x £ i n t F } ,
p > 0,
(6)
rj(p,u):
mm {C(p,v) + l/(2X)\\v-u\\2},
A > 0.
(7)
Based on some nice properties estalished in this paper, we can use the higher order derivatives to solve the smooth subproblem (7). 2
Self-concordant P r o p e r t y
Since 77 is convex, continuously differentiable with the Lipschitz continuous gradient, hence we should develop some suitable gradient-based algorithm for solving (4). However, as we stated above, 77, V77 are obtained through the optimal solution of problem (5), which is difficult to solve. In order to overcome the nondifferentiablity of C> we add a self-concordant barrier b to this function and obtain problems (6) and (7). The definitions and properties about self-concordant functions are referred to [8]. Throughout, we make the following assumptions: ( A l ) A has full row rank, ( A 2 ) T is a compact convex set and int T ^ 0, ( A 3 ) b : T —> R is a 9—self-concordant barrier, ( A 4 ) / : T —> R is (3—compatible with b. Under the above assumptions, it is evident that problem (7) has a unique solution denoted by x(n,v), y, > 0 and v E Rm. Let q((i,x) := f(x) + fd>(x), e(fj,,v,x) := -q(fj,,x) + vT(Ax - a) and p(n,v,u) := C(M.U) + (V2-MIIW _ u ll 2 Then, we derive the following important result: P r o p o s i t i o n 1. For any fixed u e Rm, p(p,v,u) is a strong self-concordant family with parameters a(p) — p/(l + /3) 2 , u>(p) — (1 + 0)91/2/p, i/(p) —
[(i + 0)e^2 + i/2}/p.
677 It is well known that an important class of functions which can be minimized by path-following algorithms in polynomial time are self-concordant families. Therefore, p(p, v, u) plays a great role as we construct the algorithm in the next section. 3
Algorithm and Convergence Theorems
In Section 2, we have shown that for any p > 0,u G Rm, problems (5), (6) and (7) have unique solutions p(u), x{p,u) and v(p,v), respectively. Then, we have P r o p o s i t i o n 2. For any fixed u G i ? m , v(p,u) converges to the optimal solution of (5) as p —> 0. Let e > 0 , u G Rm, if there exits a vector p£(u) G Rm such that C(Pe(u)) + 2t\\Pe(u) — w|| 2 < v(u) + e> then we call p£(u) an e-approximate solution of rj(u). P r o p o s i t i o n 3 . Let u G Rm, for any e > 0, there exits p > 0 such that for each p G [0,p], v{p,u) is an e-approximate solution of r](u). Now, suppose pe(u) is an e-approximate solution of ri(u), we define the approximations of r)(u) and Vr)(u) by r]e(u) = £(p £ (w)) + l/(2A)||p e (M) — u\\2 and gE(u) = (l/\)(u—pe(u)) respectively. Then, we can compute rjE(u) and gs{u) to be arbitrarily close to r](u) and g(u) respectively as long as the parameter e is chosen small enough. Furthermore, with help of Proposition 3, we only need to compute v(fi, u) which can be chosen as an e-approximate solution of r)(u) for some small positive Jx. Next, we investigate the algorithm for problem (4). The Newton direction used for minimizing p(/j,,., u) is as what follows. Av = -(V2p(fj,,
v, u))-lVp(n,
v, u),
(8)
where V2p(p,v,u) = y 2 C(/i,^) + (1/A)/, Vp(p,v,u) = VC(t*,v) + (1/A)(i> - «)• Denote 5(p,,v, u) := y / 'p,~ l Av T V 2 p(n, v, u)Av. Regarding the outer problem (4), since it is impossible to solve an exact generalized Hessian V G <9B(?(U), we hope to compute it approximately. It is evident that if £ is twice continuously differentiable at p(u), then dsg{u) consists of the single element, namely V2??(w) = (1/A)7 — ( 1 / A ) [ / + AV 2 C(p(w))] _1 - So, we can develop an approximate Newton method for solving problem (4). Now we state the following heuristic algorithm. Algorithm Step 1. Choose e 0 > 0, e 0 := Meo) > °> 7 e (0.1). £o > 0,/x0 > 0,v°,u°,X > 0. Let k = 0. Step 2. Let v = vk, u — uk. Step 2.1. Maximize e(pk,v,.) and obtain x(nk,v). Step 2.2 Construct the Newton direction At; by using (8). Step 2.3 Choose a step size a > 0. Set v+ = v + aAv. Step 2.4 If 5(pk,v+) < efc. Set w(fc+1) = v+ and go to Step 3. Otherwise, set v = v+ and go to Step 2.1. Step 3. If pk £ £fc, set p^ = PQ, go to Step 4. Otherwise, set Pk+i = IPk- Set k = k + 1 and go to Step 2. Step 4. Let p£k(uk) = t/ fc+1 >. Compute geic(uk) = (1/A)(u fc - p £ t ( u f c ) ) , pick a positive definite symmetric matrix 14 (details are given below) and compute the search direction dr = —V^g£k(uk).
678 Step 5. Choose a step size Tfc > 0(0 < Tfc < 1), set uk+1 = uk + Tkdk. Choose a scalar 0 < Ek+i < £k- Let k = k + 1, go to Step 2. There are some ways to choose Vk, such as if V2C(pefc(wfc)) exists, choose Vk = (l/A + 7fc)J- (1/A)[7 + AV 2 C(p £ f c (u f c ))]-\ and Vk = Vk-i otherwise. Here V^ = I and 7fe is a small constant to ensure Vk is positive definite. Then, we get the following covergence theorems of the above algorithm. T h e o r e m 1. Suppose that there exists a constant n > 0 such that {Vkh,h) > rc||/i||2, for all h e Rm and all k. Suppose {Tfc} tends to 1 as k —> oo. Then, any accumulation point of {uk} is an optimal solution of problem (4) as ek —• 0. Corrollary 1. Suppose the assumptions of Theorem 1 are satisfied. Let u* be an optimal solution of (4). Then {vk} converges to u* as uk —* u* and ek —• 0. T h e o r e m 2. Let u* be an optimal solution of problem (1.4). Let X(/J,,V) and v(fx, u) be unique solutions of problems (6) and (7), respectively. Then x(fi, u) converges to an optimal solution x* of problem (1) as u —> u* and /J, —> 0. T h e o r e m 3. Suppose that the conditions of Theorem 2 are satisfied and u* is an optimal solution of (4). Assume that g is BD-regular and semismooth at u*. Suppose that Sk ~ o(\\g(uk~1)\\2); for all large A;, rk = 1; l i m ^ o o dist(Vk,dBg(uk)) = 0. Then {uk} converges to u* at least 2-step superlinearly. At last, we test Manpower planning problem and Production planning problem, which were described in [7]. Numerical results show that the algorithm proposed in this paper, which combines the barrier Lagrangian dual and Moreau-Yosida regularization, can solve problems in reasonable time. References 1. A. Auslender, Numerical methods for nondifferentiable convex optimization, Math. Programming Study, 30(1987), 102-126. 2. F.H. Clarke, Optimization and Nonsmooth Analysis. New York: Wiley, 1983. 3. R. Correa and C. Lemarechal, Convergence of some algorithms for convex minimization, Math. Programming. 62(1993),261-275. 4. M. Fukushima and L. Qi, A Global and Superlinear Convergent Algorithm for Nonsmooth Convex minimization, SIAM J. Optimization, 6(1996),1106-1120. 5. J.L. Higle and S. Suvrajeet, Stochastic decomposition, Kluwer Academic Publishers, 1996. 6. J.B. Hiriart-Urruty and C. Lemarechal, Convex Analysis and Minimization Algorithms. BerlimSpringer Verlag,19. 7. A.J. King, Stochastic programming problems: examples from the literature, Numerical Techniques for Stochastic Optimization, Y. Ermoliev and R. J-B Wets eds. Springer-Verlag, 1998, 543-567. 8. Y. Neesterov and Nemirovskii, Interior-point Polynomial Algorithms in Convex Programming SIAM, Philadephia, P E , 1994. 9. R.T. Rockafellar, Convex Analysis. Princeton, New Jersey, 1970. 10. G. Zhao, Lagrange dual method with self-concordant barriers for multi-stage stochastic nonlinear programming, Report, 1999. 11. G. Zhao, A log-barrier method with Benders decomposition for solving twostage stochastic programs, Mathematical Programming, 90(2001), 507-536.
COMPUTATION OF NETWORK DELAY WITH PRIORITISED TRAFFIC INVOLVING THE MULTI-PRIORITY DUAL QUEUE ANTHONY BEDFORD AND PANLOP ZEEPHONGESKUL Department
of Mathematics
and Statistics, RMIT University, Plenty Road, Bundoora Victoria, 3083, Australia E-mail: anthony.bedford® ems. rmit.edu.au
East,
We continue our work on the unique differentiated service network, involving the multi-priority dual queue (MPDQ), by investigating exclusively the 'high' Quality of Service (QoS) criteria - delay (waiting time). The MPDQ is a new scheduling regime shown to reduce congestion for multi-class traffic over conventional scheduling disciplines. To gain insight into the advantages of a mixed MPDQ / First In First Out network with prioritised traffic, simulations are performed on networks with and without a dual queue. The simulation analysis is discussed, including the enormity of files created for cumulative density construction, and criteria for ceasing the runs. We construct a new statistic, the Adjusted Average Network Delay (AAND), which removes the differences in class service rates to establish a relative measure of class transient times through the network. We also look at high-class traffic under different offered loads, and provide a comparison of the delay characteristics. These findings provide communication service providers valuable information in determining the improvement in QoS to differentiated networks. They also highlight the importance of simulation as a tool of evaluation of networks, and the best quantity of MPDQ's to include in a network scenario.
1
Introduction
Simulation analysis using Arena [1] is used to continue our work on delay in networks with and without a MPDQ. The simulations undertaken contained huge data files totalling 40Mb. The simulation time was set at 50000 units to allow for the loss levels (see [2]) to reach steady state. This was determined through pilot simulations, and can be seen in other work in this area [3]. Here we concentrate on class wise delay using a cumulative density function (CDF) for the three broad networks described in our earlier paper, and also use the networks described in that work [2]. 2
Route Delay
As a measure of network performance, the average waiting time for this type of traffic (differentiated with independent service rates) is not a perfect reflection of source and destination delay. We arbitrarily assigned a longer service rate for the higher class of customer in order to emulate an increased instantaneous network demand from this class. This may also mean the delay time in the network is longer. Therefore Table 1 lists an adjusted average network delay (AAND) from source to destination. This is given by AANDC>Bir = " l (w''r -{Nn
679
680
dual queue is one of the links. This also does not adversely effect the F-F link, with times almost identical to the F-F-F network. Class 3 is disadvantaged by the dual queue, with the largest delay time of all traffic in these 3-node networks. All FIFO networks show no discrimination in their delay times, with the AAND near identical for the classes. Table 1. AAND by Class and Network. ]Route
Networks 3-node: F-F-F, DQ-F-F 4-node: F-F-F-F, DQ-F-F-F 5-node: F-F-F-F-F, DQ-F-F-F-F
Class 1 2 3 1 2 3 1 2 3
F-F
DQ-F
F-F-F
DQ-F-F
54.27, 55.00 54.47, 54.70 56.75, 57.10 49.08, 48.36 49.02,48.14 50.83, 50.72 38.89, 39.42 39.15, 39.97 41.20,41.56
- , 33.50 - , 35.20 -,71.50 - , 30.63 -,31.74 -, 63.41 - , 25.78 - , 26.25 -,51.22
72.68, 63.49 73.58, 64.77 76.37, 81.90 58.76, 59.27 59.20, 59.54 61.89, 63.01
-,54.13 - , 56.23 - , 89.86 -, 44.25 - , 46.47 -,71.64
For the 4-node networks, the dual queue improves service indirectly to the FIFO nodes. The DQ-F-F-F shows slightly improved delay times for the F-F route for all classes and greatly improved delay times for the F-F-F route over the FIFO network. We begin to see a flow-over effect of resorted traffic by the dual queue to the FIFO nodes. Notably, Class 3 is again disadvantaged in the dual queue network. As the networks increase in size, the delay to this traffic is closing in on the FIFO network delay times. In the 5-node networks, the F-F route is now virtually identical for the two variants, with the dual queue network now a little larger in delay than the FIFO network along the F-F route. This is also the case in the F-F-F route. Also the DQ-F and DQ-F-F routes in the dual queue network continue to deliver lesser delay times than the FIFO. The margin is smaller, indicating that the effectiveness of the dual queue in re-sorting inter-nodal arrival traffic is reduced. With more non-sorted departures within the network, the influence of the dual queue, whilst still effective, may not deliver the requirements needed to justify higher priority traffic or the cost of implementing a dual queue. 3
CDF of delay
The behaviour of the traffic along all routes is now modelled for each class. The delays are no longer adjusted for their service times as in the previous section as we wish to investigate what is happening to the traffic in its entirety. We use the CDF of delay time to model each network, and compare them on a route basis for each class. In this way, as before, administrators can decide what prioritised customers should expect in their service requirements. We first look at Figure la, which contains the 3-node network CDF's. The optimal curve is one that reaches a probability of 1 as rapidly as possible. All delay CDF's have the familiar 'S' shape common in delay models. What is noticeable in Figure la is the desirable characteristics in the first three curves. Class 1 and 2 traffic along the DQ-F route achieves excellent delay times. The sorting influence of the dual queue is also evident, with Class 1
681
Figure 1. (a) 3-node network (b) 4-node network (c) 5-node network (d) Overall network delay
traffic along the F-F route in the dual queue network also achieving excellent delay. This influence is a way of sorting traffic, as it exits the dual queue in class order, and arrives at the next node in class order. The intermixing of this traffic with traffic from other nodes will be in a semi-ordered form. This result is excellent for this class. Class 1 traffic can be guaranteed low delay times irrespective of the route in the 3-node dual queue network. Furthermore, Class 2 and 3 traffic in the same network receives virtually identical delay functions as the FIFO counterparts. The only poor result in the dual queue network is for Class 3 traffic. The FIFO network routes all receive near identical delay functions. We next consider the 4-node networks. From Figure 1(b), we again see the Class 1 and 2 traffic in the dual queue network receiving excellent service. Class 2 and 3 customers along the F-F route in the dual queue network also do well, with them having near identical delay functions to that of the FIFO network. Unlike in Figure 1(a), Class 1 customers are now delayed substantially more in the dual queue network along the F-F route. The influence of the dual queue may have waned for Class 1 customers, as the dual queue is not necessarily adjacent to both of the F-F routes. In the 3 node routes in the 4-node network, as seen in Figure 1(b), the delay functions are closer. Notably, the magnitude of the x-axis has increased as we are now analysing traffic through 3 nodes. The results are excellent for the dual queue network, with Class 1 and 2 traffic in this network on both types of routes experiencing the shortest delay times. Class 3 again has the longest delay in the dual queue network along both routes. In the FIFO network, Class 2 and 3 are again superior to Class 1 traffic. This is due to the longer service time of Class 1 traffic. Finally we look at the 5-node networks in Figure 1(c). We again see the Class 1 and 2 traffic in the dual queue network receiving the best service. The results are virtually identical to those found in the 4-node network. The increase in capacity and nodes has little effect on the 2-node routes. However there is a distinct advantage in the dual queue network. All six class and route combinations in the dual queue are superior in delay times
682
to the three FIFO combinations. Class 1 and 2 dual queue traffic experience the best delay times, whilst the Class 1 F-F-F traffic is delayed the longest. Through the mid-section of the CDF, the gap is increasing between the priority and non-priority networks. Overall we can see the value to Class 1 and 2 customers in the dual queue networks. For service providers, the choice to allocate a priority network is dependent upon their willingness to sacrifice service to lower class traffic in order to provide better QoS to higher-class traffic. We conclude our delay analysis by looking at the broad delay CDF's for the networks. This comprises all classes of traffic from all routes in each network and gives a broad overview to the network behaviour. Figure 1(d) shows the overall delay. All functions have the same shape, with the exception of the F-F-F network. It guarantees the worst probability for short delays but the best in long delays. There is such a slight difference between the DQ-F-F-F and DQ-F-F networks that in overall terms the set up costs may not be worth the marginal network improvement. 3.1
Delay for Class 1 traffic in the 5-node network
In our final analysis of the networks, we analysed the behaviour of first class traffic in the 5-node network for various load levels. We varied the load by adjusting the interarrival rate for Class 1 customers, which in turn changed the network load. This was undertaken for the DQ-F-F-F-F network. Our finding was that, as the load decreases in the network, the delay time decreases for Class 1 traffic. Furthermore, as the arrival rate increases (and as the load decreases), the gap between CDF's reduces. 4
Conclusion
The implications of a MPDQ in various network scenarios have been explored and delay functions given, providing a framework for service providers interested in setting boundaries and a starting point for further mixed network analysis. Put simply, with FIFO networks, the more nodes, the lower the delay time. The results according to class show that differentiated services benefit in a network with the MPDQ present, with the 4-node network performing the best with the MPDQ. References 1. Kelton W. D., Sadowski R. P., Sadowski D. A., Simulation with Arena, 2nd ed., (McGraw-Hill, 2002) 2. Bedford A. and Zeephongesekul P., Simulation solutions of networks with prioritised traffic involving the multi-priority dual queue, Proc. IC-SEC (2002). 3. Bedford A. and Zeephongsekul P., Analysis of the Multi-Priority Dual Queue (MPDQ) witii Preemptive and Non-Preemptive Scheduling : A Simulation Analysis, Submitted for publication.
SIMULATION SOLUTIONS OF NETWORKS WITH PRIORITISED TRAFFIC INVOLVING THE MULTI-PRIORITY DUAL QUEUE ANTHONY BEDFORD AND PANLOP ZEEPHONGESKUL Department of Mathematics
and Statistics, RMIT University, Plenty Road, Bundoora Victoria, 3083, Australia E-mail: anthony.bedford® ems. rmit.edu.au
East,
In prior work we have shown that the multi-priority dual queue (MPDQ) outperforms conventional scheduling disciplines such as First In First Out in isolation. This work takes the MPDQ into a network situation, and compares the network loss with and without its presence. As the MPDQ improves traffic congestion, most notably in communication networks, it is not necessary to include it at every node, or service centre, within a network. Our aim here is to provide a framework for future researchers, network designers, and service providers to implement this analysis, which involves the investigation of simple three, four and five node multi-class networks. This is aimed as a guide for extension to larger networks. Queueing networks containing differentiated traffic, also known as multi-class networks, are complicated to solve analytically using existing queueing theory techniques. We discuss the complications of exact probability network solutions and describe how we used involved simulation models to obtain performance statistics only possible through high performance computing. The simulation models here are presented with a description of the workings of the MPDQ. To investigate if there is any improvement in loss levels within the network we compare, for each network, the inclusion of one or more MPDQ's. Each node contains a service centre, following specific service times for each traffic type. Traffic arrives and follows a shortest path algorithm to its predetermined destination.
1
Introduction
Priority schemes applied to the multi-priority dual queue (MPDQ) with non-preemptive scheduling provide superior performance over a single queue for a variety of scheduling disciplines such as First In First Out (FIFO) and Last In First Out (LIFO) [1,2]. The simple non-prioritised dual queue (DQ) was shown to provide superior delay and loss to customers over FIFO, Round Robin (in Wireless Local Area Networks (WLAN)) [4] and Deficit Round Robin schemes [3]. Some applications of the MPDQ include IP networks, LAN (Local Area Networks), WLAN and mobile communications [5]. With the DQ and MPDQ's versatility for application in communications established, our aim here is to analyse loss for differentiated classes of traffic in networks with and without a MPDQ via simulation. We investigate the proportion of traffic lost in three types of networks for the classes by route. This is undertaken firstly by defining some basic network structures of nodes (servers) with finite buffer space, in which traffic may arrive externally at any node and then venture through the network to a predetermined departure node. For each network simulation we fixed inter-arrival rates and service times so that comparisons across networks could be made. Traffic could take any feasible route through a network. In these trials, we used three classes of traffic. No discrimination is made between the three classes of traffic in terms of routing within the network. In our previous work [1,2] and in [5], this was seen as a practical number of classes in a differentiated network. All inter-arrival rates (time between arrivals) and service rates follow an exponential distribution. These rates are identical irrespective of the node type. In our performance analysis, we use a network load of 0.525.
683
684
2
The Multi-Priority Dual Queue
As described in [2], the steady-state solution to the MPDQ remains difficult to evaluate due to the complex nature of the solution process. For a dual queueing system with two classes of traffic and waiting space c\ for the primary queue and c2 for the secondary queue, the dimensions of the irreducible generator matrix A of the system are given by
Ac
c q ,c2
fi(c,+l)(c|+3c 2 + C l +2)Vi( e i +l)(c 2 2 +3c 2 + C l +2) 2
e 9^ 2
^
This matrix forms part of the linear system generating all transitional states of the queueing model that is given by nTA = 0 where n is the vector of the steady-state distribution of the continuous-time Markov chain containing the unique normalised nonnegative solution once solved. The dual queue requires exhaustive demands on computational resources due to the rapidly expanding size of A as c\ and c2 increase. It soon becomes apparent that it is impractical to solve systems with a total queueing capacity beyond five. For the dual queueing model with c\ = 4 and c2- 6, the size of A5 is 150 x 150. Adding the MPDQ to a network complicates this and precise analysis is beyond steady state computation, hence simulations are used. The MPDQ and FIFO are illustrated in Figure 1. Nq is the number in queue and c, is the capacity of queue i. If the arriving traffic meets a full primary queue, then it waits in the secondary queue. If the secondary queue is also full then the arriving traffic is lost. If a space becomes vacant in the primary queue, traffic at the head of the secondary queue moves to the tail of the primary. We employ a Highest-Class First (HCF) regime within the DQ models. This means that Class 1 (high-class) traffic jumps to the head of the line over any lower class traffic within the same queue. Service is nonpreemptive hence there is no interruption to traffic being processed.
if Nq
Queue
Figure 1, Multi Priority Dual Queue (MPDQ) and Single FIFO Queue (F).
Networks There are three network structures shown here, each used with and without the inclusion of a MPDQ. Each one, as seen in Figure 2, can be used as a blueprint for constructing larger networks, or treated as a sub-network/LAN. Using the results for these three basic structures, the analysis provides a clue as to the ideal quantity of dual queues to place in a network based on the service manager's QoS objectives. All networks where the MPDQ is included have the dual queue located at node 1. Each node in a network
685
contains only two adjacent nodes. Furthermore, traffic may depart the network or arrive at any node (with rate y), or be moved on to its destination node. The traffic waits at each node based on the queue type. As can be seen in Figure 2, some paths are identical, and we define the transit times between nodes to be the same. After preliminary analysis, we could simplify our results into sets of routes rather than individual routes. This is because our analysis showed that even though external arrivals could occur at any node, routes with the same distance in the same network could be considered equivalent for the same class of traffic. For example, a Class 1 customer travelling from node 1 to 3 was found statistically equivalent to a Class 1 customer travelling from node 3 to 1. This is also known as Burke's Law. Exhaustive analysis confirmed this trend to hold for all like route combinations with the same service queueing regime. So we end up with, at most, four types of paths in a network. For the three networks analysed, each one is evaluated with and without the MPDQ at node 1.
Figure 2. Three, four and five node (sub) networks analysed
Throughout this paper, we use DQ for the multi-priority dual queue and F for a FIFO node. The six networks we will analyse are as follows: F-F-F, F-F-F-F and F-F-F-F-F are networks with three, four and five nodes respectively that all follow a FIFO regime. The other three networks are DQ-F-F, DQ-F-F-F and DQ-F-F-F that have the dual queue at node 1 and all other nodes are FIFO. In the networks analysed all feasible routes are combined into the following sets DQ-F, F-F, F-F-F and DQ-F-F-F. We have employed a shortest path algorithm, so traffic cannot take a longer route even in circumstances of congestion. We define the primary and secondary queue in the dual queue to be of length 5 each. All other FIFO nodes have single queue length of 10. For Class 1, Class 2 and Class 3 traffic, the mean inter-arrival time, (y c ,,, where c = class and i = entry node), and service rate (jic) is 10 seconds and 4 seconds; 5 seconds and 2 seconds, and 2.5 seconds and 1 second respectively. 4
Preliminary Performance Evaluation
All simulations were undertaken using the Arena Simulator. Each simulation was run for 50000 units (seconds). From Table 1, the 3-node FIFO network (F-F-F) shows little difference in loss levels for each node, with the exception of Class 1 for node 1. The introduction of the MPDQ (DQ-F-F) shows that loss levels at the dual queue node are only slightly lower for both Class 1 and 2 traffic. Overall there is a marginal improvement on a class-wise basis for traffic in the DQ-F-F over the F-F-F. The 4-node networks are quite similar to the 3-node. Overall there is a substantial reduction in the loss of approximately 10% from the 3-node networks. The 4-node network has an increased capacity of 25% for the same amount of traffic hence we expect the loss levels to drop. The DQ-F-F-F is again marginally better than its FIFO counterpart for Class 1 traffic. For Class 2 and 3 the F-F-F-F network has lower levels of loss than the dual queue network. There is an overall slight decrease in loss from the 4 to 5-node networks. The increase in
capacity is now 20%. Class 1 again receives the lowest loss statistics in the dual queue network. Table 1. % Loss at node by Class and Network.
Networks F-F-F, DQ-F-F F-F-F-F, DQ-F-F-F F-F-F-F-F, DQ-F-F-F-F 5
Class 1 2 3 1 2 3 1 2 3
1 13.3,12.9 14.9,13.7 14.4,14.6 4.7,4.8 4.7,5.4 5.1,5.3 4.2,3.7 3.9,4.3 4.0,3.7
2 14.6,14.5 14.1,15.0 14.1,14.1 4.5,4.2 4.1,4.6 4.2,5.1 4.2,3.8 4.3,4.4 4.5,4.4
Node 3 14.1,14.2 14.8,14.7 14.4,14.6 5.2,4.8 5.1,5.2 4.9,5.1 4.2,3.5 4.2,4.3 4.3,4.0
4
5
5.1,4.8 5.7,5.1 5.3,5.1 4.0,3.5 4.2,4.0 4.1,4.2
3.9,4.2 4.1,4.4 4.0,4.5
Conclusion
In this paper we have displayed the networks used to evaluate class loss by network. Preliminary analysis suggests Class 1 traffic under a MPDQ suffers lower loss that the other classes. As the MPDQ is situated at Node 1, it can be seen that loss is increased for traffic 2 nodes from it. Further analysis is undertaken in our next paper in these proceedings. References 1. Bedford A. and Zeephongesekul A, Simulation studies of waiting time approximation for the multi priority dual queue (MPDQ) with finite waiting room and nonpreemptive scheduling. In Topics in Applied and Theoretical Mathematics and Computer Science, ed. by V. V. Kleuv and N. E. Mastorakis. (WSEAS Press, Greece, 2001) pp. 220-225. 2. Bedford A. and Zeephongesekul A, Simulation studies on the performance characteristics of multi priority dual queue (MPDQ) with finite waiting room and non-preemptive scheduling. In Topics in Applied and Theoretical Mathematics and Computer Science, ed. by V. V. Kleuv and N. E. Mastorakis. (WSEAS Press, Greece, 2001) pp. 226-231. 3. Hayes D., Rumsewicz, M. and Andrew L., Quality of service driven packet scheduling disciplines for real-time applications: looking beyond fairness. Proc. IEEE Infocom'99 1 (1999) pp. 405-412. 4. Ranasinghe, R., Andrew L., Hayes D., and Everitt, D., Scheduling disciplines for multimedia WLANs: embedded round robin and wireless dual queue, Proc. IEEE Int. Conf. Commun. (2001) pp. 1243-1248. 5. Ogawa M., Sueoka T. and Hattori T., Priority Based Wireless Packet Communication with Admission and Throughput Control, Proc. of the 51st IEEE Conf. Vehicular Technology, (2000) pp. 370-374.
I N V E R S E OF A CERTAIN B A N D TOEPLITZ M A T R I X LIM K A H JIN Department
of Mathematics and Science, Singapore Polytechnic, 500 Dover Road, Singapore 139651, e-mail:[email protected], Tel:68790377
In his paper 'Inversion of certain symmetric band matrices', Lars Rehnqvist 1 gives an algorithm for the inverse of a band Toeplitz matrix A of order n x n arising from certain statistical problems. The elements of A are atj = k — \i — j \ , if \i — j \ < k and aij = 0, if \i — j \ > k for integer k < n. In this paper a key idea of Rehnqvist is exploited to find the exact inverse of a generalization of A.
{1
k-\i-j\-\
if \i- j\
where the non-zero elements in A are the modified Chebyshev polynomials defined recursively as So(x) = 1; S\(x) = x; Sj(x) = x • Sj_i(x) — Si-2(x), i > 2. The result confirmed Rehnqvist report that the inverse matrix is dependant on the value of k and n. The determinant of the matrix is also found and thereby proving a conjecture by E.L. Allgower2. A range of values for x that guarantees the non-singularity of the matrix is also determined.
1
Introduction
Ci{x) is the modified Chebyshev polynomials denned recursively as Co (a;) = 2; Ci(x) = x; Ci{x) = x • C;_i(a;) — C;_2(a;), I > 2. In the elements 5s Z of A~l, a means a _ 1 a n d i,j are its row and column positions and sz refers to the size of A~x. Also s in Us indicates its dimension. For brevity, Si(x) and Ci(x) are written as Si and C; respectively. Also defined are 5_i = 0, and 5_2 = —5oDefine the band symmetric tridiagonal matrix T whose first row is [x, - 1 , 0 , . . . , 0] and x is real. LetT • A = M. Then m ^ = 0 for 1 < i, j < n except for m*1'1) = mSn'n^ = Sk, fh^1^ = rh^n'n+l-^ = Sk-j-i for 2 < j < k + 1, m ( ^ ) = Sk - Sk_2 = Ck for 2 < i < n - 1 and rh{-i'i+k^ = — So for 1 < i < n — fc. For k = 4 and n = 2
I M,(2fc+l) -
\ M,(nfc+l+a)
for
(1.1)
1
E
m.nfc+l+a
U,(n-k+a)
<
•y.
So 0 -So 0 0 0 0\ Si 0 0 0 -So 0 0 0 0 c 40 d 0 0 0 -So 0 0 0 0 0 0 C4 0 0 0 -So 0 0 0 0 C4 0 0 0 -So -So 0 -So 0 0 0 C4 0 0 0 0 -So 0 0 0 C4 0 0 0 0 0 0 -So 0 0 0 C4 0 0 0 0 0 -S0 0 So Si S 4 / Si
<
k is partitioned such that 1^ ( n f c + l + a )
If M( n .fc +1+Q )
M
and
U(n.k+a)
are non-singular
then
( n .fc+i+a) c a n b e f o u n d f r o m U(n1k+a) b v t h e bordering method. U{^k+a) is expected to be sparse since it is reduced invariant under a large number of sub-l spaces. Unfc + l is first determined, then next [/,nfc+l+a is obtained from U.nfc+l for
1 < a < fc-1.
687
2
T h e inverse o f U(nk+i+a)
Unk+i c a n be found if its elements in the first k rows and columns are known. The elements w„fc+i is determined from w^n-Dfc+i recursively for n = 2, 3 , . . . and for 1 < hj
U3
'
2.1
Elements
~\F
V-CZ-VP"1
Uy)
U-1 +Uy1FP-1EU~1
J
in the first k rows and columns of U~k+l
For n > 1, u„fc+i = 0 for 1 < i, j < k except for C Snk/S(n+i)k " ^
i=j = 1
= < 5„fc-i/S(n+1)fc_1 t -Sj-3Sk~l/{S(n+l)k~lS(n+l)k)
The proof is by induction,
4
t = j = 2,3,...fe i = l , j = 2,3, ...fc
is zero, for 1 < i, j < k except for uk'
(2)
=
k j]
Sk-i/S2k-u 1 < i < k-1, uk ' = -(S j _ 2 S' f c _i)/(SfcS 2 f c _i) ) 1 < j < k - 1 and uCc'fc) = 1/5/.. Partition t/fc+i as in eqn(l) with /? = k + 1 and 7 = fc. Simplify eqn(l) a n d the base case of n = 1 is established: 4 + 1 = Sfc/S^fc, 2 < i < k, 4 V ? = S}-3Sk-i/(S2k-iS2k), 2 <j
=
S
(n+\)k/S(n+2)k,
^{n+l)k+l
=
5
( n + l ) f c - l / 5 ' ( n + 2 ) f c - l . 2 < l < k,
5
"(n+i)fc+i = - 'j-35'fc-i/(5' ( n + 2)A : -i5' ( n + 2)fc), 2 < j < k and elsewhere U(n+i)fc+i = 0, for 1 < i, j < k and eqn(2) is established.
2.2
Elements in the first k rows and columns of Unkl+1+a for 1 < a < k — 1
U~k+1+a is partitioned as in eqn(l) with /3 = nk + l+a ,7 = nk+1 and u„k+i+a = 0 for 1 < i,j < k except for o
i = 7= 1,2,... , a
S ( n + l ) f c S ' ( n + 2)fc-l 7 ('.i) x
i = a + 1, j = a + 1
S ( n + l)fc
nfc+l+a
—
Sj-a-sSk-1
Snk-1 i
•S(„
+ 1
z = a + l , j = 1,2, . . . , a
)t_
i = a + 1, j = a + 2, a + 3 , . . . , k i = j = a + 2, a + 3 , . . . , k
1
(3)
689 2.3
All elements of U~£+1+a, 0 < a < k - 1
Multiply each of the first k rows of U~£+1+a with Unk+\+a to obtain k difference equations. Solving these equations gives the elements in the first k rows of Unk+\+aSince Unk+i+aU~k+1+a = I, the remainding non-zero elements of U~£+1+a are obtained and 3
Kk+i+a
>ior0
—5fc+j-a_3S(T.+
l
1)fc-l-5(c+1)fc_1
Sfc-lS(„+i)fcS(n+2)/e-l ^(n-c)fS(r-(-l)|t-l
l,r
j = a + 1, r > c
^(»-r)l'S(t+l)4-l
a+ 2 < j < k, c ^ n
—5J-a-3-S'(c+l)fc-l"'5(r+l)fc-l
(4)
_(rfc+i,cfc+i) nk+l+a 0 < a < fc- 1, 0 < r , c < n l
r
'S'(r+l)-fc-l'>5(n + l - c ) f c - l Sk-l-S(n 2)k-l +
r
^(r+l)-fc-l1'S'(Ti-c)fc-l Sk-i-S(n+f)k-i
,
r = c 5 ( r + lS) f- cf c- -l l' S' ^( („* i2 +) f cl -- rl ) f c - l r — c • 5 ( rS+flc)_- flc- -Sl(' '„S 'i()r fi -cr_) f e - l +
1
S(c+l)k-l-S(n-r)k-l Sk-l-S(n +
1)k-l
+
r >c 3
r >c
•S/t_l'S(„ + 2 ) t _ l
Elements of M nfc x +1+a for 1 < a < A:
Partition Mnfc1+1+Q as in eqn(l) with (3 = nk + 1 + a and 7 = nfc + a then 4fcC+i+Ja for 0 < c < n j = l,c = 0 2<j < a a/ 1
j = a + l,c^0 a 5^ fc
a + 2<j
S(„+i)k-iS(„+i)t Sk-lSnk
+l +
aS(n+2)k-l~a
Sk+j-a~3S(c+1')k-lSk~a-2S(n Sk-lS„k + l + aS(n+2)k-l-c,S(„
Sfc-lS„fc
+
i+
i))b_1S(„+i)t
c«S(n+2)fc-l-c.
•Sj_3-a5fc_Q_2 5(c+1)fc_1 Sk-lSnk
+ l + aS(n+2)k-l~a
- (rfc+i,l) nk+\+a
0
+
Sfc-lS„lfc+l+aS(„+2)fc-l-aS(7i + 2)k-l •5(n-e)fcSfc_Q_2S(„ + 1 ) i . _ 1
m
i = a+1
l)k-l 2)k-l
S'fc_j-lS(„+1_e))c_iS(„
c =£ n i = l,l
+ +
S(„ + Sk-lSnk
+l+
l)kS(n-r)k-l aS(n+2)k-l~a
— '5fe-a-2'5(r+l)fc-l Sfc_iS'„J i + i + ( , S ( „ + 2 ) t - l _ c <
S ( r l _ c ) f c _ 1 Sfc- j _ 1 S ( n + l)fc &k- 1 Snk + l + a S(n
+
2)k - 1 - a
(5)
690 ,(i-l,j-l)
and m^k+i+a = unk+'a
^ o r *>•? —
2 anc
+ j
^\f
a
* *—
a
— ^
— S(n-r+l)k-lSk
a = k, r < c 5=1
S( n +l-c)fcS r fe_i
<5(„ + i_ r )fc5( Tl + i_c)fc
Sfc_iS( n + 1 ) f c
Sfc_i5(n+1)fc5(n+i)fc+i
Sfc_iS( n + i) f c
5fc_i5( Tl+1 j fe 5( Tl -(-i)fc+i
a = fc, r > c i = i 2 < Q < fc-1 2<j < a
— a-2'5( T l „ c ) f c
1+0,5(Tl+2)fe-l-a
'5(n,+l-T-)fc-l'S'fc-a-2'S'fc+i-a-3S'(c4-l)fc-l •5fc-l5nfc + i
^(n + l - r J f c - l ^ + lJfcSfc-j-lSfn+i-cJfc- 1
+ a5(„+2)fc-l-Q'S'(n.+2)fc-l
Sj-aSrk-iS(c+i)k-i
Sfc_i5 n fc
+
i+a5(rl+2)fc-l-aS(;n+2)fc-l
•S'j-3 5( n +i_ r )fcS( c +i)fc_i
5 , fc-iS'( n+1 ) fc 5( Tl+ 2)fc~i
a = fc 2
P t f ° r t n e following
f°rl
1
-5fc_l5'„fe +
exce
5fc_i5(n+i)fc5(n+1)fc+15(n+2)fc-l
5fc_j_iS( r i + i_ r )fc5( n .4_i_ c )fc_i Sfc_iS(7l+1)fc+1<S'(ri-t-2)fc-i
1 < a < fc-1 a +2<j
S'(T1_r+l)fc^l5fc_a-25j-3-a5(c+i)fc_i • ? f c - l 5 n f c + i + a 5 ( n + 2 ) f e - l - a ' S ' ( n + i)fc_i
S ( T l _ r + 1 j f c _ 1 5 ( r l + 1)fc5A:_J_15(n_c)fc_1 S'fc_i5'T1fc + i + c [ 5 ( T 1 , + 2 ) f c - i _ Q i S ' ( n + i)fc_i
S( Tt ,__ r+ i) fc _ 1 5fc_ a _2 5 j _ 3 - a S ( c + i ) f c _ i
5( T l „ T . + i) f c _ 1 5( T l + 1 )fe5fc_j_i5( n _ c )fc_i
5fc-l'5nfc+l+a'5( T l + 2)fc_-l_a'S'(Ti+l)fc-l
S k - l S T i f e + i + a S ( n + 2 ) f c _ l-a-S'(Ti + l ) f c - l
•S'fc-l'^fn+i)/,;-! <5(n-7-+l)fc- l'S'fc-a~2'S , j - 3 - a < 5 ( c + i ) f t _ i
-S(n-r+l)k-l'S'(n+l)fc'Sfc-j-l'S'(Ti-c)fc-l
•S'fc-l'5nfc+l+a'5'(n+2)fe-l~a'S'(n+l)fc-l
Sfc_i 1S„.fc-|_i + a S ( T l + 2 ) f c _ i _ a 5 ( n - ( - i ) f c _ i
1 < a < fc-1 j =fc+ l, r > c if c = n, j ^ k + 1
Scfc-lSfn-^fc^i 5fc_i5( n + i)fc_!
™;&£S ,C * +i forO
2 < a
Sk+j-a-3S(r+l)*-l5'(c+l)fc-l •S(n + l ) f c S ( „ + 2 ) ) ! - l S ' f c - l
Sk-a-2S(r+l)k-lSk-a-2Sk+j-a-3S(c+i)k-l S(n+1)fcSfc_l5„it
5fe_ct_2S(r+l)ib-lS'fc-j-lS(„ Sfc-lS„fc
+
+
+
i+e,S(n+2))!-l-a'S'(7i+2)fc-l
i_c)fc_1
i+aS(n+2)fc-l-cf5(n+2)li:-l
1 < a < fc j = 1 + a, r < c
S(„-c)*:S(r+1)fc_1 • S f c - l S ^ + jjj.
•5(T-+l)fc-l^fc-Q-2S(n-c)fc ' •Sfc_lS(„+1)t5„fc + i-|-c<S(„+2)fc_i_0
1 < a
S(n-r)kS(c+l)k-l
,
1 < a < fc-1 a+2<j
•5fc-l-5(71+1)*!
S'(r+l)fc-l'S) t _ e ,_2'S(„_ c )fc Sfc_iS(rl +
— Sj-a-3S(c+i-)k-lS(r+l)k-l Sfc_iS( n + 1 )fc_iS(„ + i) f c 1
1)fcS„fc+i+aS(n+2)it-l-£>
Sj-3-a
S ( c + 1 ) f c _ i S t _ a _ 2 S(r+l)lfc-l
S( T l + 1 ))t_ 1 S( n + 1 ) f c S)b_iS„fc + i + a S'(„ + 2)fc-i-£,
•Sfc-j-iS'(„„ c )i._ 1 5fe_ C I _2S( T . + i) f e _ 1 S , (n+l)fc-lSt_ 1 S„Ji-|-l-|- a S(„ + 2)t-l-o
(6)
691
Having Mnk1+1+a, post-multiplying by T gives A 1. T is tridiagonal, each row of A is linear combinations of 3 adjacents rows of M~k+1+a. 4
D e t e r m i n a n t o f Ank+i+a
for 1 < a < k
Proposition 1 A^t
A
\ -
J S(n+l)k~lS(n+2)k-lS(n+2)k-a-lSlzt
Since
Proof.Let 5 ( p ) = n L i ^ + i + a _(1,1)
_
°nfc+l+a -
-(1,1)
:E m
' nfc+l+Q
_m
_ (1,2)
_
c
a
ifc+l+/3
p=l
a=l
p=l
and „
1 / r c
P(n+l)fc-l*(n+2)k-aJ/P(„+2)k-lO(n+2)fc
n 4 M ) =n^n*(*>)!!« n —1
l<<X
d e t ( , 4 n f c + 1 + a ) = \l\[^+aa^'l)
f o
nfc + l+a ~
if
0=1
—
^
Qfc-1 a
—a r*a —1 Q (n+l)fc-l°(n+2)fc-l°(n+2)fc-a-l
Thus part 1 of the proposition and by similar arguments part 2 is obtained. When x = 2, Sn = n + 1 and Allgower's conjecture 2 is included below det(A„ f c + 1 + Q ) - | 5
( n+ 1)fc-ifc
Non-singular values o f
if Q = 0
(8)
Ank+i+a
T h e o r e m 1 Let N e M „ ( C ) and AT = I
j so i/iaf J4 is r x r . / / rank(N)
= rank(D)=s and det(£>) ^ 0 then N is singular iff A = B • D - 1 • C. Proof: If A = BD~XC then det(iV) = det(D) det(A - BD^C) = 0. If det(N) = 0, det(D) y^ 0 then for 1 < i < r, Ni = auNr+i + ct2iNr+2 + • • • + asiNr+s and (Nx N2 ••• Nr) = ( Nr+1 Nr+2 • • • Nr+S) • K, where k ^ = ai:j for 1 < i < s and 1 < j < r. Thus ( ^ J = ( ^ J AT and A =
BD~XC.
In deriving M"1, U must b e non-singular. The proof that U is non-singular for x — 2 is by induction. Us is non-singular for 1 < s < A;.Partition Uk+i as in theorem(l) with D = Uk, then aW - BD~lC = 2 - l/(fc + 1) ^ 0 and by theorem(l) f/fc+i is non-singular. Similary Uk+i+a is non-singular for 1 < a < k. Assume Unk+i+a is non-singular for 1 < a < fc. f/(„+i)fc+i+Q is non-singular for 1 < a < k as a*1'1) - B • D'1 • C = 2 - u £ + 1 ) j t + a / 0. Similarly T is non-singular for x = 2 and t/„fc+i+ a and T are non-singular for x > 2. References 1. Rehnqvist,L.:Inversion of certain symmetric band matrices BIT 1 2 , 90-98 (1972). 2. Allogower, E.L.-.Exact Inverses of Certain Band Matrices, Numerische Mathematik 2 1 , 279-284 (1973)
TIME-SPLITTING SINE-SPECTRAL A P P R O X I M A T I O N FOR T H E N O N L I N E A R S C H R O D I N G E R EQUATIONS WEIZHU BAO Department of Computational Science National University of Singapore, Singapore 117543 E-mail: [email protected] In this note we review the time-splitting sine-spectral (TSSP) method, recently studied by the author, for nonlinear Schrodinger equations (NLS) in the semiclassical regimes, where the Planck constant e is small. The time-splitting spectral method under study is unconditionally stable, time reversible and time transverse invariant. Moreover, it conserves the position density and performs spectral accuracy for spatial derivatives and fourth-order accuracy for time derivative. Numerical tests are presented for linear, for weak/strong focusing/defocusing nonlinearities and for the Gross-Pitaevskii equation. The tests are geared towards understanding admissible meshing strategies for obtaining 'correct' physical observables in the semi-classical regimes. Furthermore, applications to Id, 2d and 3d Gross-Pitaevskii equation for Bose-Einstein condensation are presented.
1
Introduction
Many problems in quantum or solid state physics require the solution of the nonlinear Schrodinger equation with a scaled Planck constant e (0 < e < 1):
^ = -y w + vwr + f{\r\2)r, ^ ( x , * = 0)=^g(x),
t>o, *eiid, (i)
xeRf
(2)
In this equation, V = V(x) is a given real-valued electrostatic potential, / a realvalued smooth function, and ij}e = ipE(x., t) the wave function. The wave function is an auxiliary quantity used to compute the primary physical quantities (or observables) such as the position density rf and the current density Js n e (x,t) = | ^ ( M ) | 2 ,
Je(x,i)=eIm(^(x7i)VV£(x,t)).'
(3)
The general form of (1) covers many nonlinear Schrodinger equations (NLS) arising in various different applications. For example, when / = 0, (1) reduces to the linear Schrodinger equation; when V = 0, f(p) = f)e p, it is the cubic nonlinear Schrodinger equation (called the focusing NLS if /3e < 0 and defocusing NLS if f}s > 0); when V(x) = f |x| 2 with u > 0 a constant, f(p) = 5p with 6 a constant, it is called the Gross-Pitaevskii equation (GPE) 14 which is used to describe BoseEinstein condensation (BEC) i' 10 - 8 ' 5 or nonlinear optics 15 . It is well known that the equation (1) propagates oscillations of wave length e, in space and time, when e is small. The oscillatory nature of solutions of the nonlinear Schrodinger equation with small e provides severe numerical burdens. Even for stable discretization schemes (or under mesh size restrictions which guarantee stability) the oscillations may very well pollute the solution in such a way that the quadratic macroscopic quantities and other physical observables come out completely wrong unless the
692
693
spatial-temporal oscillations are fully resolved numerically, i.e., using many grid points per wave length of 0(e). In 12 , Markowich et. al. study the finite difference approximation to the Schrodinger equation with small e. Their results show that, for the best combination of the time and space discretizations, one needs the following meshing strategy constraint in order to guarantee good approximations to all (smooth) observables for e small 12 : mesh size h = o(e) and time step k = o(e). Failure to satisfy these conditions leads to wrong numerical observables. Much more restrictive conditions are needed to obtain an accurate L 2 -approximation of the wave-function itself. In 6 ' 7 , Bao et. al. study time-splitting spectral approximations to the Schrodinger equation with small e. Extensive numerical experiments suggest the following meshing strategies for obtaining the correct observables: h = 0(e) and k- independent of e for linear Schrodinger equation; h = 0(e) and k = 0(e) for defocusing nonlinearities and weak 0(e) focusing nonlinearities 2 ' 3 ' 6 ' 7 . One can find more numerical approaches for the Schrodinger equation in 9>16'13 and references therein. The note is organized as follows. In section 2 we review the fourth-order timesplitting sine-spectral method. In section 3 we report numerical results for NLS. 2
Fourth-order time-splitting sine-spectral method
In this section we review the fourth-order time-splitting sine-spectral (TSSP) method 3 for the problem (1), (2) with homogeneous periodic boundary conditions. For the simplicity of notation we shall introduce the method for the case of one space dimension (d = 1). Generalizations to d > 1 are straightforward for tensor product grids and the results remain valid without modifications. For d = 1, the problem becomes iei>l = -£-rxx s
i) (x,t
+ V(xW
= 0)=%(x),
+ f(\r\2)^,
a<x
a<x
E
ip (a,t)=ip (b,t)
t>0, = 0,
(4) t > 0.
(5)
Clearly, the Schrodinger equation is time-reversible, so we could pose equations (4), (5) for t e 11. We choose the spatial mesh size h = Ax > 0 with h = (b — a)/M for M an even positive integer, the time step k = At > 0 and let the grid points and the time step be Xj-.= a + jh,
tn := n k,
j = 0,1, • • •, M,
Let ipEj'n be the approximation of ips(xj,tn) t = tn = nk with components ipfn-
n = 0,1,2,---.
and ips,n be the solution vector at time
From time t = tn to time t = t„+i, the Schrodinger equation (4) is solved in two steps. One solves ie^t
= -JTPXX,
(6)
694
for one time step, followed by solving ial4(x,t)
+ f(\il>s(x,t)\2)ilf{x,t),
= V(x)P(x,t)
(7)
again for one time step. Equation (6) will be discretized in space by the sinespectral method and integrated in time exactly. For t £ [£n>^n+i]> the ODE (7) leaves |V>| invariant in t 7 and therefore becomes iet£i(x,t)
+ f(\r(x,tn)\2)
= V(xW(x,t)
r(x,t)
(8)
and thus can be integrated exactly. From time t = tn to t = tn+i, we combine the splitting steps via the fourth-order split-step method and obtain a fourth-order time-splitting sine-spectral method (SP4) for the Schrodinger equation (4) . The detailed method is given by ^(D
= e-t2ii;i*(V(x,)+/(ltfJ"'|
2
))/e ^=,nj
M-l
$1}
^? = E e-^^'
smQuixj-a)),
M-l
^
e-™*krf
= E
$ 3 ) s i n f a f o - a)),
j = 1,2, • • •, M - 1,
(=i ^,(5)
4) 2
=
e-i2u,3k(V(Xj)+ft,\^
\ ))/e
^(4)^
M-l
^6) = ^ ^e,n+l
=
e-fe»»*M?
$») s i n ^ ^ - a ) ) ,
e-i2«,1fc(V(xj)+/(|^
6) 2
" ( | 6 )>) /| £2 -^ S( 6 ) .
^
where ti>i = 0.33780 17979 89914 40851, w2 = 0.67560 35959 79828 81702, w3 = -0.08780 17979 89914 40851, wA = -0.85120 71979 59657 63405 and Uh the sinetransform coefficients of a complex vector U = (UQ, UI, • • •, UM) with [70 = UM = 0. Notice that the only time discretization error of SP4 is the splitting error, which is now fourth order in fc for any fixed e > 0. For the stability of the time-splitting spectral approximations SP4, we have the following lemma, which shows that the total charge is conserved. L e m m a 2.1 The time-splitting spectral scheme (SP4) (9) is unconditionally stable. In fact, for every mesh size h > 0 and time step k > 0, W'n\\p 3
= W'°\\p
= \M\\P,
n=l,2,-~.
(10)
Numerical examples
Here we consider an example of Id Gross-Pitaevskii equation, i.e. in (1) we choose d = 1, s — 1 and V{x) = x2/2, f(p) = Sp. The initial condition is taken as 1
A0{x) = -Y^e-X
2/
I2,
S0(x) = 0 ,
x
ell.
695 Table 1. The error ||^ e (t) - i>*'h'k(t)\\p at t = 2.0 with h time step
fc-
-i-
-
20
K
fc- J-
fc-
-i-
— 40
~
80
K
K
K
— 160
fc- -*K
— 320
±. fcK
-i-
— 640
5 = 10.0
1.261E-4
8.834E-6
5.712E-7
3.602E-8
2.254E-9
1.422E-10
6 = 20.0
6.039E-4
4.293E-5
2.800E-6
1.771E-7
1.110E-8
6.929E-10
6 = 40.0
3.755E-3
2.250E-4
1.482E-5
9.424E-7
5.915E-8
3.696E-9
We solve on the interval [—16,16], i.e. a = —16 and b = 16 with the homogeneous periodic boundary condition (5). Table 1 shows the errors HV^W — V,e'ft'fc(')lli2 at t = 2.0 with a very fine mesh of mesh size ft = ^ for different 6 and k. For more numerical experiments on various Schrodinger equation see 2>3<7'5'4. References 1. M.H. Anderson, J.R. Ensher, M.R. Matthews, C.E. Wieman, and E.A. Cornell, Science 269, 198 (1995). 2. W. Bao, Fourth-order TSSP method for the nonlinear Schrodinger equation and application to Bose-Einstein condensation, preprint. 3. W. Bao, Time-splitting Chebyshev-spectral approximations for (non)linear Schrodinger equation under (non)zero far-field conditions, preprint. 4. W. Bao, D. Jaksch, Numerical methods for solving damped nonlinear Schrodinger equations with a focusing nonlinearity, preprint. 5. W. Bao, D. Jaksch and P.A. Markowich, Numerical solution of the GrossPitaevskii equation for Bose-Einstein condensation, preprint. 6. W. Bao, Shi Jin and P.A. Markowich, J. Comput. Phys. 175, 487 (2002). 7. W. Bao, S. Jin and P.A. Markowich, Numerical study of time-splitting spectral discretizations of nonlinear Schrodinger equations in the semi-clasical regimes, SIAM J. Sci. Comput., submitted. 8. W. Bao and W. Tang, Ground state solution of trapped interacting BoseEinstein condensate by minimizing a functional, preprint. 9. Q. Chang, E. Jia and W. Sun, J. Comput. Phys. 148, 397 (1999). 10. M. Edwards and K. Burnett, Phys. Rev. A 51, 1382 (1995). 11. P.A. Markowich, N.J. Mauser and F. Poupaud, A Wigner J. Math. Phys. 35, 1066 (1994). 12. P.A. Markowich, P. Pietra and C. Pohl, Numer. Math. 81, 595 (1999). 13. D. Pathria and J.L. Morris, J. Comput. Phys. 87, 108 (1990). 14. L.P. Pitaevskii, Zh. Eksp. Teor. Fiz. 40, 646, 1961. (Sov. Phys. J E T P 13, 451, 1961). 15. C. Sulem and P.L. Sulem, Springer, New York, 1999. 16. T.R. Taha and M.J. Ablowitz, J. Comput. Phys. 55, 203 (1984).
Calculating Global Minimizers of a Nonconvex Energy Potential D a v i d G a o 1 &: P i n g l i n 2
Abstract The Ginzburg-Landau Equation is central to material science, which has been subjected to a substantial study during the last twenty years. Since the total potential energy associated with this equation is a nonconvex (double-well) functional, traditional direct analysis and related numerical methods for solving this nonconvex variational problem are difficult. Based on the canonical dual transformation method proposed recently in [1], an algorithm is presented for solving the nonconvex variational problem. This method provides a parameter (one component of the dual vector) which can serve as an indicator for the global minimization.
1
Primal Problem and Canonical Forms
Let t h e region of space O C I 2 occupied by t h e material be a smooth, bounded simplyconnected domain with boundary dQ. T h e configuration u : fi —> R is a real-valued function (i.e. the so-called order-parameter, which is used to denote a field whose values describe the phase of the system under consideration, see [3]). Consider a nonconvex potential energy
^W = /nYlv«|2dn + ^ ^ ( A - ^ 2 ) 2 d n - ^ U f f d n ,
(i)
in which, the double-well function
is the "coarse-grain" free energy, whose wells define the phases. fco,A > 0 and /i are material constants, g(x) is a given internal source field. Ginzburg-Landau equation in superconductivity and Oseen-Prank liquid crystal model are examples of this kind of energies. Let Uk = {u£
£ 4 ( 0 ) | V u 6 £ 2 ( n , K 2 ) , u(x) = wo Vz e dQ}
be t h e admissible space. Physically it is interested in finding solution w of t h e following primal minimization problem OP) :
P{u) = inf P ( « )
Vw 6 Uk.
'Department of Mathematics, Virginia Tech, Blacksburg, VA, 24061, USA. E-mail: [email protected] Department of mathematics, NUS, Singapore 117543. E-mail: [email protected]
2
696
(2)
697 Due to the nonconvexity of P, the traditional analytic methods and associated algorithms are very difficult. The numerical results of all direct approaches depend on the initial iteration point choosed. In order to solve this nonconvex variational problem and to clarify the phase states (i.e. the minimizers of P(u)), we need to study the canonical dual variational problem. By the canonical dual transformation method introduced in e.g. [2], the generalized finite deformation strain vector £ can be defined by £ = A(u) = (grad u , \u2 - \f
= (e, £)T.
(3)
Thus, in terms of £ = (e, £) T , the stored-energy density is the quadratic function
where
The dual variable <; = £* of £ can be defined by « = (
•
Since A is a quadratic operator, £ can be considered as a Green-type strain vector in finite deformation theory, and its dual variable c is a Kirchhoff-type stress. Since the canonical stored energy V is a quadratic function of the canonical strain £, the canonical constitutive relation <; = C£ is reversible, i.e. £ = C~l<;. Thus, the complementary stored energy Vc can be simply obtained by the traditional Legendre transformation
K°(«) = {(?) • « - v«(«)) = \>?
=^
• * + |A2-
(4)
u = u0},
(5)
By introducing an admissible configuration space Ua defined by Ua=Ua = {u€C2(U,M)\
\/ue£.2{Q,^),
the so-called extended Lagrangian L : Ua x S —> R associated with the canonical primal problem is (see [1])
L(u,S) = =
f[A(u)-
(6)
It is easy to see that for each given u £ Ua, L : S —> R is a strictly concave. However, the convexity of L : Ua —> R depends on the sign of <j. <; > 0 indicates the convexity of L in terms of u.
698
2
Primal-Dual Algorithm and Numerical Examples
Motivated from triality theory given in [1, 2] we can derive an algorithm to find a global minimizer of the nonconvex energy functional. The algorithm is involved with both primal and dual variables. In order to make problem easier, we use <x = koVu to eliminate the stress field, the modified Lagrangian can be defined by La{u,s) = J[\k0\Vu\2
+ {\u2-\)<;-±n2
dQ.
(7)
Both primal and dual variables are involved in the Lagrangian. Algorithm 1 Let a computational domain Q, a distributed external source field g(x) and a boundary source UQ be given. (1) For a given stress field <;k(x) G <S„ such that sk > 0, find the configuration u (x) such that La(uk,<;k)= inf £,(«,?*). (8) ueua (2) Let }k = 9 + koA.uk(x). Solving the algebraic equation 2 , 2 ( A + A) = /fc2
(9)
for sk{x), i = 1,2,3. Choosing the positive root
u | a n = «o-
The elliptic probem can be solved by the standard finite element method with piecewise linear basis functions. Next we apply this algorithm to a example. Example 1 Consider g(x) = 0 and UQ = 2+(x—y) 2 . Take fi = 1, the initial guess ?° = 10 and the tolerance u> = 10 - 4 . The algorithm converges pretty fast. Figures 1-2 depict the triangular meshes we use for both circular and rectangular domains and the corresponding solutions of u(x,y) and s(x,y). Since s(x,y) > 0 we expect the solution u(x,y) is a global minimizer.
699
V
k
/
/
Figure 1: Triangular mesh, solution u(x,y) and s(x, y) on a circular domain
illlllgi
iv
~*y
V
y
Figure 2: Triangular mesh, solution u(x, y) and <;(x, y) on a rectangular domain Above example shows that the algorithm works pretty well in certain situation. However, we are still lack of theoretical results on what conditions we should propose to ensure the convergence of the algorithm. Our computational experience shows that the algorithm often diverges if fi is small. Sometimes we don't get positive ? in the whole domain. Nevertheless, the algorithm provides a parameter indicator which can justify that the solution we obtain is a global minimizer by the triality theory. This outstands the algorithm from other direct PDE solvers.
References [1] Gao, D.Y., Analytic solutions and triality theory for nonconvex and nonsmooth variational problems with applications, Nonlinear Analysis 42 (2000), 1161-1193. [2] Gao, D.Y., Duality Principles in Nonconvex Systems, Kluwer Academic Publishers, Netherlands 2000. [3] Gurtin, M.E., Thermomechanics of Evolving Phase Boundaries in the Plane, Oxford University Press, New York 1993.
A QR-type Method for Computing the SVD of a General Matrix Product/Quotient Delin Chu Department of Mathematics National University of Singapore 2 Science Drive 2, Singapore 117543. Email: [email protected] October 2 1 , 2002
Abstract In this paper, a QR-type reduction technique is developed for the computation of the SVD of a general matrix product/quotient A — A*1 A% • • • A"^ with A{ e R n x n and Sj = 1 or s, = —1. First the matrix A is reduced by at most m QR-factorizations to the form Qjj'(Qjj') - 1 , where Qn , Q21' e R.n*™ and (Q(111))TQii) + (Q&'rQw = 7 - T h e n t h e S V D o f A i s obtained by computing the CSD (CosineSine Decomposition) of Q\i and Q21 using the Matlab command gsvd. The performance of the proposed method is verified by some numerical examples.
1 Introduction This paper deals with a new method for the computation of the Singular Value Decomposition (SVD) of a sequence of matrices in product/quotient form. The simplest forms of these Generalized SVD's (GSVD), for two matrices, are the well-known Quotient SVD (QSVD) and Product SVD (PSVD). One of the three possible forms involving three matrices, is the so-called Restricted SVD (RSVD). The GSVD is one of the essential numerical linear algebraic tools in signal processing and identification. Possible applications include source separation, stochastic realization, generalized Gauss-Markov estimation problems, generalized total linear least squares, open and closed loop balancing, etc. Like the QSVD, PSVD and RSVD, the SVD of a general matrix product/quotient has many applications. For example, it is important for the estimation of Lyapunov exponents for dynamic systems. Consider finite difference equations &k+1 = * i » e t l 9o = / , * * € R n x n , sk = 1 or sk = - 1 .
700
(1)
701 The ith Lyapunov exponent is then defined by Xi =
\imk^too\og(ai(Qk))/k,
where cr,(6jt) is the ith biggest singular value of 0^. Discretizations of ordinary differential equations may also lead to sequences of matrix products/quotients. In this paper, we propose a QR-type reduction technique for computing the SVD of a general matrix product/quotient /i — yi1 / i 2
sim
with Ai £ R nx ™, Sj = 1 or Sj = —1. We will show that, if not all Sj are the same, the matrix A can be reduced b y m - 1 QR-factorizations to the form Q^iQ^)'1 with Q^, Q$> e R " x " , (Q$)TQ$ + T = a s are (Q21) *?2l I'i ^ " i equal, then we need TO QR-factorizations. The main advantage of this QR-type reduction is the way in which quotients are dealt with. Finally the SVD of A can be obtained by resorting, e.g., to Van Loan's CSD method .
2
A QR-Type M e t h o d
Consider a matrix A of the following form: • ASI A32 • m > 2, • -^i ^ 2 . Ai is nonsingular if st = — 1.
A 6 R " x n , Si = 1 or s{
(2)
Assume for simplicity that the matrices Ai in (2) are square. The method we develop in this paper is as follows: Algorithm 1 Input: Matrix A of the form (2). Q${Q21l)-1. Output: Matrices Q$,Q$ € R n x n such that (Q$)TQ$ + {QZIVQTL ••I andA = Init: If all Si = 1, set Am+i := I, sm+i := —1;TO:= m + 1, If all Si = - 1 , set Am+i := Am,Am := I,sm = l,sm+i := - l ; m := m + 1, If —«! = . . . = —Sj = Sj+j = ... = sm = 1, apply procedure to AT. Determine maximal j such that SJ V21
—
1 and
-1.
Sj+i
Set sm+i
:= - 1 ,
A
3+l-
Loop: for i = j , j - 1 , . . . , 1, do: A
• Case Si = 1 and s;_i = 1. Compute the QR factorization of
iQll V21
AiQn r>('+1) W21
Q2i
Q22
0
, *w,Qi?,Q$,Q£,< >£
Case st = 1 and Sj_i = —1. Compute the QR factorization of
^Q^1} V21
Q2\+1)
Qn
Q12
Q21
^2
0
-Sj+2,
Q
O+i) = J,
702
Case Si = - 1 and s;_i = 1. Compute the QR factorization
^ Q <6 '1 +1 1 )
of
n («+l)
V21 r
.T_I«-I-I1
n
^ST'
r
„r«l
V21
„ M
oS
1
r
„
0
1
«,(«) & «r>W & «nW §',«&
^
6R"X".
i!22
- 1 . Compute the QR factorization
Case s, = — 1 and s,.
of
n (i+l)
V21
AJQ^
3i?
V21
Q21
Bnd loop. 5etQW+i):=Q(i))Q««)
= < $ .Loop: /or
Qlr ' ^r*
Q21
TJ"
i = j + 2, j + 3 , . . . , m do:
Case Sj = 1 and s;+i = 1. Compute the QR factorization
1
r>W r>M «(') n(') c
R(>)
0
Q22
°1
of
AJQV]
0
Q22
Case Sj = 1 and Sj+i = — 1. Compute the QR factorization
of
sir1'
AJQ^]
AM-V
Q«
0
QS
(i) Q£ o
, it
, Q j j , Q12, Q211Q22 ^ " •
V22
Oase Sj
-1 and Sj+i = 1. Compute the QR factorization 0
V12 AiQ 21 Case ;
-1 and Sj + i :
of
, .R w , Q n , Q 1 2 , Q211022
e
R-'
Q22 J 1. Compute the QR factorization
of j4j
AiQ 21
QS «$
R® Q22
J
0
,
Rlii,Q$,Q$,Q$,Q$eRn*».
End loop. In this algorithm, we first determine a value j such that Sj-i = 1 and Sj = - 1 . Prom there, we work further to the left and subsequently to the right, as explained above (note that in our implementation so allows to take into account the type of operation required for i = j + 2). If Sj_i = 1 = — Sj does not apply, but instead we have Sj = 1 and Sj-\ = —1, then we can work with AT instead of A. In these cases, we need only m — 1 QR-factorizations. Only if si = S2 = • • • = sm, we have to plug in an artificial I and the method requires m QR-factorizations. In Algorithm 1, the explicit computation of AJ1 and explicit solution of the corresponding triangular linear system are avoided if Sj = — 1.
703 After reducing A to Q n ( Q 2 i ) 1 by Algorithm 1, we can compute the SVD of A by computing the CSD of Q!Q and Q 2 i by the Matlab command gsvd. In the following we explain that the computations involved in Algorithm 1 can be posed as left and right orthogonal transformations of a large matrix whose sub-blocks are the Aj or their transposes, several unit matrices, and the rest being zero matrices. For simplicity, we assume without loss of generality that j = m - 1 in Algorithm 1, i.e., sm = —1 and s m - i = 1. Define Am—1 -Am
Mm-l For i = 1,
» = i, Q21
, m — 1.
Q22 J
,m-2,
AT
M{:=
0 In
0
0 In
Ai
M{:=
, if Si — 1 and Sj_i = 1, or at — 1 and Sj_i = —1,
0
if S{ = —1 and Sj_i = 1, or Si = —1 and s,_i = —1.
Set Qm-l
0
0 0
V:
•• 0
e
0
771—1
if m is odd,
0 QT2
In 0
0 Qm-2
0 0
0 0
0 0
0 0
'•• 0
0 Qi
M =
if m is odd,
F :=
0
AC-3
0 ^m-4
0 0
0 0
0
T
Af
U :=
M m_x 0
Mm-2 Xm_3
0
0
0
0
0 A4 m _4
if m is even,
0
'••
0
0
0
Q[J
In 0
0 Qm-2
0 0
0 0
0 0
0 0
'• 0
0 e2
if m is even,
if m is odd, Ml
Mi
0 0
0 0 if m is even.
Ml 0
Ml A4?" J
Then U and V are orthogonal matrices, and UMV = R,
(3)
where
fl =
•Rm-l 0 o 0
Km-2 1lm-3 o 0
0 Km-4
0 0
0 0
'•• 0
'•• %
o K,
if m is odd,
704
ftm_! 0
ftm_2 TCm_3
0 7J m _ 4
0 0
0 0
R
if m is even, 0
0 R(m-1)
"R-m-l =
0
%! 0 fl(m-l)
or Tlm-i =
0
%i 6 R n x " , i = 1, • • •, m — 2, are of one of the following forms ' flW * " 0 *
' B«
'
*
0'
*
*
* '
0
flW
1
' * •
0 flW
flW 6 R7**™ (j = 1, • • • , m — 1) are nonsingular. Let X denote the estimate of X computed with finite precision arithmetic, as opposed to exact arithmetic, and let e denote the machine precision. From (3), we have [?] \\UTU - hn\\ ~ £, \\VTV-hn\\~t,
\\UMV-R\\&e\\M\\
(4)
Hence, algorithm 1 is backward stable in the sense that (4) holds.
3
Conclusions
In this paper, we have studied the computation of the SVD of a general matrix product/quotient sequence. First we reduced the sequence by at most m QR-factorizations to the form Qn {Q\i) -l , with Q ^ . Q ^ e T (i) ii + T i.V2i) R « x " a n d ( Q ii W )f QvW ( Q ^ ) >^2i Q ^— - I- Then we obtain the SVD of A by computing the CSD of Q\{ and Q21' -.a) using the Matlab command gsvd. An advantage of our QR-type reduction is its flexibility for adding one more matrix from left or right to the matrix A of a matrix product/quotient, this feature is very useful for the applications like the estimation of Lyapunov exponents of dynamic systems. Some numerical examples were given to show the performance of the presented method.
NEWTON'S METHOD FOR NON-DIFFERENTIABLE EQUATIONS: CONVERGENCE A N D APPLICATIONS
D E F E N G SUN Department
of Mathematics and Center for Industrial Mathematics, National of Singapore, Singapore 117543, Republic of Singapore. Email:matsundf@nus. edu.sg
University
Newton's method has been proved to be the most effective approach for solving nonlinear systems of equations. While the convergence properties of Newton's method for differentiable equations have been well understood for a long time, its behavior for solving non-differentiable equations has only been discovered successfully quite recently. In this talk, we first present the recent advances in convergence analysis of Newton's method for solving non-differentiable equations and then briefly introduce its applications in fields of optimization, variational inequalities, best interpolation, inverse eigenvalue, optimal control and computational mechanics.
1
Introduction
Suppose that F : $tn —> 3ftn is a locally Lipschitz continuous functions, i.e., \\F(y)-F(x)\\
= 0.
(1)
When F is continuously differentiable (smooth), the most effective approach for solving (1) is probably Newton's method. For example, in 1987, S. Smale 4 wrote "If any algorithm has proved itself for the problem of nonlinear systems, it is Newton's method and its many modifications. ... Thus a relation between the simplex method of linear programming and Newton's method, is no surprise. ... " The most attractive feature of Newton's method for solving smooth systems is its quadratic convergence when the initial point is sufficiently close to the solution. However, in applications in fields of optimization, best interpolation, computational mechanics, and many other fields, it is often found that F is not smooth everywhere. Hence, Newton's method is no longer valid for solving (1). There are counter examples in the literature proving the non-convergence of Newton's method when F is not smooth. In this paper, we will introduce a smoothing Newton method for solving non-differentiable equations and analyze its rate of convergence. 2
S e m i s m o o t h Functions a n d S m o o t h i n g Functions
For a locally Lipschitz continuous function F, by Rademacher's Theorem, we know that F is differentiable almost everywhere. So, Clarke's generalized Jacobian is well defined 1 : dF(x) = conv{lim F'(y),y
705
—» x,y €
Dp}.
706 Here Dp denotes the set where F is differentiable and conv^l denotes the convex hull of a set A. For example, for F(x) = max{0, x}, x £ 5R, we have 0F(O) = [O,1]. A locally Lipschitz continuously function F : 5ftn —»5ftn is semisinootn 3 at x if lim
Vh'
(2)
Vg8F(i+th')
U0 n
exists for every nonzero h € R . F is semismooth at x implies that F is directionally differentiable at x. Another equivalent definition is that F is said to be semismooth at a; if F is directionally differentiable at x and for any h —> 0 and V € dF(x + h), F{x + h)-F{x)-Vh F is said to be strongly semismooth
= o(\\h\\).
(3)
at x if F is semismooth at x and = 0(\\h\\2).
F(x + h)-F(x)-Vh
(4)
One may use the definition to check that the following two functions are strongly semismooth: F(a,b) = Va2 + b2 , (a,b) 6 K2 and F ( e , a, b) = Ve2 + a2 + b2 , (e, a, 6) € 5R3 . A function G : K x 5R" —> K n is called a smoothing function of a nonsmooth function F : 5ft™ —> JJ™ if G is continuously differentiable on [ft x 3i n except 0 x Sftn and for any x € Jft™, lim
G(e,y) = F ( i ) .
(5)
elO.y—>x
In general, the existence of smoothing function G is proved in Sun and Qi 5 by using Steklov's averaged function. In practice, easily computed smoothing functions can be constructed. For example, let F(t) = max{0,t},t € 5ft. Then the defined function
G(e,t) = ±(t+Vt2 + e2)
(6)
is a smoothing function of F . 3
A Smoothing Newton Method
Suppose that G is a smoothing function of F . Let E : 5ft x 5ftn —> 5ft x 5ft™ be defined by E(e,x)
:=
£
G(s,x)
707 Then, F(x) = 0<—>E(e,x) = 0, which implies that solving a nonsmooth system of equations is equivalent to solving a smoothing (nonsmooth) system of equations. Before we introduce the smoothing Newton method, we need the following assumptions. A s s u m p t i o n 1: (i) G is a smoothing function of F. (ii) For any e > 0 and x £ 5ftn, G'x(e,x) is nonsingular. A s s u m p t i o n 2: G is semismooth at (0, x*), where x* is a solution. A s s u m p t i o n 3: G is strongly semismooth at (0,x*). Choose s e 5ft++ and 7 G (0,1) such that -ye < 1. Let z := (e,0) € 5ft x 5Rn. Define the merit function -0 : 5Rn+1 —» 5R+ by
V(*) := ||2?(*) II2 and define f3 : 5ftn+1 -> 5K+ by /?(z) : = 7 m i n { l , V ( z ) } . Let n •= {z : = {£,x) e 3* x K n | e > /3(z)e}. Then, because for any 2 £ 5Rn+1, /3(z) < 7 < 1, it follows that for any x 6 5Rn,
(e, x) e n. A S m o o t h i n g N e w t o n M e t h o d : [Qi, Sun and Zhou 2 ] S t e p 0. Choose constants 5 £ (0,1) and a 6 (0,1/2). Let e° := e, x° € » n be an arbitrary point and k := 0. S t e p 1. If E{zk)
= 0 then stop. Otherwise, let pk := /3(zk).
S t e p 2. Compute Azk := ( A e \ Ax fc ) e » x 5ftn by E(z fc ) + £;'(z fc )Az fc =
ftf.
(7)
S t e p 3. Let Zfc be the smallest nonnegative integer I satisfying iP{zk + SlAzk) Define z
fc+1
fc
< [1 - 2
(8)
fc
:= z + #*Az .
S t e p 4. Replace k by fc + 1 and go to Step 1. T h e o r e m 1. Suppose that Assumption 1 is satisfied. Then an infinite sequence {zk} is generated by the above algorithm with lim 4){zk) = 0 k—*oo
708 and each accumulation point z of {zk} is a solution of E{z) = 0. Moreover, suppose that Assumption 2 is satisfied and that z* := (0,x*) is an accumulation point of the infinite sequence {zk} generated. If dE(z') are nonsingular, then the whole sequence {zk} converges to z*, \\zk+l-z*\\
o(\\zk-z*\\)
=
and ek+1 =
o(sk).
Furthermore, if Assumption 3 is satisfied, then \\zk+l 4
- z* || = 0{\\zk
- z' ||2)
and
ek+1 =
0(ek)2.
Conclusion
In this paper we introduce a smoothing Newton method of quadratic convergence for solving the nonsmooth system of equations. Its applications in fields of optimization, variational inequalities and inverse eigenvalue problems have been well studied (http://www.math.nus.edu.sg/~matsundf). The smoothing newton method for solving nonsmooth equations arising from best interpolation, optimal control, computational mechanics and other fields are being investigated. New discoveries will be posted in the above webpage. References 1. F. H. Clarke, Optimization and Nonsmooth Analysis (Wiley, New York, 1983). 2. L. Qi, D. Sun, and G. Zhou, A new look at smoothing Newton methods for nonlinear complementarity problems and box constrained variational inequality problems, Math. Program., 87 (2000), 1-35. 3. L. Qi and J. Sun, A nonsmooth version of Newton's method, Math. Program., 58 (1993), 353-367. 4. S. Smale, Algorithms for solving equations, Proceeding of International Congress of Mathematicians, Edited by Gleason, A. M., American Mathematics Society, Providence, Rhode Island, 1987, pp.172-195. 5. D. Sun and L. Qi, Solving variational inequality problems via smoothingnonsmooth reformulations, J. Comput. Appl. Math., 129 (2001), 37-62.
Numerical Solution of Blow-Up Problems Using Mesh-Dependent Variable Temporal Steps K.W. LIANG, P. LIN and R.C.E. TAN Department of mathematics, National University of Singapore
1
Introduction
The mesh adaptive methods have played important roles in solving the parabolic PDEs, whose solutions may develop singularities in a finite time X. All adaptive meshes, however, are sets of some regular discretization nodes including the property that spatial nodes have the same temporal increments among time-levels [2, 3, 4, 5, 6]. Due to the blow-up of the solution, all the adaptive methods have to be stopped as soon as t approaches T. The reason is that the numerical solution is unstable after T, which is caused by shock, since the solutions (or the derivatives of the solutions) become very sensitive and blow-up with respect to time. In this circumstance, we present a new method to automatically generate an adaptive irregular mesh, which can overcome the above limitation. Our new method will focus on the classical quenching type partial differential equation, ut = uyy +
9,
0 < y < a, 0 < t < T,
u(y,0) = u o ,w(0,t) = u(a,t) = 0,
ye (0,a),i 6 (0,T)
where 9 > 0 and 0 < uo < 1- As figure 1.1 showed, our adaptive irregular mesh is achieved through letting the spatial nodes at same time-level have different temporal increments based on ut. If the variation of Ut at node A is smaller than it at node B, then the temporal increment T& is bigger than TB- As a result, the line will be replaced by a curve at a time-level. Repeatedly, the next temporal increment TA* and TB< are determined by the variations of the function ut similarly. In practice, since the interpolating process is very complex in the constructure of the time curve, we will replace the curve by multi line. The computation of the solution of (1) is important for several reasons. First, although very simple, the problem (1) provides a fundamental combustion model to many quenching processes. And the quenching behavior in this problem is typical singularities (blow-up) of a wide class PDEs modeling many important physical phenomena. Second, quenching problems have been widely analyzed and it is well known about the property of solution when t approaches X, thus it is a excellent problem for testing the performance and verifying the efficiency of our new method. Third, the middle location arrives quenching firstly, which is the only proved character about the quenching problem [1], while the post-quenching behaviors of solution are unknown till now, and then using our new method can lend insight into the post-quenching characters.
2
Discretizations and The Difference Scheme
Note that for y/a = x, problem (1) can be conveniently reformulated into the following form ut = ^uxx + ^, az (1 - u)a u(x,0) = uQ,u(0,t) =u(l,t)
0<x
x € (0,1),t € (0,X).
It assumed that 0 = xo < • • • < XN = 1 are the spatial nodes on [0,1] and hi = Xi+i — x*. And we denote tj^ as the j-th discrete time step at node xt and TJJ as the j-th temporal increment at node Xj, where tj+i,t = tj,i + Tji. With our new method of mesh generation, the temporal coordinates and temporal increments are determined by ut as mentioned in section
709
710 1. And we adopt the arc-length monitor function on ut to determined r,,; with To, T\ and T2 given [5], T^ = Tj_xti
+ {(ut)j-2,i
~ ((«t)j-l,i - («t)j-2,t)2 ,
~ {Ut)j-3,if
(3)
where indices i, 0
= h—
k—+2
+
«0 Qx2
(4)
Together with the three neighboring points of (ZOJ*O) as figure 2.1 showed, we immediately have the following equation
-h 0 h
h ^ 1 r 02 l2 2 J
u(xi,ti)
our, L
=
dx2 -J
-u0
u(x2,t2) -u0 u(x3,t3) -u0 _
(5)
(X2,t2) (Xl,ti)
Fig. 1.1 Since the 3 x 3 matrix in the equation (5) is non-singular, then we get the implicit difference scheme
fc+MaW
«£i+(i+
2r,-,i
„J'+i
:i£ + •
6j.f i,ja 2 /i 2
°j+l , i ( l " ^ )
where «:? = «(£;,!,'), 6j+i,j = 1 + ——-— 2
3+ 2
'l+
~ tj+l,i\'j+l,i+l
=
tj+l,i+l
—
(6)
and / j + i ] i _ 1 ( ^ + l i , + i ) is the tempo-
ral spacing between mesh points (x;_i,£j+i) and (xi,tj+\)((xi+i,tj+i) •j + l . i - l = tj + l,i-l
S
and (XJ,£,- + I)), i.e.,
ij' + l.i)-
Remark 3.1 For square regular mesh, the irregular difference scheme (6) reduce to the usual implicit finite difference formulae with 6 = 1 .
3
Convergence and Stability Analysis
We assume that -^4% and ^jy are continuous in [0,1] x [0,X] and the following conditions are satisfied I = uiax\ljti\ < Ch and 6j-+i,i > 1 (7) Considering u is the exact solution of (2) and using Taylor's series expansion, we have the local truncation error of (6)
\EJ+1\
(8)
711 where C, K\ and K2 are positive constants and At = max|T,;|. Let Uf is the numerical solution of the implicit difference scheme (6) and e{ = u\ - Jj{. If we let e>+l = max* \e{+11, then we obtain e^ < (1 + K3AtYe0 of ,
H
^ E, where K3 is the maximum magnitude K, .(f_ u)9+ i • Since e° = 0 and (8), we have e?' —> 0 as h and At —> 0.
Theorem 3.1 The implicit difference scheme (6) is stable if &,-,< > 1 for all i and j . Remark 3.2 The following restrictions are necessary to guarantee the condition bjti > 1 in (7). If lj,i-i * lj,i+i < 0 then lj,i-\ + lj,i+i must be nonnegative. In practical computation, we require a extra procedure to obtain the function values of ut in (3) for improving the accuracy of solution. Taking the time derivative of equation (2), we have d 1 d2 6 with the initial-boundary conditions ut(x,0)
= l,ut(0,t)
= ut(l,t)
= 0 , i 6 (0,1), te (0,T).
As for (6), (9) can similarly be approximated by the implicit difference scheme 2r,,i
Tj,<
b]+h,aW «Cl+ U + bj+hia2h2
j+i
J+1 Oj + l ,iO?h?
^ „ '
= ,r? •
1
°3+l
, i ( i - «3r+)\'9
«f (10)
where vj = ut(xi,tj). Associated with (6), we can solve a system of equations on u and u ( without extra computational cost, since (6) and (10) have the same triangular coefficient matrix.
4
Numerical Experiments
We apply the new method in sections 2 to solve the problem (2) with the only case of 0 = 1 since other cases involving 9 > 0 are similar. Without the loss of any generality, the initial value uo is set to be zero. The spatial mesh step size h varies from 0.1 to 0.01, while the initial temporal step size is chosen to be 0.01 — 0.001. We observe that, in figure 4.3, the function ut(xi, t) at Xi = 0.4(0.6) grows rapidly and the peak of it exceed 15, the same as it is at Xi = 0.5. It implies that the solution u at X{ = 0.4(0.6) quenches following the node x; = 0.5. Although the values of the solution u at other spatial nodes are below 1, or far away from 1 at some nodes, the derivative functions ut at related spatial nodes have been increasing in figure 4.2. Especially, the phenomenon of the rapid increase of ii ( is obviously at x = 0.3 and the value of Ut has reached 3.2594. So we can conclude that the solution of problem (2) quenches finally for the whole spatial domain except two boundary nodes. The contour maps in figure 4.5 are also indicated the conclusion, since the contours change to flatness while t increases. Table 4.1 Quenching time T(a) and maximal temporal coordinates maxtiJ i i,j
ft = 0.1 max tj:i Ta a- 2 a= n a = 10 a = 25
2.5826 1.5449 0.9162 0.8503
1.8029 0.7893 0.5275 0.5024
'
h = 0.0.5 max i, i Ta
h = 0.02 maxi, i Ta
h = 0.01 maxi,' i T
1.4757 0.9274 0.8208 0.6677
0.9815 0.7669 0.6027 0.6064
0.8638 0.6049 0.5599 0.5531
1.1010 0.6005 0.5112 0.5014
±
i,3
0.8469 0.5713 0.5010 0.5010
a
0.7890 0.5400 0.5008 0.5004
Figure 4.4 displays the curve of spatial nodes at one time-level immediately before the quenching. It is observed the nodes near the boundaries are further from the mid location with respect to t. In fact, the nodes but the middle point can be extended to a further temporal
712
domains for enlarging the maximal temporal spacing. So that we can well study the postquenching behavior of solution. In table 4.1, we list newly computed quenching time Ta and the maximal temporal coordinates max £,; for various given values of a and h. We also note that enlarging the temporal spacing leads to the delay of the quenching time, i.e., it reduces the accuracy of the quenching time. The reason is that the error of approximation of finite difference scheme depends not only on the derivatives of solution and the sizes of spatial and temporal steps, but also on the spacing and the shape of mesh's cells. The shape of cells change to narrow and the angles between mesh lines become small if the temporal spacing is enlarged. On other hand, we can improve the accuracy of the numerical solution through decreasing the spatial step size h and the initial temporal step size To- Furthermore, the delay of quenching time do not affect the conclusion of quenching characters of the solution.
References [1] H. Kawarada, On solutions of initial-boundary problem for ut — uxx + 1/(1 — u), Pul. Res. Inst. Math. Sci. 10(1975), 729-736. [2] Q. Sheng and A. Q. M. Khaliq, A compound adaptive approach to degenerate nonlinear quenching problems, Numer. Meth. for PDEs, 15(1999) 29-47. [3] Q. Sheng and H. Cheng, A moving mesh approach to the numerical solution of nonlinear degenerate quenching problems, Dynamic Sys. Appl., (to appear). [4] Q. Sheng and H. Cheng, An adaptive grid method for degenerate semilinear quenching problems, Computers and Mathematics with Applications, 39(2000), 57-71. [5] H. Cheng, P. Lin, Q. Sheng and R. C. E. Tan, Solving degenerate reaction-diffusion equations via adaptive Peaceman-Rachford splitting, submitted. [6] Q. Sheng, A monotonicaJJy convergent adaptive method for nonlinear combustion problems, Integral Methods in Science & Engineering (Research Notes in Math., 418), Chapman & Hall/CRC, London and New York, (2000), 310-315.
N O N L I N E A R B O U N D A R Y L A Y E R S OF T H E B O L T Z M A N N E Q U A T I O N SEIJI UKAI, TONG YANG, AND SHIH-HSIEN YU ABSTRACT. We will summarize our recent study on the existence theory on half-space boundary value problem of the nonlinear Boltzmann equation of a hard sphere gas, assigning a Dirichlet data for incoming particles at the boundary and a Maxwellian as the far field, [15]. It shows that the solvability of the problem changes with the Mach number Jit00 associated to the far Maxwellian: If ^ ° ° < —1, there exists a unique smooth solution connecting the Dirichlet data and the far Maxwellian for any Dirichlet data sufficiently close to the far Maxwellian, while, otherwise, such solutions exist only for Dirichlet data satisfying certain admissible conditions and the set of admissible Dirichlet data forms a smooth manifold of codimension 1 for the case — 1 < JC^ < 0, 4 for 0 < M00 < 1 and 5 for JC" > 1, respectively. Then we will discuss the stability of boundary layer solutions for the case when ^ ° ° < — 1.
1. I N T R O D U C T I O N AND M A I N R E S U L T
The Dirichlet problem of the nonlinear Boltzmann equation in the half-space arises in the analysis of the kinetic boundary layer, the condensation-evaporation problem and other problems related to the kinetic behavior of the gas near the wall, [5], [12]. The main concern is to find a solution which tends to an assigned Maxwellian at infinity. An interesting feature of this problem is that not all Dirichlet data are admissible and the number of admissible conditions changes with the far Maxwellian. This has been shown for the linear case by many authors [3],[6],[7],[9], mainly in the context of the classical Milne and Kramers problems. Recently, a nonlinear admissible condition was derived for the discrete velocity model in [14] and the stability of steady solutions was proven in [11]. The full nonlinear problem was solved on the existence of solutions in [8] for the case of the specular reflection boundary condition, whose proof, however, does not work for the Dirichlet boundary condition, whereas in [2], the Dirichlet case has been solved but with the ambiguity that the far Maxwellian cannot be fixed a priori, in addition to some non-physical truncation assumption. We will establish the admissible conditions for each far Maxwellian. Our proof provides also a new aspect of the linear problem. It should be mentioned that K. Aoki, Y. Sone and their group, ([1], [12], [13] and references therein), made an extensive numerical computation on the same nonlinear problem. Our result gives a partial explanation of their numerical results. In [15], we study the existence theory stationary solutions in a half-space x > 0 in which the spatial dependence of the mass density F of gas particles is assumed constant on each plane parallel to the boundary x = 0 but the velocity dependence is fully 3-dimensional, that is, F is assumed to be a function of position x and particle velocity £ = (£1,62, £3) £ R3- Let fi stand for the velocity component along the rr-axis. Then, our problem is,
r fi^x = (1-1)
< F\x=a [ F
oo.eei 3 ,
Q(F,F),
= F0(O, ->• Moc(0
(x->°c),
€1 > 0 , ( 6 , 6 ) e K 2 , (el3.
Here, Q, the collision operator, is a bilinear integral operator (1.2)
Q(F, G)=
[
(F(?)G(£)
- F(OG(t,))
713
q(( - £„ u) d£.dw,
714 with
(i-3)
r = f-[tt-e.)-w]w.
e = f. + [(f-eo-w]w,
where "•" is the inner product of R 3 . In this paper, we restrict ourselves to the hard sphere gas for which the collision kernel q is given by ?(C,w) =
a0\(-u\,
where <7o is the surface area of the hard sphere. Here we shall recall two classical properties of Q which will be needed later. See [4], [5] for details. ( Q l ) Q(F) = 0 if and only if
d-4)
F
=
P
M
„
(
If-"12*
^T^^j^wM-^r-) (27rT) / 3 2
for any constants p , T > 0 and u = (u^, 112,11$) € R 3 . This is a Maxwellian and is the distribution function of a gas in the equilibrium state with the mass density p, flow velocity u and temperature T. (Q2) A function <j){£) is called a collision invariant of Q if <0,<2(F)) = O
for all F,
{,) being the inner product of L 2 (R?). Q has five collision invariants (1.5)
0o = l ,
& = &(t = l,2,3),
4>4 = |f|2,
which indicate the conservations of mass, momentum and energy in the course of the binary collision of particles. The second equation in (1.1) is the Dirichlet boundary condition. The Dirichlet data Fo(£) can be assigned only for incoming particles (£1 > 0), because assigning the outgoing ones (£ : < 0) makes the problem ill-posed as is seen from the a priori estimate given in the next section. This corresponds to the physical situation that only the incoming distribution can be controlled on the wall. The distribution M^ in the third equation of (1.1) is the boundary data at x = 00. It follows from the property (Ql) that (1.1) does never have a solution unless M^ is a Maxwellian. Thus, (1-6)
Moo(0
=
M[poo,uoo,Too}(0,
where the constants px > OjUoo = (2*00,1, "00,2, "00,3) 6 R 3 , and Tx > 0 are the only quantities that we can control. By a shift of the variable £2, ?3, we can assume without loss of generality that "00,2 = ^00,3 = 0, and then, the sound speed and Mach number in the far field are given by
(1-7)
coo = J%, V o
^
= ^Si, Coo
respectively, see [5]. Note that the flow at infinity is incoming (resp. outgoing) if ^#°° < 0 (resp. > 0) and supersonic (resp. subsonic) if \*df°°\ > 1 (resp. < 1). The Mach number ^ # ° c provides significant changes on the solvability of (1.1). Indeed, since the "Dirichlet data" M oc (^) is imposed both for incoming and outgoing particles, it is over-determined and hence (1.1) is not necessarily solvable uncoditionally. Actually, we will show that the number n+ of solvability conditions changes with „#°° as
(1.8)
0, 1, 4, 5,
^ ° ° e (-oo,-l), ^r°°e(-i,o), ^°°G(0,1), ^°°e(l,oo).
715 To be more precise, introduce the weight function (1-9)
WP{0
= (1 + I f D - ^ M l l . i i c c . T J t t ) ) 1 7 2 ,
with /3 e R. The main result obtained in [15] on existence is: T h e o r e m 1.1. Let M^ be the Maxwellian (1.6) with JC^ ^ 0, ± 1 and let /3 > 5/2. Then, there exist positive numbers e^,t\,a, and a Cl map (1.10)
#:L2(R^,£idO—*Kn+,
*(0) = 0,
such that the following holds. (i) Suppose that the boundary data F0 satisfy (1-11)
^(O-AMfll^eoWjCO,
eeR»..
TTiera, i/ie problem (1.1) admits a unique solution F in the class (1.12) \F(x,0 - M ^ O I + 16(1 + K i r 1 F x ( z , f l | < e 1 e - « W / J ( £ ) , j / and on/j/ ?/ Fo satisfies (1.13)
* > °>
?eR3>
* ( F 0 - Moo) = 0.
(ii) The set of F0 satisfying (1.11) and (1.13) forms a (local) C\ manifold of codimension n+. Based on this existence theorem, we will then discuss the stability of the boundary layer. As for the existence theory, we can see that the case when j^°° < — 1 would be simpler than other cases as all the information from the far field goes to the boundary and the solution to the linearized equation is exponential decay. Hence, we will discuss briefly the proof of the following stability theorem. T h e o r e m 1.2. The boundary layer solution obtained in Theorem 1.1 when ^#°° < —1 is nonlinear stable under small perturbation. A c k n o w l e d g m e n t : The research of the first author was supported by Grant-in Aid for Scientific Research (C) 136470207, Japan Society for the Promotion of Science (JSPS). The research of the second author was supported by the Competitive Earmarked Research Grant of Hong Kong # 9040648. The research of the third author was supported by the Competitive Earmarked Research Grant of Hong Kong # 9040645. REFERENCES [1] Aoki K., Nishino, K., Sone, Y., Sugimoto, H.(1991): Numerical analysis of steady flows of a gas condensing on or evaporating from its plane condensed phase on the basis of kinetic theory: Effect of gas motion along the condensed phase, Phys. Fluids A, 3, 2260-2275 [2] Arkeryd, L., Nouri, A. (2000): On the Milne problem and the hydrodynamic limit for a steady Boltzmann quation model, J. Stat. Phys., 99, 993-1019 [3] Bardos, C , Caflish, R. E., Nicolaenko, B. (1986): The Milne and Kramers problems for the Boltzmann equation of a hard sphere gas, Comm. Pure Appl. Math. 49, 323-352 [4] Carleman, T., (1932): Sur La Theorie de l'Equation Integrodiffercntielle de Boltzmann, Acta Mathematica, 60, 91-142 [5] Cercignani, C , Illner, R., Purvelenti, M. (1994): The Mathematical Theory of Dilute Gases, Springer-Verlag, Berline, [6] Cercignani, C. (1986): Half-space problem in the kinetic theory of gases, in: Kroner, E., Kirchgassner, K. (eds.) Trends in Applications of Pure Mathematics to Mechanics, Springer-Verlag, Berlin, 35-50 [7] Coron, F., Golse, F., Sulem, C. (1988): A classification of well-posed kinetic layer problems, Commun. Pure Appl. Math., 4 1 , 409-435. [8] Golse, F., Perthame, B., Sulem, C. (1988): On a boundary layer problem for the nonlinear Boltzmann equation, Arch. Rational Mech. Anal.. 103 , 81-96
716 [9J Golse, F., Poupaud, F.(1989): Stationary solutions of the linearized Boltzmann equation in a half-space, Math. Methods Appl. Sci., 11, 483-502 [10] Liu, T.-P., Yu, S.-H. (2002): Boltzmann Equation: Micro-Macro Decompositions and Positivity of Shock Profiles, to appear [11] Nikkuni, S., Kawashima, S. (2000): Stability of stationary solutions to the half-space problem for the discrete Boltzmann equation with multiple collisions, Kyushu J. Math., 54, 233-255 [12] Sone, Y. (2002): Kinetic Theory and Fluid Dynamics, Birkhauser, Basel [13] Sone, Y., Aoki, K., Yamashita, 1.(1986): A study of unsteady strong condensation on a plane condensed phase with special interest in formation of steady profile, in: Bom, V., and Cercignani, C. (eds), Rarefied Gas Dynamics, Teubner, Stuttgart, II, 323-333. [14] Ukai, S. (1998): On the half-space problem for the discrete velocity model of the Boltzmann equation, in Kawashima, S., Yangisawa, T. (eds), Advances in Nonlinear Partial Differential Equations and Stochastic Series on Advances in Mathematics for Applied Sciences-Vol. 48, World Scientific, Singapore-New York, 160174. [15] Seiji Ukai, Tong Yang and Shih-Hsien Yu, Nonlinear boundary layers of the Boltzmann equation: I, Existence. (To appear in Communications in Mathematical Physics) DEPARTMENT OF A P P L I E D MATHEMATICS, YOKOHAMA NATIONAL UNIVERSITY, YOKOHAMA, JAPAN
E-mail address: u k a i Q m a t h l a b . s c i . y n c . a c . j p D E P A R T M E N T O F MATHEMATICS, C I T Y UNIVERSITY OF H O N G K O N G , K O W L O O N , H O N G K O N G
E-mail address: matyangQcityu.edu.hk DEPARTMENT OF MATHEMATICS, C I T Y UNIVERSITY OF H O N G K O N G , K O W L O O N , H O N G K O N G
E-mail address: mashyuQcityu.edu.hk
A NEW ALGORITHM FOR DIVISION OF POLYNOMIALS LIANGHUO FAN Nanyang Technological
University, I Nanyang Walk, Singapore 637616, E-mail: [email protected]
Singapore
Division of polynomials has fundamental importance in algorithmic algebra, and is commonly encountered in many areas of mathematics as well as in scientific and engineering applications. The existing classical algorithm for polynomial division fails to provide an explicit way of determining the coefficients of the quotient and the remainder. In this paper, I present a new general theorem about division of polynomials, which provides a new and explicit algorithm for division of any two polynomials. A method of expressing a polynomial in polynomials of lower degrees is also obtained, as a corollary of the algorithm.
1 Introduction n
m
Given two polynomials f(x) = ^jajx'
and g(x) = ^bjXJ , where a,
(z = 0,1, 2,...,n) and b} (j = 0,\,2,...,m) are complex numbers and both an and fcmare nonzero, and for convenience we assume n>m, we can easily add, subtract, and multiply the polynomials, namely, n
f(x)±g(x)
= ^ ( a t ±bt)x',
where b} = 0 when m + l<j
and
1=0
n+m
f(x)g(x) = ^ ( y^aibj)xk
. As we know, faster algorithms also exist for
the multiplication of polynomials [1]. However, the division of polynomials, which has fundamental importance in computational algebra and is frequently encountered in many areas of mathematics as well as in scientific and engineering applications, is much more complicated. A conventional algorithm for division of polynomials can be seen in the typically-used proof for the so-called "Division Theorem", which says for any two polynomials f(x) and g(x), as shown earlier, there exist unique polynomials q(x) and r(x) so that f(x) = q(x)g(x) + r(x) and deg r(x) < deg q(x) The algorithm goes as follows: when m>n, clearly q{x)= 0, g(x) = r(x); when m < n, to obtain g(x) and r(x), one can establish a sequence of polynomials f^x), f2(x), ... using the following method:
717
Let/,(x) = f(x)-g(x)*c0xn
m
, where Co is the ratio of the leading
coefficient of/(x) to that of g(x), namely —-. We have deg/(x)>
K deg/;(x). If deg/|(x) =rt,<m, then q(x) = c0x"'m and r(x) = fi(x). Otherwise, we continue to let f2(x) = f](x)-g(x)*c]x"''m, where c\ is the ratio of the leading coefficient of fx(x) to that of g(x), and again we have deg/(x)>deg/ 2 (x). If deg/ 2 (x) = n2<m, then q(x) = c0x"~m + c,x"'"m and r(x) = f2(x). Otherwise, follow the above process to get f3(x), f4(x), ... until fk(x) when degfk(x)<m. Then, q(x) = c0x"~m +cxx"'~m + ... + ck_lx"t^m and r(x)=fk(x). This classical algorithm, found in many relevant texts in mathematical form [2,3] or in pseudo-code form [4], provides a method for computing the quotient and remainder of polynomial division. However, it does not give explicit algebraic expressions for determining the coefficients of the quotient and the remainder to be produced. In 1990, Godbole presented another algorithm for polynomial division by solving a system of algebraic equations involving the coefficients of the dividend, divisor, and quotient, but it can only be used to find the quotient when the remainder is zero, inapplicable when it is nonzero [5]. Below in this paper I present a new and explicit algorithm for computing the quotient and remainder of the division of two polynomials, which is based on a new general theorem about polynomial division. 2 A New Algorithm In polynomial division, if the divisor has a higher degree than the dividend, then obviously the quotient is zero and the remainder is the dividend itself, therefore here I only consider the situation when the degree of the divisor is equal to or lower than that of the dividend. n
Theorem: For any two polynomials/^) = ^ajx'
m
and g(x) = ^bjXj
where at (i = 0,1, 2,...,ri) and bj (j = 0,\,2,...,m) are complex numbers,
,
719
m
Hn-m-i TO
y_i
m
m
J—1
m
\,2,---,m,
-, i = m + l,m +
2,---,n-m,
and r
m-k = am-k ~ Z
(b) When m> — , 2 a„
i = 0,
bm ' Hn-m-i
i n—i tn
\
v1, < 7 „ --m
m-j
-i+j
K
,
i=
l,2,---,n-m,
tn
and a
m -*~Z^A-*-/> i=0 m-k
k=
a
m-*~Z^-*-"
k = 2m-n + l,2m-n + 2,-,m.
l,2,-~,2m-n,
To prove the above theorem, one can use a generalized synthetic division established in Fan [6], which is essentially an easier way of doing long division, but not limited to the situation where the divisor is of the form x-c. The following example explains how the generalized method can be implemented using the so-called "the synthetic array of numbers" when /(JC) = 2x5 - 5x4 + x2 + 2x - 7 and g(x) = x3 - 3x2 + 2x - 3. Notice
720
how the coefficients of fix) and g(x) are used in the synthetic array and how the arrows are placed to indicate the connection of the numbers. The dividend ,2x5 -5x 4 +,0x3 +x 2 + 2 x - 7 v n °
V
2
r -3 X
.£ + -ON
H
/ 5 /0 1\
I
-2 3 Tjy7'
v / T / 1 -1 \ \ / 2 x22 + x - l (The quotient)
t 2
7
-10
\
, * ' 2x 2 +7x-10 (The remainder)
A more detailed explanation about the generalization of synthetic division and the mathematical deduction of the theorem can be found in Fan [6]. To verify the validity of this method provided in the theorem, one can use concrete examples, for instance, let /(x) = 4 x 4 + x 2 + 3 x - 5 , g(x) = 2x2 - x + 3, by applying the method we get q(x) = 2x2 +x-2, and r(x) = -2x +1. Obviously, f{x) = q(x)g(x) + r(x). It is also easy to see when g(x) = x-c, the algorithm turns to be the classical synthetic division. From the algorithm, a method of expressing a polynomial in polynomials of lower degrees, as shown in the following corollary, can be obtained. The proof is immediate. Corollary: For two polynomials f(x) of degree n and g(x) of degree m, and m
n m
, anddeg qxix)<m,X -\,2,..k
.
721
Notice when g(x) = x, the first expression is Horner's rule for evaluating a polynomial. 3
Discussion
The theorem presented in this paper provides a new and explicit way of determining the coefficients of the quotient and remainder of the division of any two general polynomials. Clearly, this algorithm can be used in practical computing as well as in software design. Moreover, because the algorithm reveals direct and explicit algebraic relations between the coefficients of the dividend and the divisor and those of the quotient and remainder, it can help us to more clearly analyze the properties of the quotient and the remainder in terms of the properties of coefficients of the dividend and the divisor. For example, it is obvious from the algorithm that if the dividend and the divisor are over the ring of complex numbers C, then the quotient and the remainder are also over C, but the same kind of property does not hold for division of polynomials over the ring of integers Z. Further application of the algorithm in theoretical analysis in relevant areas of mathematics remains to see. Nevertheless, it should be pointed out that, as one can see without much difficulty, the new algorithm and the existing classical algorithm for polynomial division have essentially the same efficiency in computing in terms of the times of mathematical operations they need to execute to arrive at the final results. References 1. Zippel, R., Effective polynomial computation (Kluwer Academic Publishers, Boston, 1993) pp. 113-120. 2. Merris, R., Introduction to computer mathematics (Computer Science Press, Rockville, Maryland, 1985) pp. 230-234. 3. Akritas, A. G., Elements of computer algebra with applications (John Wiley & Sons, New York, 1989) pp. 102-105. 4. Chapra, S. C. and Canale, R., P., Numerical methods for engineers: with software and programming (McGraw-Hill, Boston, 2002) pp. 163166. 5. Godbole, P. B., Algorithms for multiplication and division of two polynomials, Adv. Eng. Software 12 (1990) pp.133-138. 6. Fan, L., A generalization of synthetic division and a general theorem of division of polynomials. Mathematical Medley 29 (2003), to appear.
GINZBURG-LANDAU SYSTEM A N D SUPERCONDUCTIVITY N E A R CRITICAL TEMPERATURE
XING-BIN PAN Department of Mathematics, National University of Singapore, Singapore 119260. E-mail: [email protected] We investigate superconductivity of a sample subjected to an applied magnetic field and slightly below the critical temperature Tc, and introduce recent results on the estimate of the critical field He •
In Ginzburg-Landau theory, superconductivity is described by a complex-valued function ip (order parameter) and a real-valued vector field A (magnetic potential), 1 and (ip, A) is a minimizer of the Ginzburg-Landau energy functional. Under a proper scale, t h e energy functional can be written as / {|VV> - iAij)\2 + ^ ( l - \i>\2?}dx Jn *
+ — f |curl A - Happi\2dx, M Jn3
where H a p P i is t h e applied field; K is the Ginzburg-Landau parameter here A is t h e penetration depth and £ is the coherence length; 1
(1)
: K = A/£,
4ma2l2(Tc - T) h2Tc c
» - ? -
here T is t h e t e m p e r a t u r e , Tc is t h e critical t e m p e r a t u r e in zero field, % is t h e Planck's constant, I is a typical scale for t h e sample, m is the electron mass, and a is a material constant which is independent of t e m p e r a t u r e . Note t h a t n = \^/]i. In this paper, il is a bounded, smooth and simply-connected domain in H3. Our interest is the superconductivity under applied magnetic fields, with t e m p e r a t u r e T slightly below t h e critical t e m p e r a t u r e Tc (hence fj, is small). Let us consider an applied field of the form H a ppi = crh, where h is a unit vector, a n d a > 0 is a parameter. Letting A = aA, the associated energy can be written as g\ip,A\=
[ {\Vij;-iaAij\2
+ ^(l-\TP\2)2}dx+'^-
[
|curl A - h | 2 d x . (2)
Let F h be a smooth vector field such t h a t curlFh = h ,
div F h = 0
in II3.
We may choose F h such t h a t / n F^dx = 0. Let W 1 , 2 ( f i , C ) be t h e Sobolev space of all complex-valued functions defined on fl, and let D l l 2 ( 7 i 3 , d i v ) = { A : | A | € Lj, c (W 3 ), | V A | € L2(K3),
div A = 0 in
W(fl) = {ty, A ) : V G W 1 , 2 ( f i , C ) , A - F h e D ^ ^ d i v ) } .
722
II3},
723
It is easy to show that the (global) minimizers of the functional Q on W(O) exist, and they are weak solutions of the following Ginzburg-Landau system • V
^
= M(1-M2)V'
in",
2
in
curl A = ^3{ViV C T AV>}xn (V*AII>)-V
= 0
on SO,
^3-
(3)
A-FhGD^^.div),
where xn is the characteristic function of O, namely, xn = 1 on 0 and = 0 in R \ 0 ; v is the unit outer normal vector of 9 0 . It is well-known that, when the applied field is strong, (0, Fh) is the only minimizes namely, the sample is in the normal state. Since we are interested in the existence of non-trivial minimizers, we define a critical field by Hc(h, /j,, K) = inf{er > 0 : (0, Fh) is a global minimizer}. The estimate of the value of Hc(h,p,,n) for a superconductor with small \i was given recently 3 . It involves two numbers: w(h) = / j V w h - F h | 2 cte,
A(h) = A / ^ P llcurl U h | |
L W
,
(4)
where -f„(fidx = mi Jn
—— = Fh • v Qv
curl 2 U h = (Vw h - F h ) x n
in TZ3,
on 5 0 ,
/ Wh dx = 0, Jn
U h 6 D 1 , 2 (ft 3 , div),
f Vhdx = 0. Jn
T h e o r e m 1. For any unit vector h and n > X(h)y/Ji we have, for small /i, Hc(h,K,n)
= JJ^+o{y/ji).
(5)
The asymptotic behavior of the minimizers for small /i depends on the scale of K. 3 To describe the result, we need some notations. Given a unit vector h, and positive constants A and p, we consider the equations Aw" = 0 in O, A 2 curl 2 A'' = p{VwP - AP)xn in II3, 2£-=A<>-v on dQ, AP - F h € D 1 - 2 ( ^ 3 , div).
(6)
There exists exactly one solution (wp, Ap) of this equation in the set y = {(w,A)
: weW1'2^),
A-FheD1'2(7e3,div),
/ wdx = 0, f Adx Jn Jn
0}.
724
vp = \imt-+o{wp+t - wP)/t and B? = \imt-,o{AP+t Avp = 0 \2cm\2BP ^-=BP-v
= {{VwP-AP) on Oil,
- A")/t
exist, and satisfy
+ p{VvP-BP)}Xn BP € D 1 ' 2 ^ 3 , div).
in O, in
ft3,
Note that, when p = 0, {w°,A°) = (ro h ,F h ). If A > A(h), for 0 < a < there exists a unique positive number p — p(a) such that
a2 I \Vwp
Ap\2dx +
(7)
l/y/w(h),
p=l.
J a
Write A a = A'<°>, B a = B", c a = y/pjaj, define ua to be the unique solution of
va = v', wa = w*W. Then we
Aua = \Aa\2 - 2A a • Vwa - 4,(1 - |c a | 2 ) in (I, dUg -waAa • v on 9 0 , / n uadx = 0,
(8)
and set ba =
+ waAa • (Vva - Ba)dx 2a2\ca\2-J* 1 - 2a 2 / (Vw„ - A a ) • Badx
Let -0h be the unique solution of the following equation : AV'h = 2 F h • Vw h - | F h | 2 + w(h) in ofi, ^ + whFh • v = 0 on dil, JQ iphdx = 0.
(9)
T h e o r e m 2. Consider the applied field H a p p i = a^/ph, where h is a unit vector, and a is a fixed number, 0 < a < l/y/u(h). Let (i/v> AM) be the minimizer of the functional Q given in (2). (i) If K = Xy/p with A > A(h) being fixed, then we have, as p —> 0, Vv = CM t 1 + ia\fpwa + a2p(ua + ibava) + o(p)], AM = A Q + a&aBaA/7I + o(y/p), |c M | 2 = |c a | 2 + abap + o(p). (ii) If K > 0 is fixed, then we have, as p —• 0,
h
0(p3/2)],
+ 0( M 3 / 2 )
|CM| = v ^ M h ) + 0 ( v ^ ) Conclusion (ii) describes the behavior of a sample of size much smaller than the penetration depth, and subjected to the applied field below Hc(h, p, K). When
725
the temperature increases to Tc, the applied field penetrates the sample almost completely, however, superconductivity may persist. Conclusion (ii) also implies that, near the critical temperature Tc, type I behavior may be observed in a type II superconductor at certain scale. The minimizers of the Ginzburg-Landau functional exhibit various phenomena for parameters of different scales. If we choose the penetration length A as the length unit, we may take A = 1 and \i = K2. In a rescaled domain (also denoted by Q) we can rewrite the functional (1) in the following form : /{|V^-i^|
2
+ ^(l-|V>|2)2}d*+ /
|curM-74p P i| 2 cto.
(10)
In recent years many authors have used the functional (10) to study the behavior of superconductors of large value of K when the applied fields are close to the upper critical field Hc3 • For a superconducting cylinder with infinite height and constant cross section f2o, for large K we have Hca(it) = •£- + - ^ K m a x +
A)
0(K-1/3),
2
pi'
where /?o is the lowest eigenvalue of the Schrodinger operator with a unit magnetic field on the half plane and 0.5 < /?o < 0.76, Kmax is the maximum value of the curvature of the boundary of £IQ, and C\ > 0 is a universal constant; as the applied field decreases from Hc3, superconductivity nucleates at the maximum points of the curvature (see Lu-Pan 4 and Helffer-Pan 5 ). As the applied field further decreases but is still above Hc2, a thin superconducting sheath forms on the entire boundary, and gradually develops a surface superconducting state 6 . Comparing these results with Theorems 1 and 2 we see that, the behavior of the minimizers for small /x are quite different to those with large K. It is interesting to explore a liquid crystal phase which is an analogy of the surface superconducting state. Some investigations have been carried out 7 . Acknowledgments This work was partially supported by the National University of Singapore Academic Research Grant No. R-146-000-033-112. References 1. 2. 3. 4. 5.
V. Ginzburg and L. Landau, Zh. Eksper. Teoret. Fiz. 20, 1064 (1950). D. Saint-James and P.G. De Gennes, Physics Letters 6, 306 (1963). X. B. Pan, Superconductivity near critical temperature, submitted. K. Lu and X. B. Pan, Physica D 127, 73 (1999). B. Helffer and X. B. Pan, Upper critical field and location of surface nucleation of superconductivity, Ann. Inst. H. Poincare Analyse Non Lineaire, to appear. 6. X. B. Pan, Comm. Math. Phys. 228, 327 (2002). 7. X. B. Pan, Landau-de Gennes model of liquid crystals and critical wave number, submitted.
G E O D E S I C A P P R O X I M A T I O N S OF 2 D H Y D R O D Y N A M I C S WAYNE LAWTON Centre for Industrial Mathematics, Department of Faculty of Science, 2 Science Drive 2, Singapore E-mail: matwmWnus.edu.sg
Mathematics 117543
Euler (1765) derived equations that describe the inertial motions of rigid bodies and inviscid incompressible flows and these were later shown to be described by geodesic flows with respect to Riemannian metrics induced by inertial operators on Lie groups that parameterize the physical configurations. Arnold generalized Euler's equations to general Lie groups and Fairlie, Zachos and Zeitlin developed metrics on SU(N) with N odd whose geodesic flows approximate 2D periodic inviscid incompressible flows and preserve analogues of powers of vorticity. We have developed an accuracte symplectic integrator and used it to study scaling behaivor and turbulence. This talk describes our computational results.
1
Classical and Q u a n t u m Descriptions
Classically, incompressible fluid flow in a domain D is described by a trajectory g : R -¥ SDiff(D) in the infinite dimensional Lie group of volume preserving diffeomorphisms of D. The velocity in space u : D x R -> R 3 defined by u := jft o g-1 clearly satisfies div u = 0 and u • n = 0 where n is normal to dD. For inertial flow of an inviscid fluid, Euler 6 showed that dt u + V u u = —grad p and Moreau 16 showed that g is a geodesic with respect to the right-invariant Riemannian metric defined by (u,u) = JDu-u. Arnold * used this geometric description to relate flow sensitivity to negative curvature and Ebin 5 , Marsden 15 , and Shkoller 17 used it to derive existence, uniqueness, and regularity results for Euler's and Navier-Stoke's equations. For 2D inertial flow u — JV«/>, where J is rotation by ^ and if> is the stream function. The vorticity U E V X U determines if> — —A _1 w and satisfies the equation % = u • Vu — dXlipdX2u — dX2ipdXlbj = [tf>,u] (Poisson bracket) hence the flow preserves the infinite number of Casimirs Ip = JD UJP, p > 1. Quantum formalism provides a powerful, new description of 2D hydrodynamics that is reflected in the expanding literature 3,4,7,8,10,11,19,20 L e t D _ tf/2irZ2 denote the two-dimensional torus. The Lie algebra of SDiff(D) consists of divergence-free vector fields on D with the commutator product or, equivalently, of C°° real-valued stream functions on D with the Poission bracket product. Its complexification, described with the exponential basis Lm{x) = exp{rn-x), m € Z2, m ^ (0,0), 1 6 D , by [Lm, L n ] = (mxn) Lm+n and the Laplacian A Lm = m • m on D admit a continuous family of deformations [Lm,Ln] — K_1 sin(/cm x n), AKLm = K~ 2 sin2(iim) where re > 0 and they can be recovered in the limit re -» 0. Furthermore, the deformed algebra is the algebra of derivations of the C* algebra (noncommutative torus) generated by operators A and B that satisfy AB — eBA where e = etK and AK is the Laplace operator on this noncommutative torus! If re = ^ where N is an odd integer then e is an N-th root of unity, the operators A and B can be realized
726
727 /O 1 0 ...^ by the Weyl matrices
18
(I
0 ••• 0 e
A =
0
0
\
•••
B = 0
0 - 1
\ i o ... o /
\0-.
0 e"-1/
and the mapping Lm -> emim^/'2AmiBm:2 provides a finite dimensional approximation by sl(N,C) which can be identified with complex-valued functions on the discrete torus Z'^ that sum to zero. Since su(N) is the real form of sl(N,C) and AK approximates A, this yields approximations to 2D incompressible inviscid 2-K—periodic flow by geodesic trajectories g in SU(N) with respect to the rightinvariant Riemannian metric defined by (ip, ip) = fD —ipAip. Since u satisfies u(t)=gu(0)g-\
^g-1
= -AKu,
(1)
the N — 1 Casimirs trace up, p = 2 , . . . ,N are preserved. A simple computation shows that in the limit K -> 0, these converge to the continuous Casimirs Ip. 2
Theoretical and Computational R e s u l t s
Hydrodynamic turbulence is related to the breaking of symmetry 9 , a property that is described in the general context of geodesic flows by the following result: T h e o r e m 2.1 Let G be a Lie group with algebra a self-adjoint, positive definite inertial operator, mannian metric from (u,v) = (Au)(v), and define automorphisms of Q that are (-,) — isometries. If Q\ = {x G Q | s(x) = x, s 6 Si} then
G, linear dual Q*, A : Q —» Q* give G the right-invariant Riethe symmetry group S(G,A) of Si is a subgroup of S(G,A) and
1. Q\ is a Lie subalgebra of Q, 2. if g is a geodesic in G and the velocity in space u = ^f <7_1 satisfies w(0) 6 Q\, then for all t, u(t) 6 Q\ and g(t) G Gig(0) where G\ is the Lie subgroup of G associated to Q\. P r o o f The first assertion follows since if s £ S and x,y G Gi then s([z,2/]) = [s(x),s(y)] = [x,y]. The trajectory g is a geodesic if and only if u satisfies Arnold's 2 generalized Euler equation A^ = Ad*u(Au) or, equivalently, if for all v e G, -^(u(t),v) = (u(t),[u(t),v]). If s G S then (s(u),s(v)) = (u,v) and (s(u),[s(u),s(v)]) = (s(u),s([u,v])) = (u, [u,v]) and since s(G) = G, s(u) also solves the generalized Euler equation. If u(0) G G\ then s(u(0)) = u(0) and the first part of the second assertion follows from the uniqueness property of the solutions of initial value problems while the third part follows from the first. We developed a two-step implicit symplectic integrator, based on Eqs. 1 and implemented in MATLAB, and used it to numerically compute vorticity trajectories associated to geodesies in SU(N). One multipole trajectory yielded the following
728 vorticities, represented in R(Z%) by 5 x 5 matrices, at times t = 0,15, 21 / .5 - . 5 - . 5 .5 0 0 0 0 0 \ 0 0
0 0 0\ 00 0 00 0 , 0 0 00 0 /
/ .4913 -.4913 .0118 0 \-.0118
- . 4 9 1 3 .0354 0 - . 0 3 5 4 \ / .5 - . 5 0 0 0 \ .4913 -.0354 0 .0354 - . 5 .5 0 0 0 - . 0 1 1 8 - . 0 8 5 0 0 .0850 ,« 0 0 000 0 0 0 0 0 0 0 0 0 .0118 .0850 0 - . 0 8 5 0 / \ 0 0 00 0 /
The symmetry group S(SU(5), A2») is generated by the translations, rotation by | , and reflection in lines composed with multiplication by —1. The initial vorticity is invariant under two reflection symmetries and the invariant Lie subalgebra has dimension 4. These matrices together with Fig. 1, that illustrates the distance from the initial vorticity and the distance from the subalgebra, show that the symmetry is approximately preserved for some time and, furthermore, suggest that the flow is integrable. The exponential divergence of the distance from the subalgebra is a consequence of numerical roundoff error combined with negative curvature as predicted by Jacobi's equation. 3
Future Studies
Chaos and turbulence are universal phenomena that, unfortunately, characterize social conditions, e.g. epidemics and terrorism, as much as they characterize hydrodynamics, and it is urgent to understand and control them. We intend to study the role of symmetry and curvature on integrability, chaos, turbulence and scaling properties. We have derived results 1 2 , 1 3 that suggest that a single geodesic on SU(N) generically determines the inertia operator up to a scalar multiple and we believe that this property will play a key role. We have also derived preliminary results that suggest wavelet bases may be useful 14 and intend to develop bases related to noncommutative geometry. Iyer and Rajeev n have used the SU(N) approximation to derive a new scaling model for 2D turbulence and plan to use noncommutative geometry to attack problems in 3D hydrodynamics. We remark that the existence/uniqueness problem for the 3D Navier-Stoke's equations carry a US1 million Clay Prize Award as does the problem of resolving the Riemann Conjecture to which Alain Connes has outlined an approach based on eigenvalues of the Laplace operator on noncommutative geometries! Acknowledgments THIS RESEARCH IS SUPPORTED BY THE GRANT FROM IHPC-CIM RESEARCH PROJECT: R-146-000-036-592. References 1. V. Arnold, Ann. Inst. Fourier Grenoble, 16, 319 (1966). 2. V. Arnold, Mathematical Methods of Classical Mechanics,(Springer,NY,1978). 3. V. Arnold and B. Khesin, Topological Methods in Hydrodynamics, (Springer, NY, 1998).
729
dist. from start
50
log dist from subalgebra
100
-70
0
50
100
Figure 1. Evolution of Multipole Vorticity
4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20.
J. Dowker and A. Wolski, Phys. Rev. A, 46, 6417 (1992). D. Ebin and J. Marsden, Annals of Mathematics, 92, 102 (1970). L. Euler, Memoirs de I'Academie des Sciences Berlin, 1765. B. Fairlie and C. Zachos, Phys. Lett. B, 218, 203 (1989). B. Fairlie, P. Fletcher and C. Zachos, J. Math. Phys., 3 1 , 1088 (1990). U. Frisch, Turbulence: the legacy of A. N. Kolmogorov, (CUP,NY,1995). J. Hoppe, Int. J. Mod. Phys. A, 4, 5235 (1989). S. Iyer and S. Rajeev, Modern Physics Letters A, 2, 1 (2002). W. Lawton and L. Noakes, J. Math. Physics, 42(4), 1 (2001). W. Lawton, ScienceAsia, 28, 61 (2002). W. Lawton, Proc. Int. Conf. Optimization of finite-element approx., wavelets, and splines, Saint-Petersburg, Russia, June 2001. J. Marsden, T. Ratiu and S. ShkoUer, Geometry and Functional Analysis, 10, 582 (2000). J. Moreau, Acad. Sci. Paris, 249, 2156 (1959). S. ShkoUer, Applied Mathematics Letters, 14, 539 (2001). H. Weyl, Gruppentheorie und Quantenmechanik, Zurich, 1928. V. Zeitlin, Physica D, 49, 353 (1991). V. Zeitlin, J. Phys. A, 25, L171 (1992).
MULTI-PHASE FLOW MODELS A N D METHODS FOR LAVA L A M P S A N D LIFE S C I E N C E S JIA SHUO Centre for Industrial Mathematics, Department of Mathematics Faculty of Science, 2 Science Drive 2, Singapore 117543 E-mail: [email protected] Multi-phaseflows,that involve immisciblefluids,arise in many natural and industrial processes and their accurate and efficient simulation will play an increasingly vital role in the life sciences. Surface tension, characteristic of these flows, creates a discontinuous pressure drop across each surface separating two liquid phases that is proportional to the surface's mean curvature. The primary computational challenge is to solve for the pressure whose laplacian has a singular term consisting of a distribution supported on the separating surfaces. In this paper we discuss models and methods for multiphaseflowsthat include simplifying assumptions, boundary integral representations, and wavelet-based discretization and multilevel preconditioning. We also discuss the validation of these models and methods through the simulation of lava lamps, whose simple two-phaseflowsand convenient availability provide an ideal laboratory testbed. 1
Introduction
Multi-phase flows are not only part of our natural environment like volcanic activities and air and water pollution, but also are working processes for a variety of industrial branches like conventional and nuclear power plants, combustion engines, propulsion systems, flows inside the human body, oil and gas production and transport, chemical industry, biological industry, process technology in metallurgical industry or in food production etc. The list is by far not exhaustive. For instance everything to do with phase changes is associated with multi-phase flows. The industrial use of multi-phase systems requires methods for predicting their behavior. Zemansky l defines a phase as a system or a portion of the system composed of any number of chemical constituents satisfying the requirements (a) that it is homogenous and (b) that it has a definite boundary. The phases may of course be solid, liquid, or gas. In the lava lamp model, the liquid-liquid flow (i.e. flow of two immiscible liquids) is considered. Actually, some of the principles and methods for liquid-liquid flow can be applied to other types of two-phase flow.
2
Exact M o d e l
In general the density p, coefficient of dynamic viscosity p,, coefficient of bulk (or second) viscosity A, thermal conductivity /t, specific heat at constant volume cv, specific heat c, and internal energy per unit mass E are functions of the thermodynamic state variables pressure p and temperature T. For a calorically perfect gas E = cvT and for a liquid E ~ cT. The acceleration of gravity g will be assumed to be constant. The equations governing the evolution of p, v , T within each fluid in a multiphase system are derived from conservation laws 2 , 6 , 3 . Conservation of
730
731 mass yields the equation of continuity
where - ^ = J ^ + v - V . Conservation of momentum yields the Navier-Stokes p - ^ = - V p + / i A v + ( A + ^ ) V V - v + pg. Conservation of energy yields the heat conduction
equation (2)
equation
DE u •> 2 p — = V - « V T - p V - v + ^ T r a c e ( ( V v ) + (Vv) T ) + (A - - ^ ( d i v v ) 2 . (3) The pressure drop across the surfaces that separate different phases is described by Laplace's formula 3 ) Pi-P2
=a ( ^
+ ~J+nT(a[~a'2)n,
(4)
where n is the outward normal vector to S, a is the surface-tension coefficient, Rx and Ri are the principle radii of curvature and reckoned as positive if they point in the direction of f^, and a'-, j = 1,2 are the viscous stress tensors denned by (dvi
3
dvk
2
dvt\
dve
Computational M e t h o d s
Under the assumption that the temperature variation is sufficiently slow to make a minor contribution to the inertial force, we may take the divergence of the velocity equal to zero within small time intervals. The resulting Boussinesque Approximation 6 , 4 can be regarded as an operator splitting method that vastly simplifies the computations. At each time step we simply solve the incompressible Navier-Stokes equations p—— - - V p + ^ A v + pg, V v = 0 then compute the new temperature and density. We implement the computations using the velocity-pressure formulation. To solve for the pressure we observe that the Laplacian of the pressure A p is the sum of tho components, a continuous component withing the interior of each fluid phase that is a function of the velocity, and a singular component consisting of a double layer distribution that is supported on the surfaces that separate the phases and that can be computed from Laplace's equation. Furthermore, the normal component of the pressure is computable from the Navier-Stokes equation. These facts provide the pressure as the solution of the Poisson equation with Neumann boundary conditions. For infinite domains we impose asymptotic boundary conditions that are equivalent to the gauge conditions discussed in 7 that play a critical role in the navigation of microorganisms. We note that boundary element methods also provide a means to compute the solution of these equations and the gauge conditions provide a means to regularize the resulting equations for unbounded domains 5 .
732 4
Future Research: Lava Lamps and Life Sciences
Lava Lamps illustrate the simplest example of two phase flow. They consist of an inverted lighbulb beneath a glass container filled with water and wax. As the wax, initially settled at the botton of the container, is heated by the lighbulb it expands and exudes rising columns that rise, cool, fall and exhibit captivating motions. Their simulation has elicited a growing enthusiasm and involves the use of sophisticated graphical and visualization techniques. However, the underlying models tend to be animation based and while they offer speed they do not simulate realistic physics that could be extended to analyze more complex two phase flows such as occur during oil recovery, pharmaceutical production, and biomedical processes. Our future research will experiment with various methods to simulate two phase flows and related situations such as the navigation of microorganisms using holonomy. Acknowledgments THIS RESEARCH IS SUPPORTED BY THE GRANT FROM IHPC-CIM RESEARCH PROJECT: R-146-000-036-592. References 1. M. W. Zemansky, Heat and thermodynamics, (5th ed, McGraw-Hill, New York, 1968). 2. G. K. Batchelor, An Introduction to Fluid Mechanics, (Cambridge University Press, Cambridge, 1967). 3. L. D. Landau and E. M. Lifshitz Fluid Mechanics, (Pergamon Press, New York, 1987). 4. C. Pozrikidas, Introduction to Theoretical and Computational Fluid Dynamics, (Oxford University Press, New York, 1997). 5. H. Power and L. C. Wrobel, Boundary Integral Methods in Fluid Dynamics, (Computational Mechanics Publications, Boston, 1995). 6. P. G. Drazin and W. H. Reid, Hydrodynamic Stability, (Cambridge University Press, Cambridge, 1981). 7. A. Shapere and F. Wilczek, Self propulsion at low Reynolds number, Physical Review Letters, (20), 59, 2051, 1987.
A Reynolds—uniform numerical m e t h o d for P r a n d t l ' s b o u n d a r y layer p r o b l e m for flow past a plate w i t h m a s s transfer J.S. Butler Department of Mathematics, Trinity College, Dublin, Ireland J.J.H Miller Department of Computational Science, National University of Singapore, Singapore & Department of Mathematics, Trinity College, Dublin, Ireland G.I. Shishkin Institute for Mathematics and Mechanics, Russian Academy of Sciences, Ekaterinburg, Russia In this paper we consider Prandtl's boundary layer problem for incompressible laminar flow past a plate with transfer of fluid through the surface of the plate. When the Reynolds number is large the solution of this problem has a parabolic boundary layer. We construct a direct numerical method for computing approximations to the solution of this problem using a piecewise uniform mesh appropriately fitted to the parabolic boundary layer. Using this numerical method we approximate the self-similar solution of Prandtl's problem in a finite rectangle excluding the leading edge of the plate, which is the source of an additional singularity caused by incompatibility of the problem data, for various rates of mass transfer. By means of extensive numerical experiments we verify that the constructed numerical method is Reynolds — uniform in the sense that the computed errors for the velocity components and their derivatives in the discrete maximum norm are Reynolds uniform. We use a special numerical method related to the Blasius technique to compute a reference solution for use in the error analysis.
1
Introduction
Incompressible laminar flow past a semi-infinite plate P with mass transfer in the domain D = R² is governed by the Navier-Stokes equations. Using Prandtl's approach the vertical momentum equation is omitted and the horizontal momentum equation is simplified, see [2] and [3]. For large Reynolds numbers the new momentum equation is parabolic and singularly perturbed, because the highest order derivative is multiplied by the small singular perturbation parameter ε = 1/Re. It is well known that for flow problems with large Reynolds number a boundary layer arises on the surface of the plate. Also, when classical numerical methods are applied to these problems large errors occur, especially in approximations of the derivatives, which grow unboundedly as the Reynolds number increases. For this reason Reynolds-uniform numerical methods, in which the error is independent of the singular perturbation parameter, are required. Here we solve the Prandtl problem in a region including the parabolic boundary layer. Since the solution of the problem has another singularity at the leading edge of the plate, we take as the computational domain the finite rectangle Ω = (0.1, 1.1) × (0, 1) on the upper side of the plate, sufficiently far from the leading edge (see Fig. 1) that the leading edge singularity does not cause excessive difficulties for the numerical method. We denote the boundary of Ω by Γ = Γ_L ∪ Γ_T ∪ Γ_B ∪ Γ_R, where Γ_L, Γ_T, Γ_B and Γ_R are, respectively, the left-hand, top, bottom and right-hand edges of Ω.
The Prandtl boundary layer problem in D is: find u_ε = (u_ε, v_ε) such that for all (x, y) ∈ D, u_ε satisfies the differential equations

$$(P_\varepsilon)\qquad -\varepsilon\,\frac{\partial^2 u_\varepsilon}{\partial y^2} + \mathbf{u}_\varepsilon \cdot \nabla u_\varepsilon = 0, \qquad \nabla \cdot \mathbf{u}_\varepsilon = 0,$$

with boundary conditions u_ε = 0 and v_ε = v₀(x) on Γ_B, and u_ε = u_P on Γ_L ∪ Γ_T,
where v₀(x) is the velocity normal to the plate at which mass is transferred through its surface. Negative values of v₀ correspond to injection, positive values to suction. We construct a numerical method for which there are error bounds for the solution components and their derivatives, such that the error constants do not depend on the value of Re or v₀. That is, the method is (Re, v₀)-uniform.
2
Blasius solution
Using the transformation described in [4], (P_ε) can be simplified to the well-known Blasius problem, involving a non-linear ordinary differential equation, which is described in [5]. From [4], v₀(x) is determined by a parameter f₀. In principle f₀ can have values in (−∞, ∞), but in practice −0.87 < f₀ < 7.07. We solve the Blasius problem numerically and then reverse the above transformation to obtain the Blasius solution of the original Prandtl problem. Here the Blasius solution will be denoted by U_B^8192, since we solve the Blasius problem on a mesh with N = 8192. The purpose of finding this independent, semi-analytic solution of Prandtl's problem is to use it as a reference solution for the unknown exact solution in the error analysis of the direct method. Since the Blasius solution is known to converge Reynolds-uniformly to the solution of Prandtl's problem, we can use it to estimate guaranteed error bounds for the approximations generated by the direct method [5].
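For orientation, the Blasius problem with mass transfer takes the following familiar form in one common normalization; this rendering is ours, not quoted from [4] or [5]:

$$f''' + \tfrac{1}{2}\,f\,f'' = 0,\qquad f(0) = f_0,\quad f'(0) = 0,\quad f'(\eta)\to 1\ \text{as}\ \eta\to\infty,$$

where η is the similarity variable and the parameter f₀ encodes the rate of suction or blowing through the plate.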
3
Direct numerical solution
In this section we construct a robust numerical method to solve the Prandtl problem (P_ε) for all admissible values of the Reynolds number Re and f₀ ∈ [−0.3, 0.3]. We define the rectangular mesh on the rectangle Ω to be the tensor product of two one-dimensional meshes, Ω_ε^N = Ω^{N_x} × Ω_ε^{N_y}, where N = (N_x, N_y). The mesh in the x-direction is the uniform mesh

Ω^{N_x} = { x_i : x_i = 0.1 + i N_x^{-1}, 0 ≤ i ≤ N_x }.
The mesh in the y-direction is the piecewise-uniform fitted mesh

Ω_ε^{N_y} = { y_j : y_j = 2σ j / N_y for 0 ≤ j ≤ N_y/2; y_j = σ + (j − N_y/2) · 2(1 − σ)/N_y for N_y/2 < j ≤ N_y }.
It is important to note the position of the boundary layer in order to define an appropriate transition point σ from the coarse to the fine mesh, so that there is a fine mesh in the boundary layer. The appropriate choice in this case is σ = min{1/2, √ε ln N_y}. The factor √ε may be motivated from a priori estimates of the derivatives of the solution u_ε or from asymptotic analysis. For simplicity we take N_x = N_y = N. Since the problem (P_ε^N) is a nonlinear system, an iterative method is required for its solution. This is obtained by replacing the system of nonlinear equations with a sequence of systems of linear equations. The systems of linearized equations are as follows. With the boundary condition U_ε^m = U_B^8192 on Γ_L, for each i, 1 ≤ i ≤ N, use the initial guess U_ε^0|_{x_i} = U_ε^{M_{i−1}}|_{x_{i−1}} and for m = 1, ..., M_i solve the following two-point boundary value problem for U_ε^m(x_i, y_j):

(−ε δ_y² + U_ε^{m−1} · D⁻) U_ε^m (x_i, y_j) = 0,  1 ≤ j < N,
(D_x⁻ U_ε^m)(x_i, y_j) + (D_y⁻ V_ε^m)(x_i, y_j) = 0,

with initial condition V_ε^m = v₀(x_i) on Γ_B. Continue to iterate between the equations for U_ε^m and V_ε^m until m = M_i, where M_i is such that

max( |U_ε^{M_i} − U_ε^{M_i−1}|_{x_i}, |V_ε^{M_i} − V_ε^{M_i−1}|_{x_i} ) ≤ tol.
For notational simplicity, we suppress explicit mention of the iteration superscript M_i henceforth, and we write simply U_ε for the solution generated by (A_ε^N). We take tol = 10⁻⁶ in the computations. We note that there are no known theoretical results concerning the convergence of the solutions U_ε of (A_ε^N) to the solution u_ε of (P_ε), and no theoretical estimate for the pointwise error (U_ε − u_ε)(x_i, y_j). It is for this reason that we are forced to apply controllable experimental error analysis techniques, which are adapted to the problem under consideration and are of crucial value to our understanding of the computational problems. In what follows V* is defined to be V* = max_{Ω^N} V_B.
4
Error Analysis
In this section we estimate the Reynolds-uniform maximum pointwise errors in the approximations generated by the direct numerical method described in the previous
section. For brevity, we show the errors for only one typical value of the mass transfer, f₀ = 0.3. We compare the parameter-uniform maximum pointwise errors in the approximations generated by the direct numerical method of the previous section with the corresponding values of U_B^8192. Table 1 indicates clearly that the method is Reynolds-uniform for √ε D_y⁻ U_ε, since the error stabilises as the Reynolds number increases. We define p_{ε,comp}^N by

$$p_{\varepsilon,comp}^N = \log_2 \frac{\|\sqrt{\varepsilon}\,(D_y^- U_\varepsilon^N - D_y^- U_B^{8192})\|_{\Omega^N}}{\|\sqrt{\varepsilon}\,(D_y^- U_\varepsilon^{2N} - D_y^- U_B^{8192})\|_{\Omega^{2N}}}$$

and

$$p_{comp}^N = \log_2 \frac{\max_\varepsilon \|\sqrt{\varepsilon}\,(D_y^- U_\varepsilon^N - D_y^- U_B^{8192})\|_{\Omega^N}}{\max_\varepsilon \|\sqrt{\varepsilon}\,(D_y^- U_\varepsilon^{2N} - D_y^- U_B^{8192})\|_{\Omega^{2N}}}.$$
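These computed orders are simple to evaluate: each is the base-2 logarithm of the ratio of successive errors. As a check, the following sketch (ours, not part of the paper) reproduces the ε = 2⁰, N = 8 entry of Table 2 from the first two errors in that row of Table 1:

```java
// Sketch: computed order of convergence from the errors on meshes N and 2N,
// p^N = log2( e(N) / e(2N) ).
public final class ComputedOrder {

    static double order(double errorN, double error2N) {
        return Math.log(errorN / error2N) / Math.log(2.0);
    }

    public static void main(String[] args) {
        // Errors for eps = 2^0 at N = 8 and N = 16, taken from Table 1.
        System.out.printf("p = %.2f%n", order(9.50e-2, 4.78e-2)); // prints p = 0.99
    }
}
```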
From Table 2 we see that the order of convergence is at least 0.66 for N > 16. Fig. 2 shows that the largest error in √ε D_y⁻ U_ε occurs, as expected, within the boundary layer region. Similar results also hold for U_ε, V_ε and D_y⁻ V_ε. The leading edge singularity becomes a problem for D_y⁻ V_ε unless we consider only the subdomain Ω^N ∩ [0.2, 1.1] × [0, 1]. The above results show experimentally that the order of convergence is no less than 0.66 for N > 16 and 0.8 for N > 128.
5
Conclusion
We considered Prandtl's boundary layer equations for incompressible laminar flow past a plate with suction/blowing v₀. When the Reynolds number is large the solution of this problem has a parabolic boundary layer at the surface of the plate. We constructed a direct numerical method for computing approximations to the solution of this problem using a piecewise uniform fitted mesh technique appropriate to the parabolic boundary layer. We used the method to approximate the self-similar solution of Prandtl's problem in a finite rectangle excluding the leading edge of the plate for various values of Re and v₀. To analyse the efficiency of the method we constructed and applied a special numerical method related to the Blasius technique to compute reference solutions for the error analysis of the velocity components and their derivatives. By means of extensive numerical experiments we showed that the constructed direct numerical method is (Re, v₀)-uniform.

References
1. P. Farrell, A. Hegarty, J.J.H. Miller, E. O'Riordan, G.I. Shishkin, Robust Computational Techniques for Boundary Layers, CRC Press, (2000).
2. H. Schlichting, Boundary Layer Theory, 7th edition, McGraw Hill, (1951).
3. D.J. Acheson, Elementary Fluid Dynamics, Oxford: Clarendon, (1990).
4. D.F. Rogers, Laminar Flow Analysis, Cambridge University Press, (1992).
5. B. Gahan, J.J.H. Miller, E. O'Riordan, G.I. Shishkin, Reynolds-uniform method for Prandtl's problem with suction-blowing based on Blasius' approach, Numerical Analysis and Its Applications: NAA 2000, Rousse, Bulgaria, June 2000 (L.G. Vulkov, J. Wasniewski and P. Yalamov, eds.), Lecture Notes in Computer Science, Vol. 1988, Springer, (2001).
Figure 1: Flow past a plate with suction/blowing
Table 1: Computed maximum pointwise scaled error √ε ‖D_y⁻ U_ε − D_y⁻ U_B^8192‖ over Ω^N \ Γ_L, where U_ε is generated by (A_ε^N), for various values of ε and N, with f₀ = 0.3.
ε\N     8          16         32         64         128        256        512
2^0     9.50e-02   4.78e-02   2.41e-02   1.23e-02   6.32e-03   3.35e-03   1.86e-03
2^-2    1.87e-01   9.50e-02   4.78e-02   2.41e-02   1.23e-02   6.32e-03   3.35e-03
2^-4    3.21e-01   1.87e-01   9.50e-02   4.78e-02   2.41e-02   1.23e-02   6.32e-03
2^-6    3.37e-01   2.59e-01   1.63e-01   9.50e-02   4.78e-02   2.41e-02   1.23e-02
2^-8    3.37e-01   2.59e-01   1.63e-01   9.89e-02   5.78e-02   3.33e-02   1.89e-02
2^-10   3.37e-01   2.59e-01   1.63e-01   9.89e-02   5.78e-02   3.33e-02   1.89e-02
...     ...        ...        ...        ...        ...        ...        ...
2^-20   3.37e-01   2.59e-01   1.63e-01   9.89e-02   5.78e-02   3.33e-02   1.89e-02
E^N     3.37e-01   2.59e-01   1.63e-01   9.89e-02   5.78e-02   3.33e-02   1.89e-02
Figure 2: Graph of √ε(D_y⁻ U_ε − D_y⁻ U_B^8192) for ε = 2⁻¹², N = 32 and f₀ = 0.3
Table 2: Computed orders of convergence p_{ε,comp}^N and p_comp^N for √ε(D_y⁻ U_ε − D_y⁻ U_B^8192), where U_ε is generated by (A_ε^N), for various values of ε and N, with f₀ = 0.3.
ε\N     8      16     32     64     128    256
2^0     0.99   0.99   0.98   0.96   0.92   0.84
2^-2    0.98   0.99   0.99   0.98   0.96   0.92
2^-4    0.78   0.98   0.99   0.99   0.98   0.96
2^-6    0.38   0.66   0.78   0.99   0.99   0.98
2^-8    0.38   0.66   0.72   0.77   0.80   0.82
...     ...    ...    ...    ...    ...    ...
2^-20   0.38   0.66   0.72   0.77   0.80   0.82
p^N     0.38   0.66   0.72   0.77   0.80   0.82
CONSTRUCTING AN OGSA-BASED GRID COMPUTING PLATFORM
WEI JIE, TIANYI ZANG AND ZHOU LEI
Institute of High Performance Computing, 1 Science Park Road, Singapore 117528
E-mail: [email protected]
WENTONG CAI, STEPHEN J. TURNER AND LIZHE WANG
Nanyang Technological University, Nanyang Avenue, Singapore 639798
E-mail: [email protected]
The Grid is a promising computing platform that integrates resources from different organizations in a shared, coordinated and collaborative manner to solve large-scale science and engineering problems. Recently, Grid technologies have been evolving towards an Open Grid Services Architecture (OGSA). The OGSA provides a uniform service-oriented architecture and integrates Grid technologies with emerging Web services standards. We are constructing a Grid computing platform based on the OGSA. This platform provides Grid services such as directory services, scheduling services and execution management services to support the execution of various Grid applications. Each Grid service in this platform is viewed as a Web service, and all these services are seamlessly integrated together to form a Grid computing environment. This paper describes the basic requirements and initial design of the architecture of this Grid computing platform, including the essential components of the platform, the key Grid services provided, as well as how these components and services are integrated. The implementation issues of this OGSA-based Grid computing platform are also discussed.
1
Introduction
Grids [1,2] are positioned as systems that scale up to Internet-size environments with resources distributed across multiple organizations and administrative domains. In a Grid environment, resources belonging to different organizations are integrated and work in a shared, coordinated and collaborative manner to solve large-scale science and engineering problems. This integration can be technically challenging because of the need to achieve various qualities of service when running on top of different native platforms. The Open Grid Services Architecture (OGSA) [3, 4] has recently been presented to address these challenges. The OGSA is a formulation of Grid services as a specialized subset of Web services using standard Web service definitions and protocols [6]. It draws from the Globus Toolkit [5], a community-based open source collection of services and software libraries that support Grids, and from Web services standards including SOAP, WSDL and UDDI [8]. The marriage of Globus and Web services meets the demands of an increasingly complex and distributed computing infrastructure: by providing a set of interfaces from which all Grid services are implemented, the OGSA allows for consistent resource access across multiple heterogeneous platforms with local or remote location transparency; it also allows the composition of services to form more sophisticated services without regard to how the services are implemented, and supports integration with various underlying native computing platform facilities. The researchers behind OGSA have discussed high-level guiding concepts as well as the lower-level interfaces at its foundation. Although OGSA presents a promising framework for Grid computing, no actual OGSA-based Grid computing platform has yet been designed and implemented. In this paper, we present an OGSA-based Grid computing platform. Section 2 describes the basic requirements and initial design of the architecture of this Grid computing platform, including the essential components of the platform, the
key Grid services provided, as well as how these components and services are integrated. Section 3 discusses the implementation issues. Finally, Section 4 concludes the paper with our future research directions.
2
Architecture
We present the following scenario to motivate the design of our OGSA-based Grid computing platform. Suppose resources in a Virtual Organization are connected through a Private Virtual Network (PVN). Our Grid computing platform intends to support the execution of parallel programs on a pool of parallel machines managed by different HPC centers. A user submits an application through the Grid Portal. After the Hosting Environment receives the request, the Factory creates instances of the proper Grid services (subject to the mutual authentication of the user and the factory) in the Grid Service Pool, and these service instances work collaboratively to execute the application. The Registry and Mapper in the hosting environment enable users to locate appropriate Grid services. Multiple users may connect to and submit applications through the same Grid Portal (see Figure 1).
Figure 1. The OGSA-based Grid Computing Infrastructure
The key components of this OGSA-based Grid computing platform are a set of Grid services in the form of Grid service interfaces. These services include Directory, Scheduling, Security, QoS, Data Management and Execution Management. Instances of Security, QoS, Data Management, and Execution Management services are created for each parallel machine, while instances of Scheduling and Directory services will be
created for each Grid Portal. The Scheduling service and the local Load Management Systems will cooperate to schedule the application. The Globus Toolkit serves as the underlying Grid middleware. We now describe in turn the key services provided in this Grid computing environment.
• Directory service: the Directory service is responsible for providing both static and dynamic information about the resources in a Grid environment. Although Globus provides some core information services, it is not sufficient as a practical directory information provider. Our OGSA-based Directory service handles information representation and organization; information storage, update and access; and how it interacts and integrates with other Grid services.
• Scheduling service: the main role of the Scheduling service is to allocate resources to applications so as to achieve high throughput of the resources and the best QoS for the applications. There are two layers in our job scheduler: a super job scheduler dispatches jobs at the Grid level to proper resources, while the local job scheduler is in charge of jobs running on particular resources within an organization. The Scheduling service needs the information from the Directory service.
• Execution Management service: the Execution Management service aims to monitor and control the execution of an application after it is submitted to a Grid environment. This service manages the execution information of an application running in a Grid environment (e.g., information construction, execution information tracing and updating). Execution control mechanisms and strategies, for example load balancing in a Grid environment, are provided in this service as well.
The OGSA only specifies uniform Grid service interfaces and the corresponding semantics for general Grid services such as Factory, Registry and Notification [4]. We will provide particular service interfaces for each Grid service of this OGSA-based Grid computing platform.
3
Implementation Issues
The implementation of our OGSA-based Grid computing platform is based on open standard Web services [8]. Grid service description is defined in WSDL, communication between Grid services is based on the SOAP protocol, and information exchange follows the UDDI protocols (see Figure 2). All of these mechanisms are XML-based, which guarantees that Grid services are vendor-independent and interoperable. As a Web service, a Grid service in our platform follows the service-oriented architecture model. A Grid service first publishes its interfaces to a registry. The client then looks up, or discovers, the Grid service from the registry. Last, the client binds to the Grid service to use its services. The hosting environment plays an important role in the OGSA. It not only defines the implementation programming model and language, development tools and debugging tools, but also how an implementation of a Grid service meets its obligations with respect to Grid service semantics. Sun J2EE [7] is adopted as the hosting environment in our OGSA-based computing platform. In order to communicate with Globus, the Grid services call the Java CoG [9] API for Globus functionality. The Java CoG Kit defines and implements a set of general components that map Globus functionality into the Java framework.
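The publish/lookup/bind cycle described above can be pictured in a few lines of Java. Everything in this sketch, including the Registry and GridService interfaces and their methods, is hypothetical scaffolding of ours meant only to show the flow; it is not the Globus, OGSA, or J2EE API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the service-oriented publish / lookup / bind cycle.
// All interfaces here are illustrative placeholders, not a real Grid API.
interface GridService {
    String invoke(String request);
}

final class Registry {
    private final Map<String, GridService> services = new HashMap<>();

    void publish(String name, GridService service) { services.put(name, service); }

    GridService lookup(String name) { return services.get(name); }
}

public final class ServiceCycleDemo {
    public static void main(String[] args) {
        Registry registry = new Registry();

        // 1. The service publishes its interface to a registry.
        registry.publish("DirectoryService", request -> "resources matching " + request);

        // 2. The client looks up (discovers) the service...
        GridService directory = registry.lookup("DirectoryService");

        // 3. ...and binds to it to use its operations.
        System.out.println(directory.invoke("cpuCount>16"));
    }
}
```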
4
Conclusion and Future Work
We present a Grid computing platform based on the OGSA. Each component of this platform is viewed as a Web service, and these services may be integrated together to provide the computing framework for applications. Our future work will mainly focus on the design and implementation of the Grid services, in particular the Directory service, Scheduling service and Execution Management service. The functionality of these services will be developed, and these services will be integrated to construct a practical Grid computing platform.
Figure 2. Grid Platform Implementation Based on Web Services
References
1. Foster I. and Kesselman C., The Grid: Blueprint for a New Computing Infrastructure (1999), Morgan Kaufmann.
2. Foster I., Kesselman C. and Tuecke S., The anatomy of the Grid: enabling scalable virtual organizations, International Journal of High Performance Computing Applications 15 (2001) pp. 200-222.
3. Foster I., Kesselman C., Nick J. and Tuecke S., The physiology of the Grid: an Open Grid Services Architecture for distributed systems integration (2002), http://www.globus.org/research/papers/ogsa.pdf.
4. Tuecke S., Czajkowski K., Foster I., Frey J. et al., Grid service specification (2002), http://www.globus.org/ogsa.
5. Foster I. and Kesselman C., Globus: a metacomputing infrastructure toolkit, International Journal of Supercomputer Applications 11 (1997) pp. 115-128.
6. Tony Hey, Unlocking the power of the Grid, IEE Review (2002) pp. 9-12.
7. Information about J2EE is available at http://java.sun.com/j2ee.
8. Curbera F., Duftler M. and Khalaf R., Unraveling the Web services Web - an introduction to SOAP, WSDL and UDDI, IEEE Internet Computing (2002) pp. 86-93.
9. Von Laszewski G., Foster I., Gawor J., Smith W. and Tuecke S., CoG Kits: a bridge between commodity distributed computing and high-performance Grids (2000), in the Proceedings of the ACM 2000 Java Grande Conference.
AN OGSA-BASED DIRECTORY SERVICE
ZHOU LEI, TIANYI ZANG AND WEI JIE
Institute of High Performance Computing, 1 Science Park Road, Singapore 117528
E-mail: [email protected]
Built on the concepts and technologies of the Grid and Web services communities, the Open Grid Services Architecture (OGSA) is viewed as the most promising direction in Grid computing. Our work focuses on OGSA-based directory service research, which is the groundwork of Grid computing. This paper describes the basic requirements and functionality of a Grid directory service. The architecture of our OGSA-based directory service is discussed, and the implementation issues of this directory service are also addressed.
1
Introduction
Grid computing is becoming the mainstream in high performance computing research; it leverages wide-spread sharing and coordinated use of computing resources. The Open Grid Services Architecture [4] was presented to address the challenges of dynamic, heterogeneous, and geographically distributed Grid environments. It is a formulation of Grid services as a specialized subset of Web services using standard Web service definitions and protocols [1]. By providing a set of operations from which all Grid services are implemented, OGSA allows consistent resource access across multiple heterogeneous platforms with local or remote location transparency. It also allows the composition of services to form more sophisticated services regardless of how the services are implemented, and supports integration with various underlying native computing platform facilities. The directory service plays an important role in computational Grid middleware, providing the fundamental mechanism for discovery and monitoring, and thus for planning and adapting behavior [3]. Core Grid services, such as the super-scheduler, execution service, and performance diagnosis, depend heavily on directory information sources. In this paper, we present our contribution on Grid directory services based on OGSA and Globus, the de facto Grid computing standard. The requirements and functionality of a Grid directory service are described briefly in Section 2. The architecture of our OGSA-based Grid directory service is described in Section 3. In Section 4, the implementation issues are addressed. Finally, Section 5 concludes the paper and further work is discussed.
2
Requirements and Functionality
As Grid information is dynamic, heterogeneous and geographically distributed, a Grid directory service should have the following basic properties [6]:
• access to static and dynamic information,
• a generic description/discovery/monitoring mechanism for heterogeneous system resources and services,
• integration of wide-area information,
• support for the other core Grid services, such as the resource allocation service, execution service, and super-scheduler.
The basic functionality provided by Grid directory services includes:
• Information Monitoring: monitor static and dynamic information in a Grid platform, such as the real-time status of dynamic resources.
• Information Discovery: discover system resources and all kinds of Grid services.
• Service API: provide a service API for high-level services or applications.
A Grid user or other Grid services can access these functions through the service-oriented model.
3
Architecture
In our directory service, there are two critical components: the Basic Information Service Component (BISCO) and the Aggregate Information Service Component (AISCO). BISCO is a Web service which communicates with the Globus MDS to obtain various kinds of system information in a Grid environment. The information is then converted into XML format. The interaction between BISCO and Globus can be implemented in any language supported by Globus, such as C or Java. The relationship is depicted in Figure 1.
Figure 1: BISCO-Globus Interaction Model
AISCO is a higher-level Web service which extracts XML-format system information from BISCO components. Meanwhile, AISCO also responds to diverse queries from Information Service users or other services. As Web services, BISCO and AISCO follow the service-oriented architecture model shown in Figure 2. In this model, BISCO first publishes its interface to a registry. AISCO then looks up, or discovers, BISCO from the registry. Finally, AISCO binds to the Web service in order to use its services.
Figure 2: BISCO-AISCO Interaction Model
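Since the GT2 MDS is exposed through LDAP, a BISCO-style component could pull raw resource entries with JNDI before converting them to XML. The sketch below is ours, not code from the paper: the server URL, search base and filter are hypothetical placeholders, and error handling is minimal:

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

// Sketch: query an MDS (LDAP) server via JNDI and print the entries that a
// BISCO-like component would then map into an XML document.
public final class MdsQuery {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://mds.example.org:2135"); // hypothetical host

        DirContext ctx = new InitialDirContext(env);
        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);

        NamingEnumeration<SearchResult> results =
                ctx.search("Mds-Vo-name=local,o=grid", "(objectclass=*)", controls);
        while (results.hasMore()) {
            SearchResult entry = results.next();
            // A real BISCO would convert these attributes into XML here.
            System.out.println(entry.getNameInNamespace() + " : " + entry.getAttributes());
        }
        ctx.close();
    }
}
```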
A directory service user accesses Grid information through a single entry point called the Endpoint. The Endpoint invokes the security service to obtain authority for the user; then the directory service, which is usually a high-level AISCO, begins to deal with the various user queries.
4
Implementation Issues
Considering that Globus is the de facto standard for Grid computing, we make full use of the facilities the Globus package provides as underlying support. For instance, BISCO obtains system information from the Globus MDS, and the directory service checks security through the Globus API. The definition of the directory services follows two rules. Firstly, separate XML namespaces are adopted for the five basic parts of a WSDL definition, i.e. types, messages, portTypes, bindings, and service descriptions. Splitting up the definitions across namespaces allows us to expand easily any higher-level WSDL definition with the WSDL import construct. This brings reusability and flexibility for our further work. Secondly, the service description follows the OGSA specification [7] presented by the Globus group and IBM, as they have done a lot of fundamental work for OGSA research. This improves interoperability and scalability among the Grid services of our Science and Engineering Research Grid (SER-Grid) platform. A Web service is instantiated within a special execution environment, which is called the hosting environment. The most important decision for Web service development is to determine which hosting environment should be used. We choose a J2EE container as the execution environment of the directory services. J2EE defines not only the programming language and development tools, but also how to communicate with Globus. Java is the implementation language, and the Java CoG Kit, provided by the Globus project, is the interface with Globus.
Figure 3: The implementation of Directory Services based on J2EE
The implementation of the directory service based on J2EE is shown in Figure 3. Each Grid service is a multi-tier system: Client tier, Endpoint tier, J2EE Container, and Globus tier. A JAX-RPC servlet endpoint dispatches all user requests to the J2EE container tier, which is made up of J2EE EJBs that accomplish the business logic of the services. The Java CoG API is the glue that combines the J2EE container with the Globus tier.
5
Conclusion and Future Work
We presented a directory service based on the OGSA, which is viewed as a Web service integrated with Globus, the de facto Grid computing standard. Although its design is vendor-independent, we implement it in pure Java with a J2EE hosting environment. All the business logic is described in the information Enterprise Java Beans, which are hosted in the J2EE container. The information EJBs collect system information through the Globus CoG API. This Directory service serves as a fundamental component in our Science and Engineering Research Grid (SER-Grid) platform. In order to achieve a highly efficient and easy-to-use Grid computing environment, there is a lot of work under way. Further system refinement, such as the definition of XML vertical schemas and WSDL, is ongoing to improve reusability. A user-friendly interface for the directory service is one of our concerns. Also, we are developing an OGSA-based Scheduler Service and Execution Management Service.

References
1. Christensen E., Curbera F., Meredith G., and Weerawarana S., Web Services Description Language (WSDL) 1.1. Technical report, W3C, 2001.
2. Curbera F., Duftler M. and Khalaf R., Unraveling the Web Services Web - An Introduction to SOAP, WSDL and UDDI, IEEE Internet Computing, 86-93, March 2002.
3. Czajkowski K., Fitzgerald S., Foster I., and Kesselman C., Grid information services for distributed resource sharing. In 10th IEEE International Symposium on High Performance Distributed Computing, pages 181-184, San Francisco, CA, August 7-9, 2001. IEEE Press.
4. Foster I., Kesselman C., Nick J., and Tuecke S., The physiology of the Grid: An open Grid services architecture for distributed system integration. Technical report, Globus Project, 2002.
5. Foster I., Kesselman C., and Tuecke S., The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of High Performance Computing Applications, 15(3), 200-222, 2001.
6. Laszewski G. V., Foster I., Gawor J., Schreiber A., Pena C. J., InfoGram: A Grid Service that Supports Both Information Queries and Job Execution. High Performance Distributed Computing (HPDC-11), Edinburgh, Scotland, July 24-26, 2002.
7. Tuecke S., Czajkowski K., Foster I., Frey J., Graham S., and Kesselman C., Grid Services Specification. Technical report, Globus Project, 2002.
GRID RESOURCE MANAGEMENT INFORMATION SERVICES FOR SCIENTIFIC COMPUTING
H. N. LIM CHOI KEUNG, D. P. SPOONER, S. A. JARVIS AND G. R. NUDD
University of Warwick, Coventry CV4 7AL, England, U.K.
Email: [email protected]
LIZHE WANG
Nanyang Technological University, Nanyang Avenue, Singapore 639798
Email: [email protected]
WEI JIE
Institute of High Performance Computing, 1 Science Park Road, Singapore 117528
Email: [email protected]
Scientific applications typically have considerable memory and processor requirements. Nevertheless, even with today's fastest supercomputers, the power of available resources falls short of the demands of large, complex simulations and analysis. The solution fortunately lies in the emergence of innovative resource environments such as Grid computing. This paper addresses issues concerning the discovery and monitoring of resources across multiple administrative domains, which can be harnessed by scientific applications. Grid Information Services are an important infrastructure in computational Grid environments as they provide important information about the state and availability of resources. We will present how the Globus Monitoring and Discovery Service (MDS) architecture can be extended to offer a transparent, unified view of Grid resources. Local resource managers ultimately possess the potential to influence where scientific jobs are processed, depending on the availability and load of their scheduler. Consequently, information about the structure and the state of schedulers should be provided to Grid brokers, which will then decide where to schedule the applications. This paper will present the way in which low-level scheduling information is collated from multiple sources and is incorporated into the unified Grid information view. The existence and availability of Grid resource information also allows reliable application performance prediction to be carried out using the Warwick PACE (Performance Analysis and Characterisation Environment) system. Thus, Grid Information Services are a key component in the efficient execution of scientific applications on Grid computing architectures.
1
Introduction
The information services framework used in the global effort to create a Grid infrastructure is the Monitoring and Discovery Service (MDS) [4] from the Globus Toolkit. Its structure consists of a number of configurable information providers (Grid Resource Information Services) and configurable directory components (Grid Index Information Services) [3]. This existing framework has been installed and is being tested at the University of Warwick within the context of developing performance-oriented Grid middleware. An initial stage of the work consisted of defining how various components act individually and collaboratively to serve the execution requirements of scientific applications [5,2]. The process of scheduling scientific applications to be run on the Grid involves many steps, one of which is to obtain exact state information from local schedulers. The scheduler used in this work is Titan, a local-area workload manager [6]. Each instance of the scheduler manages a resource pool where the characteristics of the set of
schedules and those of the resources are highly dynamic. In the Grid context, it must be possible for this kind of local information to be propagated and made available to other remote administrative domains. Only then can the superscheduling of scientific applications take place, based on the local scheduling information of multiple sources. In this context, an information provider is a service that provides a subset of useful information about resources participating in the Grid. Moreover, the structure of the MDS offers a unified solution to the distributed and failure-prone nature of information providers. There is a need for information services to be as distributed and decentralised as possible, with providers located on or near the entities they describe [3]. Therefore, it is reasonable to have the scheduler act as an information provider or to have a database near the scheduler providing such services.
2
Titan Scheduler as an Information Provider
Having the Titan scheduler as an information provider increases the likelihood of obtaining dynamic and reliable information about available resources. Likewise, the role of the Grid information service is to focus only on the efficient delivery of state information from a single source, that is, the particular information provider. One of the ways in which scheduler information can be made available to the MDS is by speculative evaluation [8], where information from the scheduler is generated at a regular interval. This information is placed in a local backend database which the GRIS can access upon request, as shown in Figure 1.
Figure 1: Reading and Writing Dynamic Scheduling Information
This method has been implemented on the Grid testbed at the University of Warwick and it has both advantages and disadvantages. The benefit is that the scheduler itself is not overloaded, since a central repository is accessed for the values of scheduling
attributes. Moreover, if the scheduler fails, the latest values of the scheduling attributes are still accessible. On the other hand, the data in the backend database is very dynamic and hence the scheduler might have a very high write frequency and a comparatively low read frequency. However, since the database is local to the scheduler on the backend of the GRIS server, this is found not to affect the way its information is pulled from the aggregate directory. The backend database is relational (Postgres) [9], allowing read and write transactions to be handled efficiently. A number of information providers are then created using JDBC (Java Database Connectivity) [7] to access particular fields in the database, corresponding to specific scheduling attributes. The result is a hierarchy of virtual organisations sharing resource information.
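Such an information provider amounts to a short JDBC query against the backend database. The following sketch is our illustration only; the connection URL, credentials, table and column names are hypothetical stand-ins, not Titan's actual schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: a JDBC-based information provider reading scheduling attributes
// from the local Postgres backend. Table and column names are hypothetical.
public final class MakespanProvider {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/titan", "gris", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT makespan, queue_length FROM scheduler_state "
                 + "ORDER BY updated DESC LIMIT 1")) {
            if (rs.next()) {
                // In the real provider this would be emitted as LDIF for the GRIS.
                System.out.println("Mds-makespan: " + rs.getDouble("makespan"));
                System.out.println("Mds-queue-length: " + rs.getInt("queue_length"));
            }
        }
    }
}
```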
3
Other Alternatives for the Information Providers
A slight variant on the above implementation is to use an LDBM (LDAP Database Manager) database [10] instead of the relational one. This allows the GRIS to pull dynamic information from information providers which access this LDBM database in a speculative evaluation method. The information providers for the scheduler are then written using JNDI (Java Naming and Directory Interface). In this case, the method used to access a GIIS would be similar to that retrieving information from the scheduler, but instead of transactions, the scheduler would use commands of this type:

ldapmodify -x -h lab-68.cslab -D "Mds-Software-deployment=Titan scheduler,Mds-Voname=local,o=grid" -f modify-makespan.ldif

The advantage of using an LDBM database backend for the scheduler is that there is no conversion from LDIF to other data formats, thus keeping the data model uniform. Yet another method, which could be used on its own or concurrently with the speculative evaluation method above, is eager evaluation. This method focuses on caching data which is generated when a search request is first received. Therefore, the scheduler information which the GRIS has accessed from its backend database can be stored in a cache for a configurable amount of time. There are a number of advantages to this method: the load on the GRIS host is reduced and the time taken to service a search request is shorter. However, the drawback lies in the relative staleness of the information. At the other end of the spectrum, there is lazy evaluation, where the scheduler generates information only when a search request is received by the GRIS. This method provides the most up-to-date information but the service time increases as well; another cost is the increased load that each service request places on the GRIS host. To obtain dynamic information from the scheduler, the speculative evaluation method is the most appropriate: up-to-date dynamic information is required, making a purely eager evaluation infeasible, while a lazy evaluation would only increase the load on the GRIS host.
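Seen this way, the choice among eager, lazy and speculative evaluation is largely a question of cache time-to-live. A minimal sketch of such a cache (our illustration; the attribute fetcher is a placeholder):

```java
import java.util.function.Supplier;

// Sketch: cache a scheduler attribute for a configurable time-to-live.
// A TTL of 0 degenerates to lazy evaluation (always re-fetch); a large TTL
// approaches eager evaluation (serve possibly stale data quickly).
public final class CachedAttribute {
    private final Supplier<String> fetcher; // e.g. a database or scheduler query
    private final long ttlMillis;
    private String value;
    private long fetchedAt = Long.MIN_VALUE;

    public CachedAttribute(Supplier<String> fetcher, long ttlMillis) {
        this.fetcher = fetcher;
        this.ttlMillis = ttlMillis;
    }

    public synchronized String get() {
        long now = System.currentTimeMillis();
        if (now - fetchedAt > ttlMillis) { // stale: refresh from the source
            value = fetcher.get();
            fetchedAt = now;
        }
        return value;
    }

    public static void main(String[] args) {
        CachedAttribute queueLength = new CachedAttribute(() -> "42", 5000);
        System.out.println("Mds-queue-length: " + queueLength.get());
    }
}
```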
4
Conclusions
The work discussed in this paper addresses how information is pulled by a GRIS from a scheduler for attributes including makespan [6], queue length and whether the genetic
algorithm is switched on [6]. Since each scheduler monitors a number of resources, there could be a number of schedulers providing important resource information to the MDS. The GIIS is responsible for aggregating this information from a number of schedulers and presenting it in a unified way. Being able to access dynamic information across administrative domains eventually enables resource-sharing. The knowledge of resource status and availability within the resulting virtual organisation is crucial in determining which resources are involved in performance prediction using PACE [1]. The PACE performance model and the application deadline requirements are then used to calculate the execution time for the particular application on the matched resources. The collation of resource information from multiple geographically-dispersed sources is also very useful in scheduling scientific applications over a large number of resources, thus decreasing the cost of the processing and the total execution time.
5
Acknowledgements
This work is sponsored in part by grants from the NASA Ames Research Center (administered by USARDSG, contract no. N68171-01-C-9012) and the EPSRC (contract no. GR/R47424/01).
References
1. A.M. Alkindi, D.J. Kerbyson, and G.R. Nudd. Dynamic Instrumentation and Performance Prediction of Application Execution. Proceedings of High Performance Computing and Networking (HPCN 2001), Lecture Notes in Computer Science, Volume 2110, Springer-Verlag, Amsterdam: 313-323, June 2001.
2. J. Cao, S.A. Jarvis, S. Saini, D.J. Kerbyson, and G.R. Nudd. ARMS: An Agent-based Resource Management System for Grid Computing. In Scientific Programming, Special Issue on Grid Computing, 2002.
3. K. Czajkowski, S. Fitzgerald, I. Foster, and C. Kesselman. Grid Information Services for Distributed Resource Sharing. Proc. 10th IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), 2001. IEEE Press.
4. S. Fitzgerald, I. Foster, C. Kesselman, G. Laszewski, W. Smith, and S. Tuecke. A Directory Service for Configuring High-Performance Distributed Computations.
5. H.N. Lim Choi Keung, J. Cao, D.P. Spooner, S.A. Jarvis, and G.R. Nudd. Grid Information Services using Software Agents. Eighteenth Annual UK Performance Engineering Workshop (UKPEW'2002), pp. 187-198.
6. D.P. Spooner, J. Cao, J.D. Turner, H.N. Lim Choi Keung, S.A. Jarvis, and G.R. Nudd. Localised Workload Management using Performance Prediction and QoS Contracts. Eighteenth Annual UK Performance Engineering Workshop (UKPEW'2002), pp. 69-80.
7. http://jdbc.postgresql.org/
8. http://www.globus.org/mds/
9. http://www.postgresql.org/
10. http://www.openldap.org/
AN OPEN PRODUCER AND CONSUMER MANAGEMENT SYSTEM FOR GRID ENVIRONMENT
TIANYI ZANG, ZHOU LEI AND WEI JIE
Institute of High Performance Computing, 1 Science Park Road, Singapore 117528
E-mail: [email protected]
Grid technologies have been widely adopted and are entering the mainstream in scientific and engineering computing. Grid technologies and infrastructures support the sharing and coordinated use of diverse resources that are geographically distributed and operated by distinct organizations and individuals with different policies. In a Grid environment, a substantial amount of runtime information needs to be gathered and delivered for various Grid services such as performance analysis, performance tuning, performance prediction, fault detection, co-scheduling and re-scheduling. Because the characteristics of runtime information are fundamentally different from those of system- or program-produced data, systems that collect and distribute this information should satisfy certain requirements. Exploiting the producer/consumer model, we design an open producer and consumer management framework to fit the heterogeneous, dynamic and multi-domain nature of Grids. Security and fault-tolerance mechanisms are incorporated into the framework. Both producer and consumer are designed as Grid services and implemented using Web services technology. Based on the open framework, a management system for Grid application monitoring is instantiated in our Grid computing project.
1
Introduction
Grid technologies are evolving toward an Open Grid Services Architecture (OGSA) [1,2]. OGSA provides a uniform service-oriented architecture and integrates Grid technologies with emerging Web services standards. In a Grid environment, the management of producers and consumers is critical for enabling Grid computing. The design of a producer and consumer management system is made challenging by the characteristics of Grid performance data [4]. Furthermore, the diversity of resources involved and the dynamic nature of Virtual Organization membership also make it difficult. Several groups are developing Grid monitoring systems [5,6], and they have recognized the need for interoperability between these systems. The Global Grid Forum Performance Working Group has presented a Producer and Consumer model [4], but some crucial issues, such as component creation, control and coordination, are not addressed. In this paper, we present a producer and consumer management system for the Grid on the basis of the Producer/Consumer model. Both the Producer and the Consumer are viewed as Grid Services and implemented as Web services. A Two-stage Producer and Consumer Interaction Protocol (TPCIP) is proposed. Speaking this protocol, the Producer and the Consumer collect and distribute performance data in a Grid environment. The remainder of this paper is organized as follows: Section 2 describes the system architecture and the TPCIP protocol; the interfaces of the Producer and the Consumer are also presented. Section 3 discusses the OGSA-based implementation of the Producer and the Consumer Grid Services. Finally, Section 4 concludes the paper and outlines our future work directions.
2
Architecture
A Grid producer and consumer system is different from a general monitoring system in that it must be scalable across the different network domains of organizations and encompass a large number of heterogeneous resources. The system must be able to ensure performance data integrity and preserve the access control policies imposed by the owners of the data. Additionally, since searches in a Grid space have unpredictable latencies that may impact requests for performance information, performance information source discovery needs to be separated from the actual performance information transfer. Our producer and consumer management system consists of three types of components, as shown in Figure 1. A Directory Grid Service supports information publication and discovery and initiates the communication between the data source and sink. A Producer Grid Service makes performance event data available to other components that are part of the management system. A Consumer Grid Service receives performance event data from a Producer Grid Service.
Figure 1: The Producer and Consumer Framework for a VO Grid
An event is a typed collection of data with a specific structure that is defined by an event schema. Every event has an associated event type that uniquely identifies the structure for a particular event.
2.1
Producer/Consumer Interaction Protocol
A Two-stage Producer and Consumer Interaction Protocol (TPCIP) is presented to support the transfer of performance data between Producers and Consumers in different interaction modes.
• TPCIP-Negotiation: In order to guarantee the QoS of Grid service invocation and determine the interaction mode, some operations should be carried out in advance. Either a Producer or a Consumer may act as an initiator to activate the operations, as shown in Figure 2. The initiator looks up the Directory Service and requests the creation of instances of the Producer and Consumer service peers. These instances carry out further negotiation on behalf of the initiator. According to the specific access control policies and the level of performance detail, the Producers and the Consumers should carry out mutual authentication and authorization and check the delegation of credentials. Depending on the invocation semantics, the parameters are configured to set up the interaction mode, such as subscription, query or notification.
• TPCIP-Execution: The Producer directly sends one or more performance events to the Consumer in the interaction mode determined in the first stage of TPCIP.
The control messages and the data messages are used in the TPCIP-Negotiation stage and the TPCIP-Execution stage, respectively. The two stages of TPCIP may map onto different wire protocols.
Figure 2: TPCIP behaviors. In the negotiation stage the participants create the event type and event schema, configure the interaction model (destination, encoding or encryption of events, interval, buffer size, timeout, wire protocols, initial lifetimes, termination conditions) and check access control (authentication, delegation of credential, authorization), then ignite the participants; in the execution stage the event data is transferred.
2.2
Producer/Consumer Interfaces
Using the Producer and the Consumer interfaces to speak the TPCIP protocol, the Producer and the Consumer reach agreement on collecting and distributing the performance data. The implementation of the TPCIP protocol allows multiple wire protocol bindings. These interfaces perform the behaviors of TPCIP shown in Figure 2.
3
Implementation Issues
Using the Java CoG Kit [3], which is maintained as part of the Globus Project, we have implemented a prototype of the producer and consumer management system on top of Globus in Java. The MDS of Globus serves as the Directory Grid Service to publish the existence of the Producer and the Consumer Grid Services. Both the Producer and the Consumer are viewed as OGSA-based Grid Services and are implemented as Web services. Besides the interfaces inherited from the standard OGSA Grid Services [7], the Consumer and Producer services define the specific interfaces described in Section 2.2. These interfaces speak the TPCIP protocol to collect and distribute performance events between the Producer and the Consumer.
Extension of Entry Objects
An OGSA-based Grid Service should adhere to the standard Grid Service interfaces and behaviors specified in [7]. In addition, in order to speak the TPCIP protocol, the entry objects of the Globus MDS are expanded by adding the following three kinds of entry objects.
• Event entry object: consists of two elements, event type and event schema. An event schema is a structure that collects different typed fields.
• Producer entry object: consists of the elements related to access control, interaction model, wire protocol and so on.
• Consumer entry object: consists of elements including the consumer URL and elements that correspond to like-named elements in the Event Producer entry object.
3.2
Specific Interfaces Implementation
The fundamental pattern used to speak the TPCIP protocol is that of a server and a client communicating through a client-side proxy, which makes invocation of the remote method seem, to the client, like a local method call. Both Producers and Consumers may act as servers. The events are sent from the Producer to the Consumer using a separate object, called an EventPipe. Producers call PushEvent to send events into the pipe, and Consumers call PullEvent to receive the events. Control messages are transferred using the SOAP-HTTP binding; all control messages are SOAP messages. The wire protocol for data messages is selected dynamically depending on the specific application, for example SOAP+HTTP data messages. The event data is encoded in XML.
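The push/pull pairing can be pictured with a queue-backed pipe. The sketch below is our own simplification: a real EventPipe would marshal XML-encoded events over the negotiated wire protocol rather than pass objects through shared memory:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: an in-memory stand-in for the EventPipe. The Producer pushes
// events into the pipe and the Consumer pulls them out.
final class EventPipe<E> {
    private final BlockingQueue<E> queue;

    EventPipe(int bufferSize) { this.queue = new LinkedBlockingQueue<>(bufferSize); }

    void pushEvent(E event) throws InterruptedException { queue.put(event); }

    E pullEvent() throws InterruptedException { return queue.take(); }
}

public final class PipeDemo {
    public static void main(String[] args) throws InterruptedException {
        EventPipe<String> pipe = new EventPipe<>(16);
        pipe.pushEvent("<event type='cpuLoad'><value>0.42</value></event>");
        System.out.println("consumer received: " + pipe.pullEvent());
    }
}
```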
4
Conclusions and Future Work
We present an open producer and consumer management system that incorporates the TPCIP protocol. It provides interoperability between event Producers and Consumers across different domains and supports the collection and distribution of performance event data in the VO Grid environment. We are working to create a formal WSDL description of the interfaces between the Producer/Consumer Grid Services and the monitoring sensors/sinks, and to integrate more sophisticated monitoring sensors/sinks and performance management tools into the producer and consumer management system.

References
1. Foster I., Kesselman C., Nick J. and Tuecke S., Grid Services for Distributed System Integration, Computer 35 (2002), pp. 37-46.
2. Foster I., Kesselman C., Nick J. and Tuecke S., The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration (2002), http://www.globus.org/research/papers/ogsa.pdf.
3. Laszewski G. von, Foster I., Gawor J., Smith W. and Tuecke S., CoG Kits: A Bridge between Commodity Distributed Computing and High-Performance Grids, in the Proceedings of the ACM 2000 Java Grande Conference (2000), pp. 97-106.
4. Tierney B., Aydt R., Gunter D., Smith W., Swany M., Taylor V. and Wolski R., A Grid Monitoring Architecture, http://www-didc.lbl.gov/GGF-PERF/GMA-WG/.
5. Tierney B., Crowley B., Gunter D., Holding M., Lee J. and Thompson M., A Monitoring Sensor Management System for Grid Environments, in the Proceedings of the IEEE High Performance Distributed Computing Conference (August 2000).
6. Wolski R., Spring N. and Hayes J., The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing, Future Generation Computing Systems (1999), http://nws.npaci.edu/.
7. Tuecke S., Czajkowski K., Foster I., Frey J. et al., Grid Service Specification (2002), http://www.globus.org/ogsa.
REPLICA SELECTION FRAMEWORK FOR BIO-GRID COMPUTING
LIZHE WANG, WENTONG CAI, BERTIL SCHMIDT AND BU-SUNG LEE
Nanyang Technological University, Nanyang Avenue, Singapore 639798
E-mail: {pg02463973, aswtcai, asbschmidt, ebslee}@ntu.edu.sg
WEI JIE
Institute of High Performance Computing, 1 Science Park Road, Singapore 117528
E-mail: [email protected]
The computing grid is a promising platform that provides plenty of resources for large scientific computing applications. To achieve good application performance in the grid environment, careful resource management is needed. Protein alignment analysis, a typical bio-grid computing application, is a computing-intensive, data-parallel application; it needs data storage resources and large computing resources. This paper discusses a replica selection framework for bio-grid computing and demonstrates that the application of protein alignment analysis can benefit from the framework.
1
Introduction
The computing grid has become the most promising computing platform for large scientific applications. Computing grid infrastructure middleware, such as Globus [2], has been developed to support application users with grid services, e.g., resource management and grid information services. However, on top of these services, some higher-level support is still needed to achieve better performance. Bio-grid computing can benefit greatly from the computing grid [1]. The application of protein alignment analysis [3,4], a typical bio-grid computing application, is a common and often repeated task in the field of molecular biology. This application finds similarities between a particular query sequence and all sequences in Protein Banks. Processing different protein alignments requires different protein datasets from different Protein Banks. The Protein Banks may be large-scale and geographically distributed. Thus, the computing nodes that carry out the protein alignment analysis should select the proper Protein Banks, namely those that give the minimum communication delay for replication; this is called replica selection. In this paper, a replica selection framework for protein alignment analysis is discussed. The replica selection framework is developed to dynamically select the nearest data storage resources for protein alignment processing. Since Globus has become the de facto standard of Grid computing, our work is developed on the platform of Globus Toolkit 2.0 and several popular resource management systems. This paper is organized as follows: in Section 2, we introduce the background of the protein alignment analysis application. Section 3 describes the replica selection framework. In Section 4, we present some preliminary performance results of the framework. We conclude our work in Section 5.
2
Protein Alignment Analysis
Protein alignment analysis allows biologists to point out sequences sharing common subsequences. From a biological point of view, it leads to the identification of similar functionalities. The need for speeding up this application comes from the exponential
growth of the bio-sequence Protein Banks: every year their size scales by a factor of 1.5 to 2 [3]. The application is based on the SPMD (Single Program Multiple Data) concept and is thus implemented using the master-slave model. The master is responsible for dividing and scheduling the tasks, and the slaves execute the tasks (see Figure 1).
Fig. 1: Master-slave paradigm
According to the tasks received, a slave selects the nearest proper Protein Banks and makes a copy of the required datasets from those Protein Banks (see Figure 2). These dataset copies are used for the protein alignment comparison. When the task is finished, the results are sent back to the master.
3
Replica Selection Framework
After a task is submitted to the slave node, the slave node performs replica selection for running the task. Different types of task may require different data sets for protein alignment analysis. These data sets are stored in different Protein Banks located at geographically distributed sites. An LDAP [5] server is configured to provide the replica directory service. The replica catalog supports registering the Protein Banks as logical collections of data sets and provides mappings between data sets and Protein Banks. The replica selection in the slave node follows these steps (see also Figure 2; a sketch of the selection logic is given after this list):
(1) Get the information of the tasks submitted to the slave node and determine the data sets needed for task execution.
(2) Query the LDAP server to locate the proper Protein Banks that contain the needed data sets.
(3) For each proper Protein Bank that contains the required dataset, test the peer-to-peer communication performance between the slave and the Protein Bank.
(4) The slave node selects the Protein Bank that gives the minimum data transfer time as the replica source.
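The selection logic sketched below is our illustration of steps (1)-(4); the catalog and probe interfaces are hypothetical stand-ins for the LDAP replica catalog and the communication test, not the framework's actual API:

```java
import java.util.Comparator;
import java.util.List;

// Sketch of the replica selection steps: find the Protein Banks holding the
// required dataset, probe the transfer time to each, and pick the fastest.
public final class ReplicaSelector {

    interface ReplicaCatalog {   // stand-in for the LDAP replica catalog
        List<String> banksHolding(String dataset);
    }

    interface NetworkProbe {     // stand-in for the peer-to-peer performance test
        double estimatedTransferSeconds(String bank, String dataset);
    }

    static String selectBank(String dataset, ReplicaCatalog catalog, NetworkProbe probe) {
        return catalog.banksHolding(dataset).stream()
                .min(Comparator.comparingDouble(
                        (String bank) -> probe.estimatedTransferSeconds(bank, dataset)))
                .orElseThrow(() -> new IllegalStateException("no bank holds " + dataset));
    }
}
```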
Figure 2: Replica Selection Framework
4
Performance Evaluation
In order to evaluate the performance of the replica selection, a test bed is configured. It includes three clusters at NTU (Nanyang Technological University, Singapore), one cluster at IHPC (Institute of High Performance Computing, Singapore) and one single node at Osaka University (Japan). There is a dedicated network link between NTU and Osaka University. An LDAP server is configured at NTU for the replica service. Two Protein Banks are set up at NTU and IHPC respectively. The Globus Toolkit is installed on the head nodes of the clusters and on the single node in Japan. Globus GRAM communicates with the local job manager (e.g., Condor) on scheduling tasks within a cluster. Table 1 shows the replica placement for the Protein Banks. Taking the single node at Osaka University as an example slave node, the replica selection result is shown in Table 2. The network performance in the test bed is shown in Figure 3.
Site NTU IHPC
Protein Bank 1 2
Replica placement datasetl, dataset2, dataset5 datasetl, dataset3, dataset4
Table 1 Replica placements
[Figure: communication performance between NTU and Osaka Univ. and between IHPC and Osaka Univ., plotted over task index]
Figure 3 Communication Performance

Task index   Replica requirement   Replica decision
1            dataset1              Protein Bank 1
2            dataset2              Protein Bank 1
3            dataset4              Protein Bank 2
Table 2 Replica Requirement and Selection

5 Conclusion
The objective of the replica selection framework is to help bio-information applications achieve better performance in the grid environment: the communication delay is minimized and thus the overall performance is improved.

References
1. BioGRID project in EUROGRID project, http://biogrid.icm.edu.pl/, 2002.
2. Globus Project, http://www.globus.org.
3. Bertil Schmidt, Heiko Schroder and Thambipillai Srikanthan, A SIMD Solution to Biosequence Database Scanning, in Proceedings of PaCT'2001, Lecture Notes in Computer Science 2127, Springer 2001, pp. 498-509.
4. Lizhe Wang, Wentong Cai, Bertil Schmidt, et al., Bio-grid Computing Platform: Parallel Computing for Protein Alignment Analysis, to appear in the 6th International Conference on High Performance Computing in Asia Pacific Region (HPC Asia), 2002.
5. T. A. Howes and M. C. Smith, LDAP: Programming Directory-Enabled Applications with Lightweight Directory Access Protocol, Technology Series, MacMillan, 1997.
A GRID TESTBED SUPPORTING MPI APPLICATIONS WEI JIE, ZHOU LEI AND TIANYI ZANG Institute of High Performance Computing, 1 Science Park Road, Singapore 117528 E-mail: [email protected] LIZHE WANG Nanyang Technological University, Nanyang Avenue, Singapore 639798 E-mail: [email protected] Computational Grid is becoming a new platform for large scale science and engineering applications. We have constructed a Grid testbed using Globus toolkit and MPICH-G2 as the underlying middleware. This paper discusses the architecture of the Grid testbed and how this testbed supports the execution of MPI applications in a Grid environment. To evaluate the performance of MPI applications running on this Grid testbed, experiments were conducted and the results were analyzed.
1 Introduction
Computational Grid [1,2] involves heterogeneous collections of computers that may reside in different administrative domains, run different software, be subject to different access control policies, and be connected by networks with widely varying performance characteristics. To run MPI applications in a Grid environment, MPICH-G2 [4], a Grid-enabled implementation of the Message Passing Interface (MPI) [3], has been developed; it allows users to run such applications across multiple computers at different sites using the same commands that would be used on a parallel computer. MPICH-G2 works with the Globus toolkit [5] and uses the services provided, such as information, security, resource management and communication services. We have constructed a Grid testbed for MPI applications using the Globus toolkit and MPICH-G2 as the underlying middleware. An example MPI application was executed on this testbed and its performance was evaluated. In this paper, firstly the architecture of this Grid testbed is described in Section 2. Section 3 discusses how to run an MPI application on this Grid testbed. The performance study of an example MPI application is presented in Section 4. Finally, Section 5 concludes this paper and outlines our future work directions.

2 Grid Testbed Architecture
The architecture of the Grid testbed is depicted in Figure 1. A user submits an MPI application through the Grid Portal. MPICH-G2 receives the request and invokes the Globus Dynamically Updated Request Online Co-allocator (DUROC) service [5]. DUROC can allocate multiple resources simultaneously for the sub-jobs of an MPI application. Before resource co-allocation, DUROC needs to talk to the Globus Gatekeepers for user authentication. The gatekeepers then contact their respective Job Managers (which in turn ask the Local Resource Managers) to create and distribute the sub-jobs of the application to appropriate computing resources. DUROC also verifies correct startup of these sub-jobs and coordinates their parallel execution, which may span multiple domains and resources.
The execution of MPI applications in a Grid environment can be made more intelligent through a Resource Broker. With the help of Globus Grid information services [5], i.e. Grid Index Information Service (GIIS) and Grid Resource Information Service (GRIS), the Resource Broker tries to find appropriate resources to meet applications' resource specifications.
[Figure: a resource specification flows through the Grid Portal and Resource Broker to the Gatekeeper and Job Manager of each site, with monitoring and control at each site]
Figure 1. Grid Testbed Architecture
We have constructed a Grid testbed based on the architecture described above (so far the Grid Portal and the Resource Broker are not incorporated into this Grid testbed). Currently this testbed includes 2 clusters from IHPC (Institute of High Performance Computing), 1 cluster from NUS (National University of Singapore) and 1 cluster from NTU (Nanyang Technological University). Globus Toolkit and MPICH-G2 are deployed on the head node of each cluster. In addition, some popular local resource management systems (e.g. Condor, PBS and Sun Grid Engine) are installed in these clusters. Table 1 summarizes the configuration of this Grid testbed.
Table 1. Grid Testbed Configuration

Organization   Resource                                                    Local Resource Manager
IHPC           Linux Cluster (13 nodes): IBM xSeries 330, Intel PIII 933   Condor
IHPC           Linux Cluster (17 nodes): IBM xSeries 330, Intel PIII 933   Sun Grid Engine
NUS            Linux Cluster (35 nodes): IBM xSeries 330, Intel PIII 933   Condor
NTU            Linux Cluster (6 nodes): Intel PII 450                      PBS

3 Running MPI Applications on Grid Testbed
The execution of an MPI application on the Grid testbed using MPICH-G2 follows these steps: • Compile the application on each machine using the appropriate MPICH-G2 compiler and link to the MPICH-G2 library to generate the executables on each machine. • Construct a Globus RSL (Resource Specification Language) [4] script which specifies the sub-jobs and the resources required to run them. A sample RSL file is shown in Figure 2. • Submit the RSL file directly to the mpirun command provided by MPICH-G2 and start the execution of the MPI application.

+
( &(resourceManagerContact="tornado.ihpc.nus.edu.sg")
    (count=10)
    (label="subjob 0")
    (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
                 (GLOBUS_TCP_PORT_RANGE "1024 1050"))
    (arguments="/home/mems/demitube.inp")
    (directory=/home/mems)
    (executable=/home/mems/pbcgmg)
)
( &(resourceManagerContact="pprg21.sas.ntu.edu.sg")
    (count=5)
    (label="subjob 1")
    (environment=(GLOBUS_DUROC_SUBJOB_INDEX 1)
                 (GLOBUS_TCP_PORT_RANGE "1024 1050"))
    (arguments="/home/mems/demitube.inp")
    (directory=/home/mems)
    (executable=/home/mems/pbcgmg)
)
Figure 2. An RSL File

As each cluster is behind the firewall of its organization, a firewall issue arises when running MPI applications on the Grid testbed. Our basic solution is to open a small range of port numbers in the firewall and to specify that port range with the environment variable GLOBUS_TCP_PORT_RANGE in the RSL script (see Figure 2).

4 Performance Study
We tested an MPI application, an incompressible Navier-Stokes solver code, on the Grid testbed. This application is used to solve incompressible flow on a domain with moving boundaries. The scale of the problem can be adjusted by tuning a parameter of the application. For a certain scale of the problem, the application is executed on the Grid
testbed and the IHPC 13-node cluster, respectively. The ratio between the execution times is presented in Figure 3.
[Figure: ratio of execution times plotted against problem scale (1n to 4n)]
Figure 3. Ratio of Execution Time under Different Scales of Problem

Figure 3 reveals that the ratio of execution times decreases when the scale of the problem becomes larger. This indicates that the Grid testbed is more attractive for solving large-scale computation-intensive problems. From the trend of the curve in Figure 3, we may predict that the performance of the Grid testbed can become better than that of the local cluster used in the experiment when the scale of the problem increases beyond a certain point. This is in fact a major merit of a Grid system when it is used to solve large-scale problems for which a single computing system is usually not competitive.

5 Conclusion and Future Work
Using the Globus toolkit and MPICH-G2, we constructed a testbed supporting MPI applications in a Grid environment. Our experimental results show that a Grid system is competitive for solving large-scale computation-intensive problems. In future work, we will develop a Grid Portal which provides a Web-based problem-solving environment that integrates Grid services such as real-time resource monitoring, job submission, file transferring and so on. The Resource Broker will be incorporated into the Grid testbed to support smart resource discovery and allocation. In addition, more MPI applications with a variety of problem scales will be tested and their performance will be evaluated.

References
1. Foster I. and Kesselman C., The Grid: Blueprint for a New Computing Infrastructure (1999), Morgan Kaufmann.
2. Foster I., Kesselman C. and Tuecke S., The anatomy of the Grid: enabling scalable virtual organizations, International Journal of High Performance Computing Applications 15 (2001) pp. 200-222.
3. Gropp W., Lusk E., Doss N. and Skjellum A., A High-performance, Portable Implementation of the MPI Message Passing Interface Standard, Parallel Computing 22 (1996) pp. 789-828.
4. Foster I. and Karonis N., A Grid-Enabled MPI: Message Passing in Heterogeneous Distributed Computing Systems, in Proceedings of Supercomputing 98 (SC98).
5. Foster I. and Kesselman C., Globus: a metacomputing infrastructure toolkit, International Journal of Supercomputer Applications 11 (1997) pp. 115-128.
RUNNING MPI APPLICATION IN THE HIERARCHICAL GRID ENVIRONMENT
LIZHE WANG
Nanyang Technological University, Nanyang Avenue, Singapore 639798
E-mail: [email protected]
WEI JIE
Institute of High Performance Computing, 1 Science Park Road, Singapore 117528
E-mail: [email protected]
WEI XUE
Tsinghua University, Beijing, P.R. CHINA 100084
E-mail: [email protected]
This paper presents a real-time application in the hierarchical Grid environment. The application is transient stability computation for large-scale power systems. For this application, a new programming model is proposed. In the hierarchical Grid environment, the application runs on two clusters with the MPICH 1.2.3 implementation. On the head node of each cluster, the Globus Toolkit is installed. An MPICH-G2 based program runs on the head nodes for data exchange between the two clusters. With the proposed programming model, the application can make full use of the computing resources.
1 Introduction

1.1 Application Background
Computing the transient stability of a modern power system is highly complex, not only because a power system is a high-dimensional, hierarchical and distributed system, but also because the electromagnetic processes and strongly non-linear elements in such a system need to be modeled. Thus, the computation of power system transient stability is a computing-intensive application. Furthermore, the computation of power system transient stability needs to be finished in a short time to fulfill the on-line control requirement. Therefore, parallel computation of power system transient stability is the natural solution. This application is a transient stability computation for a large-scale power system. It was developed in C with MPI-2 on the Linux platform.

1.2 Grid Computing and MPI
Grid computing [1] has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation. The computing Grid is a hierarchical environment. Computing resources, e.g., NOWs and clusters, located at geographically distributed sites, are shared in the computing Grid. Grid infrastructure, e.g., the Globus Toolkit [2], is installed on the head nodes of these resources. Inside each site, the resource is managed by the local resource management system, e.g., PBS [3] or Condor [4]. Many parallel applications are written with MPI (Message Passing Interface). MPI is a library specification for message passing, proposed as a standard by a broadly based
committee of vendors, implementers, and users [5]. There are different MPI implementations for different architectures and environments, e.g., MPICH-P4 for the LAN environment and MPICH-G2 for the Grid environment. To make full use of resources in the hierarchical Grid environment discussed above, MPI applications need to link against different MPI libraries at different levels. For example, in the Grid environment, the MPICH-G2 libraries should be used.

2 Programming Model
The whole data of the power system to be computed is divided into two parts. Inside each cluster, an MPICH-P4 based program, say program 1, is executed with one part of the data of the whole system. An MPICH-G2 based program, say program 2, runs at the Globus level for data exchange between the clusters. The software module is detailed in Figure 1; a minimal sketch of the inter-cluster exchange is given after the figure.
[Figure: inside each of cluster 1 and cluster 2, processes 1 to n of program 1 communicate via MPICH(-P4) based message passing; the two clusters exchange data through MPICH-G2 based message passing between the head nodes]
Figure 1 Software Module
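The following C sketch illustrates the two-level model, assuming program 2 runs as a two-process MPICH-G2 job with one process per head node; the buffer size, tag and step count are illustrative and not taken from the paper.

```c
#include <mpi.h>

#define NBOUND 1024  /* illustrative size of the boundary-data block */

/* Program 2: one MPICH-G2 process per head node (ranks 0 and 1).
 * Each step it exchanges the local cluster's boundary data with the
 * head node of the other cluster. */
int main(int argc, char **argv)
{
    int rank, peer;
    double local_bound[NBOUND] = {0}, remote_bound[NBOUND];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;  /* the other cluster's head node */

    for (int step = 0; step < 100; step++) {
        /* ... gather local_bound from the cluster's program-1
         * processes (mechanism left open here) ... */

        /* Exchange boundary data between the two clusters. */
        MPI_Sendrecv(local_bound, NBOUND, MPI_DOUBLE, peer, 0,
                     remote_bound, NBOUND, MPI_DOUBLE, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* ... hand remote_bound back to the local program-1 run ... */
    }
    MPI_Finalize();
    return 0;
}
```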
3 Testbed
We use two clusters, say cluster 1 and cluster 2, in the test. On the head node of each cluster, Globus Toolkit 1.1.4 is installed. The cluster configuration information is listed in Table 1 and Table 2. The system configuration is detailed in Figure 2.

Node Number   Cluster 1   Cluster 2
Head Node     1           1
Slave Node    11          17
Table 1 Cluster Size

Processor          IBM xSeries 330, PIII 933 MHz
Cache              256KB L2 Cache
Memory             1GB
Hard Disk          18 GB SCSI HDD
Operating System   Linux
Table 2 Node Configuration
[Figure: cluster 1 (11 slave nodes) and cluster 2 (17 slave nodes), each with a head node, connected internally by 100 Mbps Fast Ethernet]
Figure 2 System Configuration
4 Results and Performance Evaluation
The results of the application are shown in Table 3. The computing time of the application is 0.093s < 0.1s, which fulfills the real-time requirement.
Processor number                    30
Running time (single processor)     1.082000 s
Running time (the whole system)     0.093000 s
Speed up                            11.63
Table 3 Running result

The system performance is shown in Table 4.
                              System idle                                       Running application
CPU (per node)                0.0% user, 1.1% system                            97.6% user, 2.3% system
Memory (per node)             1048076K available, 511444K used, 536632K free    1048076K available, 522572K used, 525504K free
Intra-cluster Communication   Average available bandwidth = 10.95 MB/sec        Average available bandwidth = 5.51 MB/sec
Inter-cluster Communication   Average available bandwidth = 9.57 MB/sec         Average available bandwidth = 6.67 MB/sec
Table 4 System performance
5 Conclusion
In this paper, we present a real-time application for transient stability computation of a large-scale power system, running in the hierarchical Grid environment. High performance computing can be obtained with low-cost clusters linked in the Grid environment. A new programming model is also proposed to run MPI applications in the hierarchical Grid environment.
References
1. I. Foster. The Grid: A New Infrastructure for 21st Century Science. Physics Today, 55(2), pages 42-47, 2002.
2. Globus Project, http://www.globus.org/.
3. OpenPBS, http://www.openpbs.org/.
4. Condor, http://www.condor.cs.wisc.edu/.
5. MPI Forum, http://www.mpi-forum.org/.
CLUSTER-BASED PARALLEL SIMULATION FOR LARGE-SCALE POWER SYSTEM DYNAMICS
YAN Jianfeng1, XUE Wei1, SHU Jiwu2, WANG Xinfeng1, JIE Wei3
1. Department of Electrical Engineering and Applied Electronic Technology, Tsinghua University, Beijing, China 100084
2. Department of Computer Science and Technology, Tsinghua University, Beijing, China 100084
3. Institute of High Performance Computing, 1 Science Park Road, Singapore 117528
Transient stability analysis computation is the most intensive computation part of large-scale power system dynamic simulation, and it is a key problem in large-scale power system real-time simulation. In order to speed up the transient stability analysis, many parallel algorithms and implementations have been developed. This paper analysed the available parallel algorithms for transient stability computing, pointed out their advantages and disadvantages, and developed a multi-layer BBDF (Block Bordered Diagonal Form) algorithm to achieve the division of the network equation computation task on a cluster system. Some optimizations to reduce the whole computation cost were also presented. Numerical results on the south China power grid were presented to show the speedup and usefulness of the multi-layer BBDF algorithm on a cluster system.
1 Introduction
Power system dynamic simulation is a powerful tool of power system analysis. Based on the time scale, it can be simply divided into two parts: transient stability analysis and middle-long term dynamic analysis. Transient stability analysis computation requires a shorter time step and more complex device models. As is well known, it is the most computation-intensive part of power system dynamic simulation. Moreover, its heavy computing task increases sharply with the expansion of the power system's scale. Transient stability analysis can currently only be used in an off-line state for large-scale power systems. With the development of power systems, it is increasingly necessary to find a useful method to perform on-line dynamic security analysis, which depends on real-time power system transient stability simulation. For now, sequential computing cannot fulfil the requirement of real-time simulation for large-scale power system transient stability analysis, for it has difficulty in satisfying both quickness and precision. Many parallel studies can be found that aim at real-time transient stability analysis simulation. In this paper, the available parallel algorithms for transient stability analysis computation were presented along with their advantages and disadvantages. Then a multi-layer BBDF (Block Bordered Diagonal Form) algorithm and some optimizations were proposed to achieve the division of the computation task on a cluster system and reduce the whole computation cost. Numerical results for the south China power grid have shown that the implementation on a cluster system can perform transient stability analysis computing much faster than the real-time dynamic process, which shows that the application can fulfil the requirements of large-scale power system real-time transient stability analysis computing.

2 Power system transient stability computing model
To describe the characteristic behaviour of the power system transient process, a set of DAE (Differential Algebraic Equations) is usually needed.
    dx_d/dt = f(x_d, V_d) = A·x_d + B·u(x_d, V_d)    (1-1)
    I_d = g(x_d, V_d) = Y_N·V_d    (1-2)

Where:
x_d - state vector of generator and exciter variables;
V_d - real and imaginary parts of the voltages at each node;
I_d - real and imaginary parts of the current injections;
A - block diagonal matrix;
B - rectangular block matrix;
f, g - nonlinear vector functions;
Y_N - complex sparse matrix.

According to the network characteristics of a power system, the rank of x_d was far lower than that of V_d. Equation (1-1) could be solved separately by blocks, which facilitated the division of the parallel computing task. The computing time cost depended basically on Equation (1-2), where Y_N was a sparse high-rank matrix. It must be mentioned that the roots of the equations form a series along the time domain t. Traditionally, the DAE were solved separately for each time step, which is called the IAI (Interlaced Alternating Implicit) algorithm. A series of roots along the time domain t could be obtained by the following periodic computation:

For (t = 0; t < total time steps; t++) {
    Obtain x_d(t), Y_N(t) from events and the system state in the previous time step;
    Obtain the initial V_d^0(t) from computing initialization or the results of the previous time step;
    k = 0;
    While (dV_d^(k+1)(t) > epsilon) {
        Obtain I_d^k(t) from g using V_d^k(t), x_d^k(t);
        Obtain the time derivatives dx_d(t)/dt from Equation (1-1);
        Calculate x_d^(k+1)(t) with the implicit integration method, by (1-1);
        Obtain V_d^(k+1)(t) by solving Equation (1-2);
        dV_d^(k+1)(t) = | V_d^(k+1)(t) - V_d^k(t) |;
        k++;
    }
}

3 Parallel transient stability algorithm and optimization schemes
The design of a parallel algorithm depends on the computer architecture. In this section, the cluster system used in our research was introduced firstly. Secondly, the available parallel algorithms for transient stability analysis computing were presented. Finally, a multi-layer BBDF algorithm and its optimizations were proposed.
3.1 Architecture of cluster system
A cluster system, consisting of commercial CPUs and network devices, can provide enough computation power to enable parallel and distributed real-time simulation for large-scale power system transient stability analysis computing. The constitution of the cluster system used in this paper is given in Table 1.

Table 1 Cluster system of Tsinghua Univ.
Number of computing nodes   36
Makeup                      SMP (Symmetric Multi-Processor)
CPU                         4 × Intel Xeon PIII 700MHz
Memory                      1G SDRAM
Hard disk                   36G SCSI Ultra2
Net                         2.56GB Myrinet / 100M Ethernet
Operating System            Linux
Programming Language        C, C++, Fortran
Parallel Environment        PVM, MPI, OPENMP
3.2 Available research on parallel transient stability simulation
Recently the research on parallel algorithms and their implementations for transient stability analysis computing has been well developed [1][2]. Spatial parallel algorithms, including the partition method and the parallel factoring algorithm, took a time-domain integration method and decomposed each time step's computation into sub-tasks among different processors. As a coarse-granularity parallel algorithm, the partition method could be realized more easily and achieve higher efficiency on distributed-architecture systems, while the parallel factoring algorithm behaved better on shared-memory systems. In order to gain better performance on more processors, simultaneous multiple-time-step solutions such as the WR (Waveform Relaxation) method and the parallel-in-time Newton algorithm were introduced into parallel algorithms. These time-domain parallel algorithms enlarged the size of transient stability analysis computing problems by solving several time steps of the DAE simultaneously and improved the speedup effectively. However, it was difficult to achieve both a great parallel computing effect and a high convergence rate. Another limiting factor hindering parallelism was that many invalid computations would be brought into the computing task when random events happened in the 'time window' of the computation. Many previous developments were accomplished on memory-shared parallel computers or Transputers, but few on cluster systems. Recently, with the wide use of scalable cluster systems, research on cluster-based parallel algorithms for transient stability simulation has become a new hotspot in this field [3].

3.3 Multi-layer BBDF algorithm and its optimizations
According to power system characteristics, the corresponding differential equations of dynamic devices such as generators were only related to one of the network nodes. The
differential part of Equation (1-1) could be sorted by network sub-area and divided among the corresponding processors for simultaneous computing. Therefore, the parallel computation of the linear network equations was the key of the transient stability analysis computation. Based on the cluster system, the linear network equations were reformed into the Block Bordered Diagonal Form [4], as follows:
    [ I_1(X_d, V_d) ]   [ Y_1                      Y_1s ] [ V_1 ]
    [ I_2(X_d, V_d) ]   [       Y_2                Y_2s ] [ V_2 ]
    [      ...      ] = [             ...          ...  ] [ ... ]    (2)
    [ I_n(X_d, V_d) ]   [                   Y_n    Y_ns ] [ V_n ]
    [ I_s(X_d, V_d) ]   [ Y_s1  Y_s2  ...   Y_sn   Y_s  ] [ V_s ]

Where: I_j(X_d, V_d), I_s(X_d, V_d) - real and imaginary parts of the current injections in sub-area j and border area s respectively; Y_j, Y_s - complex sparse matrices of sub-area j and border area s respectively; V_j, V_s - real and imaginary parts of the node voltages in sub-area j and border area s respectively; Y_js, Y_sj - complex matrices of the connection between sub-area j and border area s; and j = 1, 2, ..., n. The BBDF solution of the equations could be achieved by the following flow chart:

For Iter = 1, MaxIters
    Solve the corrected vectors and matrices in sub-areas 1, ..., n in parallel:
        dY_j = Y_sj · Y_j^(-1) · Y_js,   dI_j = Y_sj · Y_j^(-1) · I_j
    Communication: collect the corrections between processors
    Solve border area s:
        (Y_s - sum_{j=1..n} dY_j) · V_s = I_s - sum_{j=1..n} dI_j    (3)
    Communication: scatter the solution of the border area equations to the subsystems
    Solve the node voltages in sub-areas 1, ..., n in parallel:
        Y_j · V_j = I_j - Y_js · V_s

In some large-scale power systems, the Chinese power system for example, the number of sub-areas had to be large enough to make the computing task of each CPU 'light' enough for real-time simulation. Since the number of border area nodes increased markedly with the rise of the sub-area number, the computing task of equation (3) would be much heavier than that of any of the sub-area equations.
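Read as message passing, the 'collection' and 'scattering' steps of this flow chart can be realized with a single reduction when each process owns one sub-area and solves the small border system redundantly. The C sketch below illustrates this reading; it is one possible realization under those assumptions, not the authors' implementation.

```c
#include <mpi.h>

/* Communication skeleton of one BBDF iteration, assuming one MPI
 * process per sub-area j and a border area of dimension ns.
 * dY_local holds Y_sj * Y_j^(-1) * Y_js (ns*ns values) and dI_local
 * holds Y_sj * Y_j^(-1) * I_j (ns values), both computed locally. */
void bbdf_border_step(int ns,
                      const double *dY_local, const double *dI_local,
                      double *dY_sum, double *dI_sum)
{
    /* "Collection between processors": sum the corrections of all
     * sub-areas. With Allreduce every process receives the totals,
     * so each can solve the small border system
     * (Y_s - dY_sum) V_s = I_s - dI_sum redundantly, which also
     * takes the place of the explicit scattering step. */
    MPI_Allreduce(dY_local, dY_sum, ns * ns, MPI_DOUBLE,
                  MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(dI_local, dI_sum, ns, MPI_DOUBLE,
                  MPI_SUM, MPI_COMM_WORLD);
}
```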
Additionally, asymmetry caused by some events would introduce negative and zero sequence matrices into Y_N and enlarge the border area node number too. To solve the problem, a multi-layer BBDF method was proposed in this paper. To 'lighten' the computing task of the border area in the case of a large sub-area number, the equations were divided into several parts as in formula (4) and were computed with the parallel BBDF scheme, the same as above.

    [ Y_Tp                    N_Tp ]
    [        Y_Tn             N_Tn ]
    [                Y_Tz     N_Tz ]    (4)
    [ N_pT   N_nT    N_zT     Y_F  ]

In formula (4), Y_Tp, Y_Tn and Y_Tz were the positive, negative and zero sequence node matrices for the connection between different sub-areas; Y_F was the matrix of event nodes; N_pT, N_nT, N_zT, N_Tp, N_Tn and N_Tz were the connection matrices.
Because some event forms had an invalid impedance matrix, the admittance matrix was adopted. In this paper, two optimizations were developed to reduce the computing time consumed and improve the scalability of the program. • In fact, there were two iteration loops in the IAI algorithm. The inner one, for solving the nonlinear network equations, was broken in this algorithm to avoid unnecessary substitutions. It was proved that the solutions of this algorithm attained the same precision as the traditional IAI algorithm. • During the simulation, the state variables of dynamic devices, injected currents and node voltages of the positive sequence network had to be updated every iteration step, while those of the negative or zero sequence networks in the sub-areas and border area did not. So the state variables of the negative or zero sequence in the sub-areas and border area were solved only once in every iteration step. Furthermore, different event computing schemes were used for different event forms, such as cancelling the zero sequence network computation when only inner-phase faults happened.

4 Results
Some cases of the south China power grid, the largest power grid in China, as given in Table 2, were tested on the cluster system. For all the tested cases, an A-phase fault on a single branch was assumed, the time step was 0.02s, and the convergence tolerance was 10^-4 pu.

Power system                          South China Power Grid
Number of Nodes                       2098
Number of Branches                    2588
Number of Province Systems            6
Number of Regional Systems            53
Sparseness of Y_N                     0.21%
Average Number of Branches per Node   2.36
Table 2 Power system data used in tests

Figure 1 showed the speedups and simulation velocity of the parallel computing with the data of the south China power grid. Both communication devices of the cluster system, 100M Ethernet and 2.56GB Myrinet, were tested respectively.
Figure 1 Speedups and simulation velocity of the South China Power Grid (left: 100M Ethernet, right: Myrinet). Where: Sp - speedup compared with the sequential computing program; Rp - ratio of real time to computing time.
Several comments can be made from the figures above: • The transient stability analysis computation ran faster than the real-time process for the south China power grid with both the 100M Ethernet and the Myrinet communication devices. In Figure 1, the parallel computing speedup ascended with the increase of the number of sub-areas and reached its maximum between 8 and 10. The parallel computation on 8 sub-areas with Myrinet was almost 9 times faster than the sequential simulation and about twice as fast as the real-time process. This proved that the multi-layer BBDF algorithm for transient stability analysis is very efficient. • It was also found that the efficiency of this parallel algorithm exceeded 100% in some test cases. Although the communication and computation cost of solving the border area equations increased with the rising number of sub-areas, the whole computation cost of solving the network equations with the BBDF scheme decreased. The abnormally high speedups might be achieved when the overall costs of the algorithm in this paper decreased.

5 Conclusion
In this paper, a multi-layer Block Bordered Diagonal Form algorithm was proposed and implemented on a cluster system. Some optimizations, such as iteration reduction, were also presented to improve the performance of the simulation computing. Numerical results for the largest power grid in China suggested that the algorithms and optimizations were efficient and scalable. For the real-time transient stability simulation of larger power systems, even the future nationwide power grid in China, this algorithm, with adequate efficiency and scalability, is a feasible selection.
References
[1] Daniel J. Tylavsky, Anjan Bose. Parallel processing in power systems computation [J]. IEEE Trans. on PWRS, 1992, 7(2): 629-638.
[2] Chai J.S., Bose A.J. Bottlenecks in Parallel Algorithms for Power System Stability Analysis [J]. IEEE Trans. on PWRS, 1993, 8(1): 9-15.
[3] Wei Xue, Jiwu Shu, Xinfeng Wang, Weimin Zheng. Advance of parallel algorithm for power system transient stability simulation [J]. Journal of System Simulation, 2002, 14(2): 177-182.
[4] A. Torralba. Three methods for the parallel solution of a large, sparse system of linear equations by multiprocessors [J]. International Journal of Energy Systems, 1992, 12(1).
[5] I.C. Decker, D.M. Falcao. Conjugate Gradient Methods for Power System Dynamic Simulation on Parallel Computers [J]. IEEE Trans. on PWRS, 1996, 11(3): 1218-1227.
[6] Nagata, M., Uchida, N. Parallel processing of network calculations in order to speed up transient stability analysis [J]. Electrical Engineering in Japan, 2001, 135(3): 26-36.
[7] Wei Xue, Jiwu Shu, Xinfeng Wang, Weimin Zheng. Parallel algorithm of power system network equations on cluster system [J]. Journal of Nanjing University, 2001, 37: 204-210.
HARDWARE IMPACT ON COMMUNICATION PERFORMANCE OF BEOWULF LINUX CLUSTER
TANG YUAN1, ZHANG YUN-QUAN1,2, LEE YU-CHENG1
1(Research and Development Center of Parallel Software, Institute of Software, Chinese Academy of Sciences, Beijing 100080, China)
2(Key Lab of Computer Science, Chinese Academy of Sciences, Beijing 100080, China)
E-mail: [email protected], [email protected], [email protected]
There exist many models of parallel computation, amongst which LogP and LogGP are well known and suitable for describing the framework of the communication process of a Beowulf LINUX Cluster. [4] analyzed the impact of each LogP parameter on real-world applications in detail, but most researchers seem to ignore the impact of different hardware conditions on the LogP parameters. This paper, based on Beowulf LINUX Clusters, compares various hardware differences and their possible impacts, and proposes some useful suggestions at the end. We hope it will be helpful in pointing out ways to configure a Beowulf LINUX Cluster. The analysis of software impacts is our future work.
1 Introduction:
There are many models of parallel computation, such as LogP [2], LogGP [3], LogPQ, LoGPC, P-3PC and so on. When performing the tests in this paper, we adopted the parameterized LogP model proposed in [1], which characterizes a network as N = (L, Os(m), Or(m), g(m), P). The program to measure the parameters was downloaded from the web site http://www.cs.vu.nl/albatross/. This program runs 6 different combinations of MPI's send and receive modes ({MPI_Send, MPI_Isend, MPI_Ssend} × {MPI_Recv, MPI_Irecv}), amongst which only MPI_Send × MPI_Recv directly reflects the hardware impact on a real-world MPI application; the other combinations reflect software impacts more. So the following comparison and analysis in this paper will focus only on MPI_Send × MPI_Recv. The remaining contents of this paper are organized as follows: Section 2 describes our testing platforms. Section 3 discusses the impact of the computational part per node. Subsections 4.1 and 4.2 detail the impact of Ethernet versus Myrinet and 1 NIC versus 2 NIC, respectively. Section 5 gives our conclusions and future work.
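For intuition only, a crude MPI ping-pong in C that times round trips per message size is sketched below; the parameterized LogP tool from the albatross site uses a more careful protocol to separate Os(m), Or(m), g(m) and L, so this sketch merely illustrates the kind of measurement involved. It assumes exactly two MPI processes.

```c
#include <stdio.h>
#include <mpi.h>

/* Crude round-trip timing for one message size m between ranks 0
 * and 1; LogP-style parameters can be fitted from such timings. */
static double roundtrip(int m, int rank, int iters)
{
    static char buf[65536];  /* zero-initialized message buffer */
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, m, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, m, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(buf, m, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, m, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    return (MPI_Wtime() - t0) / iters;
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int m = 1; m <= 65536; m *= 4) {
        double t = roundtrip(m, rank, 100);
        if (rank == 0)
            printf("m=%d bytes, round trip = %g s\n", m, t);
    }
    MPI_Finalize();
    return 0;
}
```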
2 The testing platforms:
The testing program runs on the following different Beowulf LINUX Clusters:

Platform   CPU                 Host bus clock   L1 ICache              L1 DCache              L2 Cache                 Mem           NIC
Sv-ether   AMD 1.6GHz          266MHz           64KB (64 bytes/line)   64KB (64 bytes/line)   256KB (64 bytes/line)    1GB           3c905c
Sv-myri    AMD 1.6GHz          266MHz           64KB (64 bytes/line)   64KB (64 bytes/line)   256KB (64 bytes/line)    1GB           Myrinet GM ver 1.5
Sv-PIII    Intel PIII 500MHz   100MHz           16KB (32 bytes/line)   16KB (32 bytes/line)   512KB (32 bytes/line)    512MB SDRAM   3c905b
Sv-PIV     Intel PIV 1.6GHz    100MHz           12KB (64 bytes/line)   8KB (64 bytes/line)    512KB (128 bytes/line)   128MB DDR     3c905c
Sv-1NIC    Intel PIV 1.6GHz    100MHz           12KB (64 bytes/line)   8KB (64 bytes/line)    512KB (128 bytes/line)   128MB DDR     3c905c
Sv-2NIC    Intel PIV 1.6GHz    100MHz           12KB (64 bytes/line)   8KB (64 bytes/line)    512KB (128 bytes/line)   128MB DDR     2 × 3c905c
Table 1. The configuration of our testing platforms
3 The impact of computational part per node:
The testing platforms selected for this comparison are Sv-PIII and Sv-PIV in Table 1.
[Figures: log-log plots of g(m), Os(m) and Or(m), old versus new platform, against message size in bytes]
Fig 2. Sv-PIII.Send.Recv.g(m) vs Sv-PIV.Send.Recv.g(m)
Fig 3. Sv-PIII.Send.Recv.Os(m) vs Sv-PIV.Send.Recv.Os(m)
Fig 4. Sv-PIII.Send.Recv.Or(m) vs Sv-PIV.Send.Recv.Or(m)
From these 3 figures we can see that improving the computational part of each node reduces the overhead of transferring short and medium messages, but contributes little to long messages and g(m). Considering that g(m) determines the minimal time interval between consecutive messages, only improving the computational portion enlarges the gap between Os(m), Or(m) and g(m), which is exactly the gap between computation and
communication, and should be used for overlapping computation and communication in the non-blocking transfer of long messages. Also, improvement of the computational part of each node contributes much to the computational part of a real-world parallel application.

4 Networks:

4.1 Ethernet versus Myrinet
The testing platforms selected are Sv-ether and Sv-myri in Table 1. Running the test programs on these two platforms generates comparison figures similar to those illustrated in Section 3. From these figures we can see that the communication performance improvement from Myrinet mainly focuses on short and medium messages (< 16KB in our example). For longer messages (> 16KB in our example), the performance of Myrinet decreases significantly and even does a little worse than Ethernet at some points. Another important point to be mentioned is that Myrinet seems to suffer more than Ethernet from the "socksize" parameter, which sets the size of the underlying send/receive buffer. For short and medium messages, the improvement in g(m) of Sv-myri over Sv-ether is about 99.88% to 92.52%; the improvement in Os(m) is about 90.05% to 92.18%; and the improvement in Or(m) is about 73.02% to 59.31%.

4.2 1 NIC versus 2 NIC:
The testing platforms in this subsection are Sv-1NIC and Sv-2NIC. Note: we define 1 NIC mode here to mean that only one network adapter is available on each node of the Beowulf LINUX Cluster and is used for both MPI and OS communication; 2 NIC mode means that two network adapters are available on each node, one for MPI communication and the other for OS communication such as NFS, YPSERV and ARP. From comparison figures similar to those in Section 3, we can see that for pure communication tests, the 1 NIC mode and 2 NIC mode defined in this subsection make no significant difference. Further tests illustrate that the amount of OS communication is very small compared with MPI while the applications are running, about 0.002% to 0.003% in bytes of traffic. The running results of NPB (version 2.3, CLASS = A, NPROCS = 2) on our Sv-1NIC and Sv-2NIC confirm these results. So we may conclude that if a parallel program performs a lot of OS communication like NFS, YPSERV, ARP and so on while running, the 2 NIC mode will be useful; otherwise, the 1 NIC mode may be a good choice considering the performance/cost ratio.

5 Conclusions and future work:
g(m) is mainly influenced by network speed and software parameters; only improving the computational part of each node has no significant effect. For short or medium messages, Os(m) and Or(m) are mainly occupied by instruction execution in the protocol stacks, so it is very effective to speed up the CPU frequency and memory of each node to improve Os(m) and Or(m). If the message size keeps growing, Os(m) and Or(m) will be greatly occupied by waiting for the re-availability of the underlying send buffer or receive buffer, which depends on the
network speed. So for long messages, the improvements of Os(m) and Or(m) rely not mainly on the CPU frequency or memory per node, but on the speed of the network. The communication performance improvement of Myrinet compared with Ethernet mainly focuses on short and medium messages; for long messages, the performance of Myrinet decreases quickly and significantly. Myrinet also seems to suffer more from the 'socksize' parameter of GM, which sets the underlying send buffer and receive buffer size. If two network adapters are available per node of a Beowulf LINUX Cluster, with one used for OS communication (such as NFS, YPSERV, ARP and so on) and the other for MPI communication (the 2 NIC mode defined in subsection 4.2), the possible performance improvement will depend on the ratio of OS messages to MPI messages while the applications are running. If the ratio is large, the 2 NIC mode will be useful; otherwise, it will not. If only one network adapter is available per node, some OS parameters should be adjusted to decrease the amount of OS messages, in order to reduce the total program running time. This paper analyzed the possible impact of the computational part of each node, Ethernet versus Myrinet, and 1 NIC versus 2 NIC on the communication performance of a Beowulf LINUX Cluster. We drew some conclusions and put forward some suggestions, which may be useful when making decisions on the hardware selection of a Beowulf LINUX Cluster. We would like to analyze more hardware impacts in the future, such as the sharing of a single network adapter by multiple CPUs per node. The analysis of software impacts is also in the schedule.

References:
1. Thilo Kielmann, Henri E. Bal and Kees Verstoep, "Fast Measurement of LogP Parameters for Message Passing Platforms", http://www.cs.vu.nl/albatross/.
2. David Culler, Richard Karp, David Patterson, Abhijit Sahay, Klaus Erik Schauser, Eunice Santos, Ramesh Subramonian and Thorsten von Eicken, "LogP: Towards a Realistic Model of Parallel Computation", in Proc. Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 1-12, San Diego, CA, May 1993.
3. Albert Alexandrov, Mihai F. Ionescu, Klaus E. Schauser and Chris Scheiman, "LogGP: Incorporating Long Messages into the LogP Model: One step closer towards a realistic model for parallel computation", in Proc. Symposium on Parallel Algorithms and Architectures (SPAA), pages 95-105, Santa Barbara, CA, July 1995.
4. Richard P. Martin, Amin M. Vahdat, David E. Culler and Thomas E. Anderson, "Effects of Communication Latency, Overhead, and Bandwidth in a Cluster Architecture", http://www.berkeley.edu/~culler/papers.
5. David Culler, Lok Tin Liu, Richard P. Martin and Chad Yoshikawa, "LogP Performance Assessment of Fast Network Interfaces", http://www.berkeley.edu/~culler/papers.
D-GRIDMST: CLUSTERING LARGE DISTRIBUTED SPATIAL DATABASES
JI ZHANG, YUE CAO
Department of Computer Science, National University of Singapore, Lower Kent Ridge Road, Singapore 117543
Email: {zhangji, caoyue}@comp.nus.edu.sg
In this paper, we will propose a distributable clustering algorithm, called Distributed-GridMST (D-GridMST), that deals with large distributed spatial databases. D-GridMST employs the notions of multi-dimensional cube and grid to partition the data space involved and uses density criteria to extract representative points from the spatial databases, based on which a global MST of representatives is constructed. Such an MST is partitioned according to users' clustering specifications and is then used to label the data points in the respective distributed spatial databases. Since only compact information about the distributed spatial databases is transferred via the network, D-GridMST is characterized by a small network transfer overhead. Experimental results show that D-GridMST is effective, since it is able to produce exactly the same clustering result as that produced in the centralized paradigm, making D-GridMST a promising tool for clustering large distributed spatial databases.
1
INTRODUCTION
Spatial data clustering, aiming to identify clusters, or densely populated regions, in a large and multi-dimensional spatial dataset, serves as an important task of spatial data mining. Though a large number of spatial clustering algorithms have been proposed in the literature so far, most of them assume the data to be clustered are locally resident in a centralized scenario, making them unable to cluster inherently distributed spatial data sources. Recent effort in this field includes [1, 2, 3, 4]. In this paper, we will propose a distributable clustering algorithm, called Distributed-GridMST (D-GridMST), which deals with large distributed spatial databases. D-GridMST employs the notions of multi-dimensional cube and grid to partition the data space involved and uses density criteria to extract representative points from the spatial databases, based on which a global MST of representatives is constructed. Such an MST is partitioned according to users' clustering specifications and is then used to label the data points in the respective distributed spatial databases. Since only compact information about the distributed spatial databases is transferred via the network, D-GridMST is characterized by a small network transfer overhead. Experimental results show that D-GridMST is effective, since it is able to produce exactly the same clustering result as that produced in the centralized paradigm. These advantages are believed to make D-GridMST a promising tool for clustering large distributed spatial databases.

2 MST (MINIMUM SPANNING TREE) CLUSTERING
MST clustering is the best-known graph-theoretical divisive clustering algorithm. Given n points, an MST is a set of edges that connects all the points and has minimum total length. Deleting the edges with the largest lengths will subsequently generate a specific
number of clusters. The advantages of using an MST for clustering are that the MST is very suitable for dealing with datasets featuring arbitrary shapes, and that the clustering result of the MST is stable and not sensitive to the input order of the data points. In addition, the MST method can achieve very promising accuracy, especially when the size of the data to be clustered is small and the data is clean (free of outliers or noise). Figure 1 shows an example of the MST of a number of points.
Figure 1: A set of points and its corresponding MST
Example 1. Given a set of points (Figure 1(a)), the corresponding MST is shown in Figure 1(b).
3 GRIDMST

GridMST clusters spatial databases in a number of steps as follows: (1) construction of the MST of representative points of the spatial database; (2) clustering the representative points using the MST clustering method; (3) labeling the points in the spatial database.
3.1 Construction of MST of representative points of the spatial database
To construct the MST of representative points, a hyper-cube data structure is built whereby each point in the dataset is assigned to one and only one cell of the hyper-rectangle. The density of each hyper-cube cell is then computed. If the density of a hyper-cube cell exceeds some user-specified threshold, the cell is considered to be dense, and the median of the points belonging to the dense cell is selected as the representative point of this cell. Once the representative points have been selected, a graph-theoretic algorithm is used to build the MST.
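A compact C sketch of the dense-cell test, restricted to 2-D points on a uniform grid over [0,1]^2 for brevity; GridMST itself works on a d-dimensional hyper-cube and takes the median point of each dense cell as its representative.

```c
#include <stdlib.h>

typedef struct { double x, y; } Point;

/* Count how many points fall into each cell of a g-by-g grid over
 * [0,1]^2. The caller checks for NULL and frees the returned array. */
int *cell_counts(const Point *pts, int n, int g)
{
    int *count = calloc((size_t)g * g, sizeof *count);
    if (count == NULL)
        return NULL;
    for (int i = 0; i < n; i++) {
        int cx = (int)(pts[i].x * g); if (cx == g) cx--;
        int cy = (int)(pts[i].y * g); if (cy == g) cy--;
        count[cy * g + cx]++;
    }
    return count;
}

/* A cell is dense when its density exceeds the user threshold;
 * GridMST then picks the median point of each dense cell as its
 * representative (not shown here). */
int is_dense(const int *count, int g, int cx, int cy, int threshold)
{
    return count[cy * g + cx] > threshold;
}
```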
3.2 Clustering representative points using MST clustering method
When the MST of representative points has been constructed, it can easily be partitioned according to the user's clustering requirements. For instance, if the user wants to cluster the spatial database into k clusters (k is a user-specified parameter), the MST will be partitioned by cutting its k-1 longest edges.
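For the partition step, a minimal C sketch under the assumption that the MST is available as an edge list: sort the edges by length and drop the k-1 longest, leaving k connected components.

```c
#include <stdlib.h>

/* An MST edge between representative points u and v. */
typedef struct { int u, v; double length; } Edge;

/* Sort longest-first so the k-1 longest edges come first. */
static int by_length_desc(const void *a, const void *b)
{
    double la = ((const Edge *)a)->length;
    double lb = ((const Edge *)b)->length;
    return (la < lb) - (la > lb);
}

/* Cut the k-1 longest edges of an MST with nedges edges, leaving
 * k connected components (the clusters). The surviving edges are
 * compacted to the front of the array; the new count is returned. */
int cut_longest_edges(Edge *mst, int nedges, int k)
{
    qsort(mst, (size_t)nedges, sizeof *mst, by_length_desc);
    int keep = nedges - (k - 1);
    if (keep < 0) keep = 0;
    for (int i = 0; i < keep; i++)
        mst[i] = mst[i + (k - 1)];  /* drop the k-1 longest edges */
    return keep;
}
```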
3.3 Label the points in the spatial database
When the representative points of the spatial database have been clustered, cluster labeling is performed in this last step to cluster the whole spatial database. Two labeling strategies are used: 1) a point falling into one of the dense cells shares the cluster label of the representative point of that dense cell; 2) a point falling into one of the non-dense cells receives the cluster label of its nearest representative point.
4 D-GRIDMST (DISTRIBUTED-GRIDMST)
4.1 Local data model vs. global data model
To enable D-GridMST to produce a clustering result for the distributed databases that is comparable to the result on a single database, globalization of the local data models is required to capture the cluster features of the whole spatial database. (1) Globalize the range of every dimension of the data in each distributed site. The global range of every dimension of the data is required to construct a global hypercube structure, making sure that the hypercube constructed is able to encapsulate all the data points stored in the distributed sites. To this end, all distributed sites are required to provide the central site with information regarding the maximum and minimum values, i.e. the range, of every dimension of the local data points. (2) Globalize the local occupied cells in each distributed site. Here, the occupied cells refer to the cells that are occupied by data points in the database. The global occupied cells serve as the potential pool for the selection of dense cells: the dense cells are only those occupied cells whose neighborhood density exceeds some threshold. The global occupied cells are the union of the local occupied cells.
4.2 D-GridMST Algorithms
Table 1 presents the detailed algorithm of D-GridMST.

Step   Transfer/Location   Operation
1      DS → CS             Transfer the local range of every dimension of the data
2      CS                  Globalize the local ranges to the global range
3      CS → DS             Transfer the global range (global hypercube)
4      DS                  Assign local points into the hypercube and compute the density
5      DS → CS             Transfer the local occupied cells and their densities
6      CS                  Globalize the local occupied cells of the hypercube
7      CS                  Generate representative points and construct the MST
8      CS                  Perform clustering using the MST
9      CS → DS             Transfer the clustering result of the representative points
10     DS                  Label the local data points
Table 1. Algorithm of D-GridMST
Value     Meaning
CS        Clustering operations in the centralized site
DS        Clustering operations in all the distributed sites
CS → DS   Data are transferred from the centralized site to all the distributed sites
DS → CS   Data are transferred from all distributed sites to the centralized site
Table 2. Annotations of the Transfer/Location field in Table 1
5 EXPERIMENT AND DISCUSSION
Experiments have been conducted to evaluate the effectiveness of D-GridMST in clustering distributed spatial databases. Synthetic datasets are generated using a dataset generator. The main focus of our experiments is to see whether D-GridMST is able to, with only small data transfer via the network, yield the same clustering as that produced in a centralized scenario. The results of centralized clustering using GridMST and distributed clustering using D-GridMST on the synthetic datasets are illustrated in Figures 2 and 3. Experimental results show that D-GridMST is effective, since it is able to produce exactly the same clustering result as that produced in the centralized paradigm. These results verify that D-GridMST is a promising clustering algorithm that achieves very good performance when working on distributed spatial databases.
Figure 2. Result of Dataset 1
Figure 3. Result of Dataset 2
(a): the result of centralized clustering using GridMST; (b): the result of distributed clustering using D-GridMST
Finally, a spatial database is subject to frequent changes and updates; thus a future research direction will be making D-GridMST allow for dynamic distributed spatial databases. Instead of performing naive re-clustering, D-GridMST is expected to deal with these updates dynamically and perform clustering efficiently.

REFERENCES
[1] I.S. Dhillon and D.S. Modha. "A Data-Clustering Algorithm On Distributed Memory Multiprocessors", Large-scale Parallel KDD Systems, eds. M. Zaki and C. Ho, (1999), pp. 245-260.
[2] G. Forman and B. Zhang. "Distributed Data Clustering can be Efficient and Exact", HPL Technical Report HPL-2000-158. Also appears in SIGKDD Explorations, (2001).
[3] K. Johnson and H. Kargupta. "Collective, Hierarchical Clustering from Distributed, Heterogeneous Data", Large-scale Parallel KDD Systems, eds. M. Zaki and C. Ho, (1999), pp. 221-244.
[4] H. Kargupta, W. Huang, S. Krishnamoorthy and E. Johnson. "Distributed Clustering Using Collective Principal Component Analysis", Knowledge and Information Systems Journal: Special Issue on Distributed and Parallel Knowledge Discovery, (2000).
MASSIVELY PARALLEL SEQUENCE ANALYSIS WITH HIDDEN MARKOV MODELS
BERTIL SCHMIDT
School of Computer Engineering, Nanyang Technological University, Singapore 639798
E-mail: [email protected]
HEIKO SCHRODER
School of Computer Science and Information Technology, RMIT, Melbourne, Australia
E-mail: [email protected]
Molecular biologists use Hidden Markov Models (HMMs) as a popular tool to statistically describe protein sequence families. This statistical description can then be used for sensitive and selective database scanning. Even though efficient dynamic programming algorithms exist for the problem, the required scanning time is still very high, and because of the exponential database growth finding fast solutions is of high importance to research in this area. In this paper we illustrate how massive parallelism can be used for efficient sequence analysis using HMMs. We present two new techniques to parallelize the dynamic programming calculation: "diagonal-by-diagonal" and "row-by-row". This leads to significant runtime savings on our hybrid parallel system based on commodity components to gain high performance at low cost. The architecture is built around a coarse-grained PC-cluster linked by a high-speed network and fine-grained SIMD processor arrays connected to each node.
1 Introduction
Scanning sequence databases is a common and often repeated task in molecular biology. The need for speeding up this treatment comes from recent developments in genome-sequencing projects, which are generating an enormous amount of data. This results in an exponential growth of the bio-sequence banks: every year their size scales by a factor of 1.5 to 2. The scan operation consists in finding similarities between a particular query sequence and all the sequences of a bank. This operation allows biologists to point out sequences sharing common subsequences. From a biological point of view, it leads to the identification of similar functionality. However, identifying distantly related homologues is still a difficult problem. Because of sparse sequence similarity, commonly used comparison algorithms like BLAST or Smith-Waterman often fail to recognize their homology. Therefore, Hidden Markov Models (HMMs) have become a powerful tool for high-sensitivity database scanning, because they can provide a position-specific description of protein families. HMMs can identify that a new protein sequence belongs to the modeled family, even with low sequence identity [2]. An HMM can be compared with a protein sequence by dynamic programming based alignment algorithms, such as the Viterbi algorithm, whose complexities are quadratic with respect to the sequence and model lengths. Basically, there are two methods to parallelize HMM database scanning: one is based on the parallelization of the dynamic programming calculation, the other on the distribution of the pairwise comparisons. Fine-grained parallel architectures, like linear SIMD arrays, have been proven to be good candidate structures for the first approach, while more coarse-grained networks of workstations are suitable architectures for the second [1]. This paper presents a new approach to high performance HMM database scanning that combines both strategies in order to achieve even higher speed. We have designed
massively parallel versions of the Viterbi algorithm that are tailored to fit the characteristics of our hybrid parallel architecture. Their implementation is described on our hybrid system consisting of Systola 1024 cards within the 16 PCs of a Beowulf cluster [4]. The rest of this paper is organised as follows. In Section 2, we introduce the Viterbi algorithm used to align an HMM with a protein sequence. Section 3 provides a description of our hybrid system. The new parallel algorithms and their mapping onto the hybrid architecture are explained and evaluated in Section 4. Section 5 concludes the paper with an outlook on further research topics.

2 Viterbi algorithm
The structure of an HMM to model a set of biologically similar protein sequences (a protein family) is shown in Figure 1. It consists of a linear sequence of nodes. Each node has a match (M), insert (I) and delete state (D). Between the nodes are transitions with associated probabilities. Each match state and insert state also contains a position-specific table with probabilities for emitting a particular amino acid. Both transition and emission probabilities can be generated from a multiple sequence alignment of a protein family.
Figure 1. The transition structure of an HMM of length 4. Squares represent match states, circles represent delete states and diamonds represent insertions.
An HMM can be compared (aligned) with a new protein sequence to determine the probability that the sequence belongs to the modeled family. The most probable path through the HMM that generates a sequence similar to the new sequence determines the similarity score. The well-known Viterbi algorithm computes this score by dynamic programming. The computation is given by the following recurrences:

M(i,j) = e(M_j, s_i) + max{ M(i-1,j-1) + tr(M_{j-1}, M_j),
                            I(i-1,j-1) + tr(I_{j-1}, M_j),
                            D(i-1,j-1) + tr(D_{j-1}, M_j) }

I(i,j) = e(I_j, s_i) + max{ M(i-1,j) + tr(M_j, I_j),
                            I(i-1,j) + tr(I_j, I_j),
                            D(i-1,j) + tr(D_j, I_j) }

D(i,j) = max{ M(i,j-1) + tr(M_{j-1}, D_j),
              I(i,j-1) + tr(I_{j-1}, D_j),
              D(i,j-1) + tr(D_{j-1}, D_j) }

where tr(state1, state2) is the transition cost from state1 to state2 and e(M_j, s_i) is the emission cost of amino acid s_i at state M_j. M(i,j) denotes the score of the best path matching subsequence s_1...s_i to the submodel up to state j, ending with s_i being emitted by state M_j. Similarly, I(i,j) is the score of the best path ending in s_i being emitted by I_j, and D(i,j) is the score of the best path ending in state D_j. Initialization and termination are given by M(0,0) = 0 and M(n, m+1) for a sequence of length n and an HMM of length m.
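A direct sequential rendering of these recurrences in C, before any parallelization, is sketched below. The table layout (emission scores eM/eI indexed by model position and amino acid, and nine per-node transition scores) is illustrative, and boundary cells are assumed to be pre-initialized by the caller (M(0,0) = 0, everything else on the borders set to a large negative value).

```c
#include <float.h>

#define NEG_INF (-DBL_MAX / 2)  /* stands in for log(0) */

/* Indices of the nine transition scores at model node j; e.g.
 * t[j][MM] = tr(M_{j-1}, M_j) and t[j][MI] = tr(M_j, I_j). */
enum { MM, IM, DM, MI, II, DI, MD, ID, DD };

static double max3(double a, double b, double c)
{
    double m = a > b ? a : b;
    return m > c ? m : c;
}

/* Fill the Viterbi matrices for a sequence s[1..n] (amino acid
 * codes) and an HMM of length m, in log space. */
void viterbi_fill(int n, int m, const int *s,
                  double **M, double **I, double **D,
                  double **eM, double **eI, double **t)
{
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            M[i][j] = eM[j][s[i]] +
                max3(M[i-1][j-1] + t[j][MM],
                     I[i-1][j-1] + t[j][IM],
                     D[i-1][j-1] + t[j][DM]);
            I[i][j] = eI[j][s[i]] +
                max3(M[i-1][j] + t[j][MI],
                     I[i-1][j] + t[j][II],
                     D[i-1][j] + t[j][DI]);
            D[i][j] = max3(M[i][j-1] + t[j][MD],
                           I[i][j-1] + t[j][ID],
                           D[i][j-1] + t[j][DD]);
        }
    }
}
```

The cell (i,j) depends only on cells (i-1,j-1), (i-1,j) and (i,j-1), which is what makes the diagonal-by-diagonal and row-by-row parallelizations discussed in Section 4 possible.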
3 The hybrid architecture
We have built a hybrid MIMD-SIMD architecture from generally available components (see Fig. 2). The MIMD part of the system is a cluster of 16 PCs (Pentium II, 450 MHz) running Linux. The machines are connected via a Gigabit-per-second LAN (using Myrinet M2F-PCI32 as network interface cards and Myrinet M2L-SW16 as a switch). For application development we use the MPI library MPICH v. 1.1.2.
Figure 2. Architecture of our hybrid system: a cluster of 16 PCs with 16 Systola PCI boards (left). The data paths in Systola 1024 are depicted on the right.
For the SIMD part we plugged a Systola 1024 PCI board into each PC. Systola 1024 contains an Instruction Systolic Array (ISA) of size 32x32. The ISA [3] is a mesh-connected processor grid, where the processors are controlled by three streams of control information: instructions, row selectors, and column selectors. The instructions are input in the upper left corner of the processor array, and from there they move step by step in horizontal and vertical direction through the array. Every processor has read and write access to its own memory. Besides that, it has a designated communication register (C-register) that can also be read by the four neighbor processors. The ISA combines the advantages of fine-grained SIMD machines with the capability of efficiently performing so-called aggregate functions. These are associative and commutative functions to which each processor provides an argument value. Examples of aggregate functions are broadcast, ringshift, sum and maximum along the rows or columns of the processor array. These are the key operations within the algorithm presented in the next section.

4 Mapping onto the hybrid architecture
Mapping the database scanning application onto our hybrid computer combines two forms of parallelism: fine-grained parallelism on Systola 1024 and coarse-grained parallelism on the PC cluster. While the Systola implementation parallelises the dynamic programming computation in the Viterbi algorithm, the cluster implementation splits the database into pieces and distributes them among the PCs using a suitable load balancing strategy. Fig. 3 presents two ways to map the dynamic programming calculation to a linear array of processing elements (PEs): "diagonal-by-diagonal" and "row-by-row"; a sketch of the first schedule follows.
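The "diagonal-by-diagonal" schedule exploits the fact that all cells on an anti-diagonal i + j = d of the dynamic programming matrices depend only on diagonals d-1 and d-2 and are therefore mutually independent. The following Java fragment only illustrates this dependence structure (with a parallel stream standing in for the PEs of the linear array, and relaxCell() as a placeholder for the Viterbi updates); it is not the authors' ISA implementation.

import java.util.stream.IntStream;

final class Wavefront {
    // Process the n x m dynamic programming matrix anti-diagonal by anti-diagonal.
    static void computeDiagonalByDiagonal(int n, int m) {
        for (int d = 2; d <= n + m; d++) {
            final int diag = d;
            IntStream.rangeClosed(Math.max(1, diag - m), Math.min(n, diag - 1))
                     .parallel()                            // cells on one diagonal are independent
                     .forEach(i -> relaxCell(i, diag - i)); // j = d - i
        }
    }

    static void relaxCell(int i, int j) {
        // placeholder: apply the M(i,j), I(i,j), D(i,j) recurrences of Section 2
    }
}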
Both mappings can be efficiently implemented on the ISA, taking advantage of the broadcast, ringshift and maximum aggregate functions described in Section 3 (see the full version of this paper for details). Table 1 reports times for scanning the SwissProt databank for HMMs of various lengths with the Viterbi algorithm. A single Systola 1024 board is 3 to 4 times faster than a single Pentium III, 1 GHz. However, a board re-design based on state-of-the-art technology (Systola was built in 1994) would make this factor significantly higher.
Figure 3. Parallelization of the Viterbi algorithm on a linear processor array. Each node of the HMM is assigned to one PE. In (a) the sequence is shifted systolically through the PEs. In each step, the computation of a single diagonal in the dynamic programming matrices M(i,j), I(i,j) and D(i,j) is performed in parallel ("diagonal-by-diagonal"). In (b) a character of the sequence is broadcast through the array in each step. The computation of a single row in the dynamic programming matrices is performed in parallel ("row-by-row").

Table 1. Scan times in seconds of SwissProt (release 40, 113997 protein sequences) for various HMM lengths with the Viterbi algorithm on Systola 1024, a PC cluster with 16 Systolas, and a Pentium III 1 GHz.
HMM length                      -        222        -
Systola 1024 (speedup)          - (3)    288 (3.5)  546 (4)
Cluster of Systolas (speedup)   - (37)   22 (45)    40 (56)
Pentium III 1 GHz               478      994        2243

5 Conclusions and Future Work
In this paper we have demonstrated that hybrid computing is very suitable for scanning bio-sequence databases with HMMs. By combining the fine-grained ISA parallelism with a coarse-grained distribution within a PC cluster, our hybrid architecture achieves high performance at low cost. We have presented the design of two ISA algorithms that lead to a high-speed Viterbi implementation on Systola 1024. The exponential growth of genomic databases demands even more powerful parallel solutions in the future. Because the comparison and alignment algorithms favored by biologists are not fixed, programmable parallel solutions are required to speed up these tasks. As an alternative to special-purpose systems, hard-to-program reconfigurable systems, and expensive supercomputers, we advocate the use of specialized yet programmable hardware whose development is tuned to system speed.

References
1. Grate L., Diekhans M., Dahle D., Hughey R., Sequence analysis with the Kestrel SIMD parallel processor, Pacific Symposium on Biocomputing (2001) pp. 263-274.
2. Krogh A., et al., Hidden Markov models in computational biology: applications to protein modeling, JMB 235 (1994) pp. 1501-1531.
3. Kunde M., Lang H.-W., Schimmler M., Schroder H., Schmeck H., The ISA and its relation to other models of parallel computers, Parallel Comp. 7 (1988) pp. 25-39.
4. Schmidt B., Schroder H., Schimmler M., A hybrid architecture for bioinformatics, Future Generation Computer Systems 18 (2002) pp. 855-862.
TABU SEARCH AND SIMULATED ANNEALING ON THE SCHEDULING OF PIPELINED MULTIPROCESSOR TASKS

M. FIKRET ERCAN
Singapore Polytechnic, School of Electrical and Electronic Engineering, 500 Dover Rd., Singapore
E-mail: [email protected]
YU FAI FUNG
The Hong Kong Polytechnic University, Department of Electrical Engineering, Hung Hom, Kowloon, Hong Kong S.A.R.
E-mail: [email protected]
Parallel computers that can run multiple parallel algorithms simultaneously are generally targeted at applications where operations are periodic. The performance of such systems, however, depends considerably on the efficient scheduling of tasks. This paper evaluates the solution quality of two metaheuristic algorithms developed for scheduling multiprocessor tasks on these systems. The reduction achieved by the metaheuristics in the completion time of these tasks has been studied for various task parameters and machine configurations.
1 Introduction
In many real-time applications, such as machine vision, robotics, and power system simulation, two main characteristics of parallelism co-exist: spatial parallelism and temporal parallelism [1]. In order to speed up operations, these algorithms are performed on multiprocessor machines. However, this unique computing structure can be better exploited if it is executed on a multiprocessor environment that can execute multiple parallel programs simultaneously. This class of computers can be termed multi-programmable systems (MPSs); they are made of either a pool of processors that can be partitioned into processor clusters or processor arrays prearranged in multiple layers [1,2]. An MPS platform can simultaneously execute a number of parallel algorithms on independent processor arrays and provides data exchange between them. Hence, by making use of spatial parallelism, algorithms can be split into smaller grains, and when computations are repetitive, temporal parallelism can be exploited. This results in a set of pipelined multiprocessor tasks (MPTs) to be performed on an MPS. That is, the algorithms at each level of the precedence-relation tree can be mapped onto a processing layer of an MPS and executed simultaneously to create a pipelining effect. A single pipeline, made of MPTs, will be called a job, and we assume that there is no precedence relation between jobs. This paper deals with the problem of efficiently scheduling multiple jobs on an MPS. The objective of
the scheduling is to find a sequence of jobs that can be processed on the system in minimum time. Two metaheuristic algorithms were introduced for the solution and their performance is evaluated.

2 Simulated Annealing and Tabu Search algorithms
A frequently used algorithm is Simulated Annealing (SA), which simulates annealing during the search process [4]. Tabu search (TS) is another local search method, which is guided by the use of adaptive memory structures [3]. In order to apply SA and TS to a practical problem, several decisions have to be made; they are briefly described in the following. For SA, an initial solution is generated by setting all jobs in ascending order of their indices. A neighbourhood of the current solution must also be defined. We employed the interchange neighbourhood, which swaps two randomly chosen jobs in the job list; we found that this method performs well compared to the others we experimented with. A simple cooling strategy is employed in the algorithm: the temperature is decreased in an exponential manner with $T_t = \lambda T_{t-1}$, where $\lambda < 1$. In our implementation a $\lambda$ value of 0.998 was selected following a series of experiments. The initial value of the temperature is selected using $T_0 = \Delta E_{avg} / \ln(\chi_0^{-1})$, where $\Delta E_{avg}$ is the average increase in the cost for a number of random transitions. The initial acceptance ratio $\chi_0$ is defined as the number of accepted transitions divided by the number of proposed transitions. The initial temperature is estimated from 50 randomly permuted neighbourhood solutions of the initial solution. For the TS algorithm, we also employed the interchange neighbourhood. In the tabu list, we keep a fixed number of the last visited solutions; the list is updated by eliminating the oldest solution stored in it. We employed a fixed number of iterations as the stopping criterion for both algorithms. The objective function is the completion time of all jobs, which both algorithms seek to minimize. A sketch of the SA loop is given below.
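The following compact Java sketch shows the SA loop just described: interchange moves, Metropolis acceptance and exponential cooling with lambda = 0.998. The cost() function (the completion time of all jobs) is stubbed, and the structure is our own illustration rather than the authors' exact implementation.

import java.util.Random;

final class SaScheduler {
    private static final Random RNG = new Random();
    private static final double LAMBDA = 0.998;          // cooling factor

    static int[] anneal(int[] jobs, double t0, int maxIterations) {
        int[] current = jobs.clone();                    // jobs in ascending index order
        int[] best = current.clone();
        double t = t0;                                   // initial temperature
        for (int iter = 0; iter < maxIterations; iter++) {
            int[] neighbour = current.clone();           // interchange neighbourhood:
            int a = RNG.nextInt(neighbour.length);       // swap two randomly chosen jobs
            int b = RNG.nextInt(neighbour.length);
            int tmp = neighbour[a]; neighbour[a] = neighbour[b]; neighbour[b] = tmp;
            double delta = cost(neighbour) - cost(current);
            if (delta < 0 || RNG.nextDouble() < Math.exp(-delta / t)) {
                current = neighbour;                     // Metropolis acceptance
                if (cost(current) < cost(best)) best = current.clone();
            }
            t *= LAMBDA;                                 // T_t = lambda * T_{t-1}
        }
        return best;
    }

    static double cost(int[] schedule) {
        return 0.0; // stub: completion time of all jobs for this sequence
    }
}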
3 Computational study
During the computational experiments, the stopping criterion for the heuristics was defined as a fixed number of solutions visited, to ensure that a comparable computational effort is committed by each heuristic. This number was set at 5300. Problem sets were generated randomly for various processing time ratios and different processor configurations. For each combination of
processing time ratio and processor configuration of the architecture, 25 problems were generated. Results are presented in terms of the Average Percentage Deviation (APD) of the solution from the lower bound, which was derived in our earlier study [2]. For comparison, a list-based heuristic (LH), which sorts jobs in ascending order of their indices, is also considered. The APD of each heuristic is presented in Table 1. From the computational study, we can conclude that in most of the cases SA and TS found a reasonable solution; the completion time of jobs was reduced by as much as 81 percent and by at least 14.5 percent. Figures 1 and 2 illustrate the convergence curves of both algorithms for some selected problems. The performance achieved by SA and TS was quite similar, though in most of the cases SA delivered a slightly better result. In addition, when the convergence curves of both algorithms are analysed, it can be seen that TS converges more slowly than SA. In most of the cases, SA converged to a reasonable solution within 500 iterations for the m1 = m2 processor configuration, while TS required about 1000 iterations.
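For reference, the deviation measure can be written as follows for the 25 instances of each cell, where $C_k$ is the completion time found by a heuristic on instance $k$ and $LB_k$ is the lower bound of [2]; this formula is our reading of the standard APD definition, not an equation reproduced from the paper:

$$\mathrm{APD} = \frac{1}{25}\sum_{k=1}^{25}\frac{C_k - LB_k}{LB_k}\times 100\,\%.$$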
Figure 1. Convergence of Simulated Annealing.
Figure 2. Convergence of Tabu Search.
Table 1. Average percentage deviation of each heuristic algorithm.
Job   Machines 2:1            Machines 4:2            Machines 8:4
      LH     SA     TS        LH     SA     TS        LH     SA     TS
10    13.93  0.94   1.14      24.62  4.21   5.06      35.87  10.1   10.7
30    0.52   0.47   4.05      12.29  2.64   3.41      16.96  4.65   5.82
50    0.44   0.42   2.89      8.98   2.3    1.71      13.19  4.53   5.59

Job   Machines 2:2            Machines 4:4            Machines 8:8
      LH     SA     TS        LH     SA     TS        LH     SA     TS
10    21.04  2.62   3.28      27.7   7.07   8.49      32.03  8.88   9.21
30    13.95  2.49   3.25      16.51  3.5    4.69      20.66  7.81   9.32
50    9.9    2.69   3.06      11.86  4.2    4.79      16.2   7.0    7.96
4 Summary
In this paper, the job-scheduling problem on an MPS is considered. A job is made of interrelated MPTs and is modelled with its processor requirement and processing time. Two metaheuristics have been implemented for the solution of this problem, and their performance was evaluated based on their capacity to shorten the overall schedule. The results show that the metaheuristics provided a significant improvement. In our further studies, a more generic instance of the problem will be explored. In addition, a comparison with another well-known metaheuristic, the genetic algorithm, will be conducted.

References
1. Cantoni V. and Ferretti M., Pyramidal Architectures for Computer Vision (Plenum Press, New York, 1994).
2. Ercan M. F., Oguz C. and Fung Y. F., Scheduling image processing tasks in a multi-layer system, Computers and Electrical Engineering 27/6 (2001) pp. 429-443.
3. Glover F., Taillard E. and de Werra D., A user's guide to tabu search, Annals of Operations Research 41 (1993) pp. 3-28.
4. Kirkpatrick S., Optimization by simulated annealing - Quantitative studies, J. Stat. Phys. 34 (1984) pp. 975-986.
TME - A DISTRIBUTED RESOURCE HANDLING TOOL
TOSHIYUKI IMAMURA, YUKIHIRO HASEGAWA AND NOBUHIRO YAMAGISHI
Center for Promotion of Computational Science and Engineering, Japan Atomic Energy Research Institute, 6-9-3 Higashi-Ueno, Taitoh-ku, Tokyo 110-0015, Japan
E-mail: {imamura, hasegawa, yama}@koma.jaeri.go.jp

HIROSHI TAKEMIYA
Hitachi Tohoku Software, Ltd., 2-16-10 Honcho, Aoba-ku, Sendai, Miyagi 980-0014, Japan
E-mail: [email protected]

TME, the Task Mapping Editor, is developed both for handling distributed resources and for supporting the design of distributed applications. On the TME console, a user can design a workflow diagram of a distributed application as in a drawing tool. Since all resources are represented as icons in the TME naming space and data dependency is defined by directed arrows linking icons, TME realizes a higher-level view for schematizing the application structure. Furthermore, it has an importing mechanism for user-defined applets and some valuable built-in applications. TME provides users great flexibility in distributed computing.
1 Introduction
Distributed processing is steadily spreading with the growth of the Grid [1] and the development of the Globus toolkit [2]. In a Grid environment, geographically dispersed resources, machines, databases and experimental devices can be handled uniformly with higher abstraction. Simplification of the interface, handling of arbitrary resources, and construction of coupled services become important issues for the next step. The visual support that a GUI offers is more efficient than a script: it supports not only intuitive understanding but also the design of combinations of multiple resources. In addition, it enables users to detect structural bottlenecks and errors via real-time monitoring of the execution circumstances. Some projects that aim at GUI-based steering and monitoring in the Grid environment are well known. WebFlow [3] is one of the pioneering visual design tools for distributed computing. GridMapper [4] is another GUI tool, which focuses on the monitoring of geographically distributed networks. The UNICORE project [5] aims to connect multiple computer centers and supports illustrating the dependencies among data and jobs in order to realize a meta-computing facility. From these trends in related work, the common features of visual design, automatic execution, monitoring, and a flexible integration framework can be considered the key technologies for increasing the usability of the Grid.
2 TME (Task Mapping Editor)
The Task Mapping Editor, TME, was originally developed as a component of the STA, Seamless Thinking Aid, project [6]. STA consists of three components: the development environment, the communication infrastructure, and TME. The goal of TME is to support combining and allocating distributed resources, and to provide users with a higher-level view of the structure of applications. The basic architecture of TME comprises four layers: GUI, control, communication, and adaptor layers. i) The GUI layer faces the user, and some components for design and
monitoring are provided. ii) The control layer comprises five components, the Proxy Server (PS), Tool Manager (TM), File Manager (FM), Exec Manager (EM) and Resource Manager (RM); they perform, respectively, the communication control between distributed machines, the management of any components on TME, the handling of distributed files, the execution and monitoring of applications on local/remote machines, and the monitoring of the state of batch systems and the load of interactive nodes. iii) The communication layer relays the RPC-based requests from the control layer to the adaptor layer deployed on distributed machines via PSs. iv) The adaptors play the role of an outlet between RPCs called from the EM and the existing applications. The rest of this section presents the features of TME corresponding to the four key issues identified above.

2.1 Visual Design of a Distributed Application
TME is an editor that supports drawing the specification of a distributed application graphically, as shown in Figures 1 and 2. On the TME console, any resources, for example data files, programs and devices, are expressed as icons with higher abstraction in the TME naming space. By linking program icons and data icons, users can specify data dependencies and execution order. Thus, TME requires little knowledge of the distributed architecture and environment from users, and they can handle the resources as on a single computer. Parallelism and concurrency can also be exploited owing to the nature of the data-flow diagram. Consequently, TME supports an intuitive understanding of the structure of distributed applications.

2.2 Supporting Automatic Job Submission
Users can interactively choose the computers on which they submit jobs. According to the execution order of programs and the required data files as defined on the TME console, the RM and EM assign and invoke programs, and transfer data onto the specified machines. Complex applications that comprise multiple executables and a lot of data transfers can be encapsulated and automated on the TME system.

2.3 Job Monitoring and Scheduling
The EM watches the state of the application and sends reports to the TME control system; thus the user can monitor the application virtually in real time. On the monitoring console, all resources and dependencies are represented as icons and lines, just as on the design console. Thus, it helps detect structural bottlenecks. The EM and RM also coordinate semi-automatic resource mapping. When users register their scheduling policy and priority, they can take advantage of efficient allocation and reservation of the resources. This function helps optimize economic constraints in circumstances where accounting is a severe problem.

2.4 Flexible Integration Framework
In the TME environment, various kinds of built-in components are integrated. The main built-in components are 'blocking IO', 'pipelining IO', 'tee (duplicate the output stream)', 'container (sequential list of input/output files)', 'text browser', 'GUI layout composer', 'simple data viewer', and so forth. This mechanism is realized by the development framework of the STA. An applet developed by a user can also be registered as a new built-in component and shared among other users. In addition, the GUI layout composer separates the design phase and the execution phase from the task of the application user. This means that TME facilitates a common frame, or a collaborative space, between the developer and the application user.

3 Examples
We designed several applications using TME. In this section, three applications are presented as examples. 1. The radioactive source estimation system requires high response capability, speed and accuracy [7]. The core simulations should be carried out on as many calculation servers and sites as available in order to minimize the calculation time. This application is designed as in Figure 1 (left) on TME. Users can easily specify the spawning of slave processes over distributed sites by linking master and slave icons. 2. Data analysis for nuclear experiments also requires the acquisition of lightly loaded computing resources. In this case, huge amounts of observed data are recorded to the DB server at every shot, and physicists analyze the data to find specific phenomena. This work is also formulated on TME as in Figure 1 (right). 3. In the bioinformatics field, one typical analysis method is to search for a specified genome pattern in a huge amount of DNA data. It is realized by a combination of many applications and databases. On the TME console, such a complex procedure can be defined as in Figure 2, and the user may simplify the whole process. The integration of the applications and databases may further reduce the burden of bioinformatics analyses.
Figure 1. Data-flow of the radioactive source estimation system and the data analysis for nuclear experiments
Figure 2. Data-flow of the genome sequence analysis system and a snapshot of a visualization tool
4 Discussions and Current Status
Cooperation with the Grid middleware is also of great interest for developers and users. Currently TME uses the Nexus library developed by ANL, the predecessor of globus-nexus in the Globus toolkit (GTK). Since from GTK version 2 onward the communication and security frameworks are shifting to an OGSA basis, future releases of TME will adopt a Globus-based communication infrastructure and collaborate with many Grid services. The main feature of TME is the description by data-flow; however, a control-flow mechanism has been introduced in the latest version of TME for expert developers. This extension supports the structuring of conditional branches and loops, and it enables the development of more advanced applications. The authors believe that it can contribute to the development of an advanced PSE (Problem Solving Environment) on a distributed environment, which is one of the ultimate goals of Grid computing. For the definition of a distributed application, TME adopts a subset of XML. XML is used for the description of universal documents; however, it is also a powerful tool for the definition of a distributed application. Using the XML format suggests that other tools, such as an emulator, a debugger, and a scheduler, can share the TME components. This extension is a significant issue for the next stage of TME.
5 Conclusions
TME supports the design and handling of computational resources distributed over several sites. It offers a higher-level view for schematizing the application structure and helps the intuitive understanding of applications. Automatic submission and monitoring can improve the efficiency of jobs. In addition, the framework for integration and co-operation with various user-defined functions suggests the possibility of a collaboration environment and PSE. TME will be further improved in scalability and reliability. The authors would like to contribute to the advancement of Grid computing through the development of TME. Finally, the authors would like to thank Dr. Kei Yura and Prof. Dr. Hironobu Go for their support in the construction of the bioinformatics applications and databases.

References
1. Foster I. and Kesselman C. eds., The Grid: Blueprint for a Future Computing Infrastructure (Morgan, 1999), and activities of the GGF at http://www.globalgridforum.org/
2. The Globus home page, http://www.globus.org/
3. Bhatia D., et al., "WebFlow - a visual programming paradigm for Web/Java based coarse grain distributed computing", Concurrency - Practice and Experience 9(6) (Wiley, 1997) pp. 555-577.
4. Allcock W., et al., "GridMapper: A Tool for Visualizing the Behaviour of Large-Scale Distributed Systems", IEEE HPDC-11, Edinburgh (2002) pp. 179-187.
5. Erwin D. W., "UNICORE - A Grid Computing Environment", Concurrency and Computation: Practice and Experience (Wiley, to appear 2002).
6. Takemiya H., et al., "Software Environment for Local Area Metacomputing", Proc. Intl. Conf. SNA2000, Tokyo (2000).
7. Kitabata and Chino, "Development of source term estimation method during nuclear emergency", Proc. Intl. Conf. M&C99, Madrid (1999).
PROTECTING INTEGRITY IN A DISTRIBUTED COMPUTING PLATFORM

TAY TENG TIOW, CHU YINGYI
Department of Electrical and Computer Engineering, National University of Singapore
E-mail: [email protected], [email protected]
This paper proposes a software detection scheme based on a Self-Signature technique to address the integrity protection problem in an open Internet distributed computing platform. While it is not possible to guarantee complete protection under the scenario that a malicious host has full access to an executing program, we analyze our proposed Self-Signature method in terms of the detection probability for such malicious behaviors. The strength of the protection is discussed in detail.
1 Introduction
The distributed computing systems of [1], [2], [3] provide platforms that support Internet distributed computing. As these designs operate in the open and heterogeneous environment of the Internet, security is a key issue and a significant hindrance to their applicability. There are two separate issues. The first is the protection of the servers from malicious programs. The second issue, the protection of the distributed program, can be divided into two parts. The first part is secrecy, which requires that the content of data and the semantics of codes in the program be shielded from the remote server. This risk is reduced somewhat by ensuring that no single server receives the complete set of private information; traditional encrypted and authenticated channels are also utilized to enforce this kind of protection. The second part is integrity, which requires the un-tampered execution of the distributed program by the un-trusted servers. Violation of integrity must be detected as early as possible to recognize malicious servers and avoid using wrong results. In this paper we focus on this aspect of the security problem. The programs to be distributed to un-trusted servers are incorporated with a Self-Signature algorithm before distribution, so that the recast program is able to perform interactive self-checking throughout its execution on un-trusted servers. The proposed integrity protection scheme effectively reduces the risks of both purposeful and random tampering.

2 Integrity Protection in a Distributed Computing Platform
The computation procedure involves the originating client host and the
server hosts, which are recruited to perform its tasks. The task is divided into several parallel objects, each of which is sent to a server to execute. During task execution, the client periodically requests the servers to send back an intermediate image of the program. The image includes the state of the task, that is, the current status of the program and all intermediate results. This information allows the host to restart the execution of the program from that point. To deal with the integrity problem, integrity protection methods are introduced to assure the un-tampered execution of the distributed programs. There are three general approaches: organizational, hardware-based, and software-based solutions. Our proposed scheme is based on software detection of integrity violations. When an originating host receives a result returned from a server, it decides whether the server has behaved maliciously on the assigned program and whether the returned results are trustable. The detection of tampering is based on the Self-Signature method discussed next.

3 Detection of Tampering
This is done through a Self-Signature technique. Every distributed bytecode is incorporated with a Self-Signature algorithm. The bytecode uses its own codes as input to calculate the signature. Any distributed bytecode executes the Self-Signature algorithm in every phase of its execution, in addition to the ordinary computations. The computed signatures are returned to the client host in the intermediate image at the end of each phase. The client host checks these signatures for correctness. If tampered parts of the bytecode are reflected in the signatures, an inconsistency will be observed. Self-Signature algorithms are inserted into the distributed bytecode in a pre-processing of the original bytecode; thus the calculation of the signatures becomes a part of the bytecode distributed to server hosts. Let the tampering behavior of a certain potentially malicious server be characterized by a probability vector $p_t = [p_t(1), \ldots, p_t(n)]$, where $p_t(i)$ is the probability that the $i$th bit of the bytecode is changed. Essentially, there is an independent binary random variable $t_i$ associated with each bit $i$ of the program, $i = 1, \ldots, n$. The two values of $t_i$, 0 and 1, represent that the bit is not tampered and tampered, respectively. Let us define the tampering
intensity of the server as $T = \sum_{i=1}^{n} p_t(i)$. Let us represent the property of a Self-Signature algorithm $S$ by a vector $p_s = [p_s(1), \ldots, p_s(n)]$, where $p_s(i)$ is the probability that the $i$th bit of the bytecode, if changed, will be reflected in the signature using the algorithm $S$. Similar to the probabilistic property of the tampering behavior, there is an independent binary random variable $s_i$ associated with each bit $i$ of the program, $i = 1, \ldots, n$; the probability $p_s(i)$ is the probability that $s_i = 1$. Let us assume that $p_s(1) = \cdots = p_s(n) = p_s$. We further assume that the random variables $t_i$ are independent of the self-signature property $s_i$. We define the Detection Probability ($P_d$) of a Self-Signature algorithm as the probability that the algorithm detects the tampering of the program, given the probabilistic property of the tampering behavior. For the Self-Signature method, the detection probability is the probability that the tampering of a program is reflected in the signature.

Lemma 3.1: If the probability that each bit is tampered is $p_t = [p_t(1), \ldots, p_t(n)]$, and the probability of each bit to be reflected in the signature is $p_s$, then the Detection Probability $P_d$ is:

$$P_d = \frac{1 - \prod_{i=1}^{n}\left[1 - p_s\,p_t(i)\right]}{1 - \prod_{i=1}^{n}\left[1 - p_t(i)\right]} \qquad (3.1)$$
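The lemma follows from a short conditioning argument; the derivation below is our reconstruction under the stated independence assumptions:

$$P_d = P(\text{reflected} \mid \text{tampered}) = \frac{P\big(\exists\, i:\; t_i = 1 \wedge s_i = 1\big)}{P\big(\exists\, i:\; t_i = 1\big)} = \frac{1 - \prod_{i=1}^{n}\left[1 - p_s\,p_t(i)\right]}{1 - \prod_{i=1}^{n}\left[1 - p_t(i)\right]},$$

since bit $i$ is both tampered and reflected with probability $p_s\,p_t(i)$, independently across bits.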
Theorem 3.1: Given the probability property of the signature procedure $p_s$ and the length of the program $n$, for any tampering behavior of intensity $T$, the lower bound of the Detection Probability $P_d$ is $1 - \left(1 - \frac{p_s T}{n}\right)^n$. For example, for a program of 5 kB, if $p_s = 0.5$ and the tampering intensity $T$ is 10, namely, the expectation of the number of tampered bits is 10, then the lower bound is greater than 0.993.
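The worked example can be checked numerically; the small program below is our own verification, not code from the paper:

public class DetectionBound {
    public static void main(String[] args) {
        int n = 5 * 1024 * 8;                 // 5 kB program, measured in bits
        double ps = 0.5, T = 10.0;            // signature property and tampering intensity
        double lowerBound = 1.0 - Math.pow(1.0 - ps * T / n, n);
        System.out.println(lowerBound);       // prints approx. 0.99326, i.e. > 0.993
    }
}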
4 Integrity Protection Scheme
The key of the protection scheme is the insertion of a Self-Signature algorithm into the distributed bytecode. A procedure that implements a Self-Signature algorithm is inserted into every phase of the program. This
is done in a pre-processing of the distributed bytecode. In each phase of the execution, the transformed bytecode performs both the workload computation and the signature calculation. At the end of each phase, the client host requests the server to return the current status of the execution. The signature computed by the Self-Signature algorithm is included in the image among the other intermediate results of the phase. All the signatures can be re-calculated at the client host to check for correctness. The Self-Signature algorithm is a function $f$ from the bytecode to a signature value. We use the Crypto-Computing technique of [9] to encrypt the Self-Signature function such that the function cannot be determined in polynomial time. Crypto-Computing theories deal with the problem of "non-interactive evaluation of encrypted functions (EEF)". Security is ensured under the constraint that the function $f$ is a polynomial in $Z/NZ$, as follows: "Let $E$ be an additively homomorphic encryption scheme on $Z/NZ$." Then the above method "realizes non-interactive EEF for polynomials $f \in Z/NZ[X_1, \ldots, X_s]$. Assume further that the used encryption scheme ($E$) is polynomial time indistinguishable. Then no information about $f$ is leaked except its skeleton." (The skeleton of a polynomial $f$ is the set of monomials of $f$ with non-zero coefficients.) The above scheme achieves the Detection Probability of Theorem 3.1 only if the two assumptions of Section 3 are satisfied. The first assumption, that the probabilities $p_s(i)$ for each bit to be reflected in the signature are equal, can be satisfied if the Self-Signature algorithms are chosen randomly. Note that for each Self-Signature function $f(b_1, \ldots, b_n)$, there is a counterpart function $f(b_{\sigma(1)}, \ldots, b_{\sigma(n)})$ for each possible permutation $\sigma$ of the input bits. In our scheme the client host is able to satisfy this assumption by randomly selecting the Self-Signature functions. The second assumption, that the tampering behavior (random variables $t_i$) is independent of the Self-Signature property (random variables $s_i$), is satisfied if we have $P(t_i = 1 \mid s_i = 1) = P(t_i = 1) = p_t(i)$. When a Self-Signature function has acquired all the necessary parameters and is ready to run, whether each bit is to be reflected in the signature or not is deterministic, that is, $s_i$ equals either 0 or 1. However, if the server does not know the values of $s_i$ when it is tampering with the program, then the above assumption $P(t_i = 1 \mid s_i = 1) = P(t_i = 1)$ is satisfied.
A malicious server host can acquire the above information either before execution or at runtime. Firstly, the server host is able to analyze the distributed program or conduct pre-computation of the program before running it for the client host. Our scheme prevents the server from obtaining any information about $s_i$ prior to execution because the Self-Signature function parameters are sent to the server only at runtime. Secondly, the server host may analyze the Self-Signature function at runtime. Our scheme prevents this runtime analysis using a phase deadline for each phase of the execution. The function parameters are sent to the server at the beginning of each phase in the execution of the distributed program. With this information, the server host can determine the $s_i$ of the Self-Signature function to be computed in this phase. However, since the Self-Signature function is encrypted and cannot be determined in polynomial time, a phase completion deadline can be set, which requires the server to complete the computation of each phase within the deadline. If the intermediate image for a phase is returned after the phase deadline, it will be considered unsafe. The phase deadline is determined to be longer than the ordinary execution time of the phase, but shorter than the time needed to break the Self-Signature function.
p = !*>..<#-KReference: 1 Nisan, N.; et al, (1998), Proc 18th Int Conf DCS. 2 Cappello, P.; et al, (1998) Proc 3rd Conf MPPM. 3 P. Liu, (2001), Master Thesis, National University of Singapore. 4 Ronald, E.M.A.; Sipper, M., (2000) Computer. 5 Sander, T., (1998), Proc 9th Int Sym on Software Reliability Eng. 6 Farmer, W; Guttmann, J; Swarup, V, (1996), Proc ESORICS. 7 http://www.genmagic.com/ Telescript/Documentation/TRM/. 8 Palmer, E, (1994), Proc M P SEC'94 Conference. 9 Sander, T.; Tschudin, C.F.,(1998), IEEE Sym on Security and Privacy. 10 Low, D., (1998), Master Thesis, University of Auckland. 11 Collberg, C ; Thomborson, C ; Low, D., (1998) IEEE ICCL. 12 Aguilar, J.; Hernandez, M., (2000), Proc 8th ISMASCT.
AN INTEGRATED DISTRIBUTED COMPUTING PLATFORM ON A DECENTRALIZED ARCHITECTURE

TAY TENG TIOW, CHU YINGYI
Department of Electrical and Computer Engineering, National University of Singapore, Singapore 119260
E-mail: [email protected], [email protected]
This paper proposes an alternative distributed computing scheme. The proposed scheme has a fully decentralized architecture and addresses the scalability problem at the fundamental level by removing all dedicated components from the system. Every host participating in the scheme is identical in function. The input to the system, via a graphical user interface, is assumed to be a standard, single-machine-oriented Java program. This paper describes each layer of the platform that allows the said input program to be executed without any further active user intervention.
1 Introduction
Distributed computing, or grid computing, is the emerging platform for many computationally intensive applications. Most previous proposals, such as [1], [2], use dedicated components to organize the participating machines. Our proposal removes all dedicated components from the system; every host participating in the scheme is identical in function. To support our scheme, a set of network protocols based on group communication is proposed to acquire resources. This may be viewed as a self-broking scheme, in the language of [5]. A key consideration in such a scheme is the reliability of the recruited nodes in getting the assigned tasks done. To address this concern, we use a similar design philosophy as in TCP/IP, in that the performance of the recruited hosts is on a best-effort basis. The runtime layer then implements a progress monitoring and migrating protocol to determine if the minimum performance measure is met and, if not, to migrate the tasks assigned to an errant host to another host. The application layer of any distributed computing platform provides two basic functions. Firstly, it provides the method to map computing requirements to available hosts. Secondly, it provides a method to produce distributed applications that run on the specific platform. In our proposal, the assignment of computations is implemented in the distributed application itself based on the communication affinities. For the second aspect, support for the development of distributed applications is an important issue. In our system, the distributed code generation function automatically ports standard concurrent programs running on a single machine to a form that runs on the network. The method is proposed in our previous work [10] and is summarized here.
2 System Architecture
Every host participating in our scheme is identical in function: each host can act as a requesting host and/or a contributing host, possibly both at the same time. Computing hosts can join or leave the scheme as and when desired, without the need to register their presence or absence with any controller. Any distributed application running under the scheme involves a group of coordinated nodes taking on roles as servers and/or clients. As a client, a host first recruits available hosts in the system using a group communication protocol. After it receives responses from enough contributing hosts, the client submits the application to run on the recruited hosts, which take the roles of compute servers. To participate in the computing scheme, a node launches the JDGC software described in this paper. The JDGC software integrates the functions of a client, namely requesting resources and submitting applications, and of a server, namely responding to recruiting requests. The JDGC software system consists of three layers, namely the network layer, the middle layer and the application layer (see Figure 2.1).
Fig. 2.1. Components of the Java Distributed Code Generating and Computing (JDGC) Platform.
3 The Network Layer
The recruitment protocol provides the facility for a client application to acquire available server hosts in the system. This protocol is based on group communication and is implemented in the Recruit() API function on the client side and in the server daemon on the server side. The state transition diagrams for the two sides are depicted in Fig. 3.1 and Fig. 3.2. The other set of API functions in the network layer provides the facilities to create objects and invoke their methods in a distributed environment. The RemoteCreate() function enables the application code to create an object on a specified networked machine. The RemoteInvoke() function enables the application code to invoke a method of an object on a different
machine. These functions will be incorporated into the distributed code that runs on the system; they only involve uni-cast communication. To facilitate object backup and crash recovery, the network layer provides the Checkpoint() function. The function sends a message to the destination virtual machine, which in turn uses the Object Serialization interface [11] to transform the object into an array of bytes. The array records the current information of the object and is then returned to the code that invokes the Checkpoint() function. A hypothetical sketch of this API is shown below.
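The paper names the functions Recruit(), RemoteCreate(), RemoteInvoke() and Checkpoint() but does not give their signatures, so everything in the following Java facade is an illustrative assumption:

// Hypothetical facade over the JDGC network layer; all signatures are assumed.
interface NetworkLayer {
    String[] recruit(int hostsWanted);                  // group-communication recruitment protocol
    Object remoteCreate(String host, String className); // create an object on a networked machine
    Object remoteInvoke(Object ref, String method,
                        Object[] args);                 // uni-cast remote method invocation
    byte[] checkpoint(Object ref);                      // serialized image for backup and crash recovery
}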
Fig.3.1 Recruit() State Transition Diagram: The Client
Fig.3.2 Recruit() State Transition Diagram: The Server Daemon
4 The Middle Layer

All user applications in the JDGC platform are object-oriented Java applications. The manager is the runtime environment in the software system that supports the running of these applications. For every participating host, all the applications and recruited hosts are recorded and organized in the runtime application manager of the local JDGC platform; it always contains the current status of the dynamic system. The data structure for runtime application management contains three kinds of nodes, namely application nodes, object nodes and host nodes. Every application node has a reference to a vector of object nodes, which represent the objects existing in the application. Every object node has a reference to a host node, which represents the host where the object resides. All the host nodes are linked in a global host list. Thus all these nodes are organized in two global cross-linked lists; a sketch of one possible shape is given below. An important function provided in the runtime application manager is backup and crash recovery. The system backs up the objects on the local host during the lifetime of the objects; the backup image information is held in each object node. During an invocation, if the remote host crashes, does not respond or raises other exceptions, the system automatically retries the invocation n times. After that, the remote host or server daemon is considered crashed. The system then follows a procedure similar to the object creation procedure to select a host and migrate the object using the backup image information in the
corresponding local object node. After the object is created, the system re-performs the invocation. All of this is done by the system automatically, without the user's intervention.
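One plausible Java shape for this bookkeeping is sketched below; the paper specifies only the node kinds and their links, so the field names and types are our own assumptions:

import java.util.ArrayList;
import java.util.List;

class HostNode   { String address; }                     // linked in the global host list
class ObjectNode { HostNode host; byte[] backupImage; }  // object location and checkpointed state
class AppNode    { List<ObjectNode> objects = new ArrayList<>(); } // objects of one application

class RuntimeApplicationManager {
    final List<AppNode>  applications = new ArrayList<>(); // first global list
    final List<HostNode> hosts        = new ArrayList<>(); // second list, cross-linked via ObjectNode
}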
5 The Application Layer
The application layer provides an integrated environment for the operations of both client and server hosts. As a server host, the user can start the server daemon from the user interface to respond to requests. As a client, the user can open and submit applications to run on the network, utilizing idle computation resources in the system. The applications submitted from the user interface are standard single-machine-oriented programs. The code that runs in the distributed environment is, however, distributed code, which is able to place the objects on networked machines and to handle communications among these objects. Therefore, the application layer of the JDGC platform integrates a function for distributed code generation using object-level analysis. This code generation is a preprocessing step for the submitted standard concurrent application. The objective of the method is to efficiently map the concurrent application to the system's computation resources, and to make this process automatic. The method groups heavily communicating objects on the same machine so that they can use lightweight communication mechanisms. The method first extracts detailed object-level communication affinity metrics between the runtime objects. This is done with two levels of analyzers and an intermediate three-dimensional affinity metric. Then the method automatically ports the standard application to the form that runs on the JDGC platform with the aid of an Abstract Syntax Tree (AST) transformation. The details of this method are given in our previous work [10]. The output of the process is bytecode incorporating the network layer API; it can be submitted to the distributed environment and managed by the runtime application manager.

6 Applications
The proposed JDGC platform is an integrated distributed computing system supporting Java applications. In this section we present two sample applications that run on the JDGC platform:
- MultT: multiplication of two matrices of floating point numbers.
- Curve: curve fitting for data using the least-square-error criterion.
The two applications are tested in different situations where one or more single-threaded and multi-threaded applications are submitted to the system. In the single-threaded case, all the computation is done within one object. Multiple such applications can be submitted, and their computations are distributed by the system to available hosts. In the tests for multi-threaded applications, the computation of an application is done in multiple concurrent objects that can be distributed to available hosts. The applications' average execution times are recorded for these situations; the units of execution time in the following tables are seconds. Results are summarized in Tables 6.1 and 6.2.

Table 6.1. The Average Execution Time of Multiple Threaded Applications

MultT (2400)
Threads          Single    Multiple (4)  Multiple (4)  Multiple (4)
Num of Hosts     1         1             2             4
Execution time   1401.336  1582.071      809.992       426.507
Speed up         1         0.886         1.730         3.286

Curve (3200, 10^-6)
Threads          Single    Multiple (4)  Multiple (4)  Multiple (4)
Num of Hosts     1         1             2             4
Execution time   1164.755  1255.500      652.834       335.265
Speed up         1         0.928         1.784         3.474

Table 6.2. The Overheads in Creation and Communication

MultT (2400)
Threads                  Multiple (4)  Multiple (4)  Multiple (4)
Num of Hosts             1             2             4
Creation Overhead        11.827        13.197        13.252
Communication Overhead   2.205         15.678        20.067

Curve (3200, 10^-6)
Threads                  Multiple (4)  Multiple (4)  Multiple (4)
Num of Hosts             1             2             4
Creation Overhead        10.619        11.702        11.969
Communication Overhead   0.366         11.347        25.874
REFERENCES
[1] Christiansen, B.O., et al., (1997), Concurrency: Practice and Experience, Vol. 9.
[2] Lewis, M., Grimshaw, A., (1996), 5th IEEE ISHPDC.
[3] Michael O. N., et al., (2000), Euro-Par 2000.
[4] Noam Nisan, (1998), IC. Distributed Computing Systems.
[5] Neary, M.O., et al., (2000), Concurrency: Practice and Experience, Vol. 12.
[6] Brecht, T., et al., (1996), 7th ACM SIGOPS.
[7] Shmulik L., (1998), MSc Thesis, Hebrew University of Jerusalem.
[8] Fahringer, T., (2000), IEEE IC Cluster Computing.
[9] Baratloo, P., et al., (1999), HCW'99, IPPS/SPDP 1999.
[10] Yingyi Chu, (2002), Master's Thesis, National University of Singapore.
[11] Darby, C., (1997), WEB Techniques, Vol. 2, Issue 9.
GRID BASED PROBLEM SOLVING ENVIRONMENT FOR SCIENTISTS

EMILDA SINDHU, UVARAJ PERIATHAMPY, MURALI KANTHARAJ
Institute of High Performance Computing, 1 Science Park Road, Singapore
E-mail: {emilda, uvaraj, murali}@ihpc.a-star.edu.sg
Building Grid-based Problem Solving Environments (PSEs) for scientists is a challenging task. Providing access to various Grid services or to scientific applications on the grid is not usually a simple matter for PSE developers. Using commodity grid toolkits (CoG Kits) and other open-source software can overcome part of the difficulty. A convenient interface to a PSE is a web portal designed for a scientific domain or for a particular scientific problem. In this paper we discuss the PSE we have developed for solving large-scale, computationally intensive problems in science and engineering, and the nature and implementation details of the services provided. The objective of the portal is to facilitate access to large-scale, multi-institutional, dynamic, distributed application environments for scientific research.
1 Introduction
High-performance distributed computing is rapidly becoming a reality. Nationwide high-speed networks are becoming widely available to interconnect high-speed computers at different sites. Projects such as Globus [5] are developing software infrastructure for computations that integrate distributed computational and informational resources. The development of next-generation problem solving environments (PSEs) [2] is influenced by the rapid advances in distributed computing and the emerging national-scale Computational Grid [1]. The explosive growth of the Internet and a broad spectrum of distributed computing technologies, like RMI, CORBA, Jini and DCOM, have led to significant technology improvements that are important for the development of PSEs accessing large-scale computational resources. Simultaneously, the high-performance computing community has taken big steps toward the creation of the Grid [1,4]. Computational Grids have become an important asset in large-scale computing and are emerging as a popular paradigm for solving large-scale compute- and data-intensive problems in science and engineering. The Globus toolkit, which has emerged as a de facto standard for Grid computing, is a community-based set of services and software libraries for security, information infrastructure, resource management, data management, communication, fault detection and portability. The Globus toolkit is now central to distributed computing. The task of building Grid applications remains extremely difficult because there are only a few tools available to support developers. The Commodity Grid (CoG) project is working to overcome this difficulty by creating what we call Commodity Grid Toolkits (CoG Kits) [3] that define mappings and interfaces between the Grid and particular commodity frameworks familiar to developers. In this paper we discuss the PSE we have developed for solving large-scale, computationally intensive problems in science and engineering, together with the nature and implementation details of the services provided.
2 Problem Solving Environment for Scientists
The Java-based Commodity Grid Toolkit (Java CoG Kit) defines and implements a set of general components mapping Grid functionality into a Java framework. This kit is very useful for developing PSEs, in particular for providing access to sophisticated remote compute services/servers using lightweight Web interfaces or portals. The definition of a PSE as given by [10] is: "A problem solving environment is a computational system that provides all tools required for solving problems from a specific domain, interact with and visualize and analyze results". In the PSE we have developed, the primary domain we concentrate on is problems requiring large-scale computation. Our primary goal is to build a PSE for scientists for solving large-scale, computationally intensive problems in science and engineering. For solving such problems, scientists may need to access remote resources using a secure connection. The process of solving the problem is steered by the scientist, and the progress may be monitored, analyzed and visualized through the portal. Computational portals are emerging as the interface for performing operations on the Grid. Computational Grids have emerged as a distributed computing infrastructure for providing pervasive, ubiquitous access to a diverse set of resources, ranging from high-performance computers (HPC) to tertiary storage systems to large-scale visualization systems. One of the primary motivations for building Grids is to enable large-scale scientific research projects to better utilize distributed, heterogeneous resources to solve a particular problem or set of problems. A Grid portal provides application scientists with a customized view of software and hardware resources specific to their problem domain and provides a single point of access to Grid resources. It is a web-based application server enhanced with the necessary software to communicate with Grid services and resources.

3 Implementation Details
The portal is developed using commodity off-the-shelf software and toolkits. The portal development leverages existing Globus/Grid middleware infrastructure as well as web technology, including Java Server Pages and servlets. The development of the portal is also based on the Grid Portal Development Kit (GPDK) [6] and MyProxy [7] toolkits. Based on the Model-View-Controller design paradigm, the GPDK has proven to be an extensible and flexible software package. The GPDK integrates nicely with other commodity technologies, including the open-source servlet container Tomcat and the Apache web server. Our portal development effort also makes use of the APIs of the Java CoG toolkit for its Java implementation of client-side Globus Grid services. The Grid information services are provided via the Lightweight Directory Access Protocol (LDAP), using an open-source LDAP server [8,9]. The basic architecture of a Grid Application Portal is illustrated in Figure 1. The user makes a secure connection from the web browser to the portal web server. The portal server then obtains a certificate from the MyProxy server (a proxy certificate server) and uses that to authenticate the user with the Grid. By taking advantage of the MyProxy package, users can use the portal to gain access to remote resources from anywhere, without requiring that their certificate and private key be located on the same machine/device running the web browser. The functionalities provided by the SER Grid portal include remote
program submission, file transfer, and querying of information services from a single, secure gateway. In addition, profiles are created and stored for portal users, allowing them to track and monitor submitted jobs and view results. A snapshot of the Science and Engineering Research (SER) Grid Portal at the Institute of High Performance Computing is shown in Figure 2. A sketch of an information-service query follows.
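As an illustration, a resource-information query against the portal's LDAP service can be issued with standard JNDI as below; the URL and attribute names (cpucount, cpuload5) are taken from the snapshot in Figure 2 and are assumptions rather than a documented schema:

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class ResourceInfoQuery {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL,
                "ldap://192.168.11.18:2135/dc=ihpc,dc=nus,dc=edu,dc=sg,o=Grid");
        DirContext ctx = new InitialDirContext(env);
        // read CPU count and 5-minute load average of the resource entry
        Attributes attrs = ctx.getAttributes("", new String[] { "cpucount", "cpuload5" });
        System.out.println(attrs);
        ctx.close();
    }
}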
Figure 1. Grid Portal Setup.
Figure 2. Science & Engineering Research Portal Interface - A snapshot.
4 Conclusion and Future Work
The goal of our PSE for scientists is to provide users in the area with a central place offering seamless and efficient access to a virtual workbench with the needed tools and infrastructure. The SER grid portal takes advantage of existing commodity software to provide a Web-based services portal for scientists to securely access a range of networked resources. Currently the SER grid portal provides remote job submission, file transfer, and querying of information services. In the future, the PSE will also provide facilities to visualize and analyze results; this task involves dynamic calculation, rendering and display. Another area of future development is to combine all the tasks of scientists in the problem domain under consideration (tasks related to the grid as well as tasks related to data staging and result visualization) and to provide a single point of access by creating a workflow system.

References
1. Foster, I. and Kesselman, C., editors, The Grid: Blueprint for a Future Computing Infrastructure, Morgan Kaufmann, 1999.
2. Rice, J.R. and Boisvert, R.F., "From scientific software libraries to problem-solving environments", IEEE Computational Science and Engineering, Fall: 44-53, 1996.
3. http://www.globus.org/cog
4. Fox, G.C., Gannon, D., "Computational Grids", IEEE Computer Science Eng., Vol. 3, No. 4, pp. 74-77, 2001.
5. Foster, I. and Kesselman, C., "Globus: A Metacomputing Infrastructure Toolkit", International Journal of Supercomputing Applications, 11(2): 115-128, 1997.
6. Novotny, J., "The Grid Portal Development Kit", Concurrency and Computation: Practice and Experience, 2000.
7. http://www.ncsa.uiuc.edu/Divisions/ACES/MyProxy/
8. http://www.openldap.org
9. Netscape Directory and LDAP Developer Central, http://developer.netscape.com/tech/directory/index.html
10. Gallopoulos, E., Houstis, E., and Rice, J.R., "Problem Solving Environments for Computational Science", IEEE Computational Science and Engineering, 1:11-23, 1994.
MANAGEMENT OF EJB APPLICATIONS USING JAVA MANAGEMENT EXTENSIONS JOONG-KI PARK, JOONG-BAE KIM, AND DUCK-JU SOHN 161 Gajeong-dong, Yuseong-gu, Daejeon, 305-350, KOREA E-mail: [email protected], [email protected], [email protected] Object middleware such as EJB is rapidly being accepted as a means for the cost-effective and rapid development of a wide range of distributed applications. The distributed applications built using these technologies include many objects and can become rather complex. Therefore, the development of such complex distributed applications requires a significant improvement of management methods and pertinent tools. In this paper, first, EJB and JMX, two extensions of Java technology in the context of Telecom Management Systems and Application Management, are analyzed. Second, an integrated distributed Telecom Network Management framework based on EJB middleware components is provided. The EJB-based framework is assumed to be deployed on J2EE servers, and some management issues that are not covered by J2EE servers are also investigated. Third, based on this framework, an architectural model for managing EJB components using JMX technology is proposed.
1
Introduction
The rapid growth of networking technologies in recent years has created much more complex and heterogeneous network environments. To manage network devices and services in such complex environments, various organizations have developed management platforms (e.g. the Internet management framework, the OSI management framework, TMN [3]) using different kinds of management protocols (e.g. SNMP [4], CMIP [5]). Since there are many different network management schemes, which cannot be easily integrated, there have been many research efforts attempting to harmonize them with new technologies. For example, Joint Inter Domain Management (JIDM) is one of the organizations doing such integration; the goal of JIDM is to integrate CMIP, SNMP and CORBA technologies [3]. These integrations are vendor-oriented, and currently there is no de facto standard for integrating different network management solutions. Sun Microsystems attempts to standardize such integration by proposing EJBs as a common framework for integrating different network management solutions. Sun Microsystems' EJB architecture seems to answer common challenges of distributed integrated network management, such as scalability, security, reliability and availability, by delegating these low-level tasks to the application server (i.e. a J2EE server) in which the EJB components are executed [1]. However, the spread of EJB component-based distributed
applications has raised the need to analyze which management techniques are suitable to control and monitor the resources involved in their operation [7][9]. These management techniques aim to improve the performance and reliability of EJB applications and to ease configuration and security tasks, which are key issues in maximizing the quality of service offered. When dealing with component-based server-side applications (i.e. EJB), several management issues should be addressed, such as management of the component-based platform, management of the application-independent functionalities, management of the application-dependent functionalities, and management of the underlying system and network resources [6]. Currently, the J2EE platform specification does not define how to solve the management problems mentioned above. As a result, each management development team has to develop its own approach to the management of EJB applications [2]. To this end, the adopted management architecture should be based on existing management standards.
2
Management of EJB-Based application using JMX
A Network Element management application only exposes management information of network elements; there is no application management capability defined in the EJB application platform. This section describes some of the main management-related aspects of an EJB-based application conceptualized within the scope of a Network Element Management system. The Network Element Management system is based on deploying EJB components to the J2EE platform provided by Sun Microsystems. As mentioned earlier in this paper, the J2EE platform does not provide all the management functionality required for the management of EJB components. For example, consider an EJB application implemented to collect alarm logs from network elements and store them in an alarm database. Indeed, network management developers do this implementation. Then assume that the IT department proposes a new requirement: an EJB application must collect summary information from the alarm database and send it to a predefined e-mail address once a day. Since the J2EE model does not define a timer, and does not allow components to manage system resources such as threads, a programmer must include this functionality in custom classes, which are attached to the J2EE server. This implies a need
to separate application management from network management. This separation lets network management developers focus on developing only network management solutions, and IT management developers focus on developing management systems and tools for managing the systems developed by the network management developers. The following sections describe how two aspects of the EJB management problem have been addressed in the management of the conceptualized Network Element Management application:
• The design of the management architecture for an EJB application.
• The design of the management instrumentation of service-oriented management aspects within the managed application.
2.1
Architecture of JMX Management to Instrument EJB-Based Application
JMX is chosen as the basis for the management architecture of the proposed Network Element Management application due to its straightforward integration with EJB [10]. Management information and operations of Java applications managed with JMX are made available through MBeans. According to the JMX specification, MBeans should be plugged into an MBean server that resides in the same Java Virtual Machine as the application. Nevertheless, this scenario covered by the JMX specification is not possible in most cases. In general, the J2EE server and the JMX MBean server have to run in different JVMs. Thus, management instrumentation of the EJBs of the Network Element Management application in the form of MBeans seems to be useless, as they cannot be connected to the corresponding MBean server. To overcome this limitation, a double workaround is adopted [11]. First, for obtaining management information from the EJBs or for invoking management operations on them, JMX Model MBeans are used. These Model MBeans are dynamic MBeans that can be used to instrument Java code at run-time. A Model MBean reads at run-time an XML description file that describes the management capabilities of a Java resource in terms of management attributes and operations. That Java resource must be registered within the Model MBean; therefore, when a Model MBean receives an invocation of a particular management operation from the MBean server, it redirects the invocation to the registered Java resource.
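A minimal, self-contained sketch of this Model MBean mechanism using the standard javax.management API is shown below. The AlarmCollector class is a hypothetical stand-in for the registered EJB remote reference, and for brevity the management metadata is built in code rather than read from an XML descriptor as the paper describes.

```java
// Sketch: instrumenting a managed resource through a JMX Model MBean.
import java.lang.reflect.Method;
import javax.management.*;
import javax.management.modelmbean.*;

public class ModelMBeanDemo {
    // Hypothetical stand-in for the remote EJB reference registered with the MBean.
    public static class AlarmCollector {
        public int getAlarmCount() { return 42; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = MBeanServerFactory.createMBeanServer();

        Method op = AlarmCollector.class.getMethod("getAlarmCount");
        ModelMBeanOperationInfo opInfo =
            new ModelMBeanOperationInfo("number of collected alarms", op);
        ModelMBeanInfo info = new ModelMBeanInfoSupport(
            AlarmCollector.class.getName(), "Alarm collector management",
            new ModelMBeanAttributeInfo[0], new ModelMBeanConstructorInfo[0],
            new ModelMBeanOperationInfo[] { opInfo },
            new ModelMBeanNotificationInfo[0]);

        // The Model MBean redirects invocations to the registered resource.
        RequiredModelMBean mbean = new RequiredModelMBean(info);
        mbean.setManagedResource(new AlarmCollector(), "ObjectReference");
        ObjectName name = new ObjectName("nem:type=AlarmCollector");
        server.registerMBean(mbean, name);

        // A management client invokes the operation through the MBean server.
        Object count = server.invoke(name, "getAlarmCount",
                                     new Object[0], new String[0]);
        System.out.println("alarms = " + count);
    }
}
```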
Figure 1. Architecture of JMX Management to instrument an EJB-based application. [Diagram: Model MBeans in the JMX agent connected to the managed EJBs, each deployed with an EJB Wrapper and a Mgmt Wrapper inside its EJB container.]
A Model MBean is instantiated within the JMX MBean Server for each EJB that has to be managed. For each Model MBean an XML management descriptor is created for describing management information of the remote EJB. As well, a local reference to the Remote Interface of the corresponding EJB is registered into the Model MBean. Thus, each time a management operation is invoked on the Model MBean, a remote management operation is transparently invoked on the remote EJB. Using this approach avoids developing customized MBeans for each EJB. The JMX-based agent only has to instantiate the appropriate number of Model MBeans, locate the remote EJBs, obtain the references to their remote interfaces and register them within the Model MBeans [11]. Second, for collecting notifications generated by the EJBs, JMX specifies a complete notification model that includes the definition of Notifications, Notifications Filters, Notifications Broadcasters and Notification Listeners. This notification model only covers the transmission of events between MBeans within the same JMX agent and does not cover the problem of collecting notifications submitted by remote EJBs.
In order to solve this limitation, the JMX-based agent for the Network Element Management application includes a "Notification Server" that receives JMX notifications remotely from the managed EJBs using RMI-IIOP distributed computing capabilities. This RMI-IIOP "Notification Server" implements the "Notification Broadcaster" JMX interface so that different "Notification Listeners" can be registered with the "Notification Server" to receive the management information they are interested in. The "Notification Server" registers itself at the JNDI server used by the different EJB containers under a predefined name that is known by all the EJBs. Figure 1 shows the architecture of the Network Element Management application with the extended application management capabilities applied to each EJB.
2.2
Management Instrumentation
There are two requirements that should be considered when deciding what management instrumentation scheme to apply to the EJBs of the Network Element Management application. First, the instrumentation approach adopted should be generic enough that it can be applied not only to the Network Element Management application but also to any other EJB-based application; this requirement implies that the instrumentation approach should be flexible enough to obtain management information of different types. Second, the instrumentation should be as transparent as possible to the developers of the EJBs [11]. EJB-based applications have characteristics that have to be taken into account when considering instrumentation possibilities. After EJB-based applications have been developed, they have to be deployed over the EJB container of a J2EE server [11]. This deployment step implies that the EJB container generates all the necessary stubs and skeletons for RMI-IIOP communications as well as a wrapper for each EJB (shown in Figure 1 as the EJB Wrapper) containing the support needed for transaction, security and persistence management related issues. This naturally suggests that a wrapping approach is most appropriate for the instrumentation of EJB components. In this case, the wrappers for each EJB have to be coded manually. Using Java class inheritance, the management wrappers (shown in Figure 1 as the Mgmt Wrapper) can be integrated with the functional code of the application, such as the EJB Alarm Management subsystem and the EJB Topology Management subsystem, without modifying the source code of these subsystems.
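As an illustration of this wrapping approach, the sketch below shows a management wrapper extending a functional class by inheritance and adding instrumentation without touching its source. The class and method names are hypothetical, not taken from the paper.

```java
// Functional code of the managed subsystem (hypothetical name).
class AlarmManagementBean {
    public void storeAlarm(String alarmLog) {
        // ... write the alarm log to the alarm database ...
    }
}

// Management wrapper ("Mgmt Wrapper" in Figure 1), integrated by inheritance.
public class ManagedAlarmBean extends AlarmManagementBean {
    private long invocations;   // management attributes collected here could
    private long totalMillis;   // later be exposed through a Model MBean

    @Override
    public void storeAlarm(String alarmLog) {
        long start = System.currentTimeMillis();
        super.storeAlarm(alarmLog);                 // delegate to functional code
        totalMillis += System.currentTimeMillis() - start;
        invocations++;
    }
}
```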
3
Conclusions and Future Related Work
In this paper, the conceptual architecture of a Network Element Management system based on the EJB architecture is analyzed. In the future, EJB-based management systems can be analyzed in the areas of wireless network management, alarm management, event correlation, policy administration and quality of service enforcement [8]. Furthermore, a conceptual architecture for a JMX-based management system for the management of EJB middleware components is proposed. A wrapping technique is used to provide application manageability of EJB components. This technique is introductory and is assumed to be performed manually by developers. In the future, the wrappers can be generated automatically by enhancing EJB deployment descriptors with information about the characteristics of the management wrappers and then enhancing J2EE servers to generate the management wrappers automatically. In the future, interoperability between JMX management solutions and existing management applications through other standard protocols such as SNMP, TMN, CORBA/IIOP, and CIM/WBEM can also be investigated. References
1. Gottfried Luderer, Hosson Ku, Baranitharan Subbiah and Anad Narayanan, "Network Management Agents Supported by a Java Environment," International Symposium on Integrated Network Management (ISINM '97).
2. Matjaz B. Juric, Ivan Rozman and Simon Nash, "Java 2 Distributed Object Middleware Performance Analysis and Optimization", ACM SIGPLAN Notices, August 2000, No. 8, pages 31-40.
3. Jae-Young Kim, Hong-Taek Ju, J. Won-Ki Hong, Seong-Beom Kim and Chan-Kyu Hwang, "Towards TMN-based Integrated Network Management Using CORBA and Java Technologies," Special Issue on New Paradigms in Network Management, IEICE Transactions on Communications, Vol. E82-B, No. 11, November 1999, pp. 1729-1741.
4. J. Case, M. Fedor, M. Schoffstall and C. Davin, "The Simple Network Management Protocol (SNMP)," RFC 1157, May 1990.
5. ITU-T Recommendation X.711, "Common Management Information Protocol (CMIP)," Specification, 1991.
6. Jorge E. Lopez de Vergara, Victor A. Villagra, Juan I. Asensio, Jose I. Moreno and Julio J. Berrocal, "Management of E-Commerce Brokerage Services," Department of Telematic Systems Engineering, Technical University of Madrid (DIT-UPM), Spain.
7. Heather Kreger, "Java Management Extensions for application management," IBM Systems Journal, Vol. 40, No. 1, 2001.
8. Sun Microsystems, Inc., "Telecom Network Management With Enterprise JavaBeans (EJB) Technology," Technical White Paper, Sun Microsystems, May 2001.
9. Sun Microsystems, Inc., "Dynamic Management for the Service Age," Java Management Extensions White Paper, Sun Microsystems, June 1999.
10. Sun Microsystems, Inc., "Java Management Extensions Instrumentation and Agent Specification," technical report, Sun Microsystems, July 2000.
11. Jorge E. Lopez de Vergara, Victor A. Villagra, Juan I. Asensio, Jose I. Moreno and Julio J. Berrocal, "Experience in the management of an EJB-based E-commerce application," Department of Telematic Systems Engineering, Technical University of Madrid (DIT-UPM), Spain.
ON APPLICATIONS OF DATA MINING TO HUMAN RESOURCES DATA V. KAMALESH AND V. KURALMANI Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Science Park II, Singapore 117528 Tel: 64191551, 64191442 Fax: 64191230 Email: {vkamal, manivk}@ihpc.a-star.edu.sg This paper gives a brief introduction to data mining and its applications to human resources data. Three well-known data mining techniques, namely predictive modeling, association rule mining and clustering, are suggested for extracting useful knowledge from the HR database. These techniques can be used to analyze attrition, performance, compensation, demographics, skill development, training, career management, retention and so on. Key Words: data mining, association rule mining, clustering, predictive modeling, knowledge discovery, human resources data.
1
Introduction
Data mining is an advanced method and process of exploring and extracting information from large databases to reveal hidden patterns, trends and correlations. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools. Data mining answers business questions that traditionally were too time-consuming to resolve. Data mining tools scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Such information enables organisations to solve business problems and make strategic decisions. Data mining is being applied to all sectors of industry, in particular banking, insurance, transportation, telecommunications, retail, and hospitality. It is also increasingly becoming an important tool in the fields of healthcare, genomics and drug discovery. One may refer to Adriaans (1996), Fayyad et al. (1996), Deslesie and Croes (2000) and Braha (2001) for specific uses of data mining. The application of data mining to human resources (HR) data has not received much attention in the past and is therefore the motivation for this paper. HR data in any business is an excellent source of information that can easily be tapped to profile employees and extract useful knowledge from masses of unleveraged data. In general, each employee has a complete human resources profile with many attributes. These attributes include all the standard human resources descriptions, including date of hire, job grade, salary, review dates, review outcomes, vacation entitlement, organization, education, address, insurance plan and many others. Analyzing these data requires an enormous statistical effort. As this may not be feasible in reality, this paper suggests certain data mining methodologies and procedures that can be used on HR data in any industry to identify the patterns and trends that exist in the database. For example, a traditional question is: "How many employees left in the last six months compared to the same period last year?", whereas a data mining question could be: "Is there any correlation between specific employees' competencies and project success measured in revenue terms?" or "Do teams with particular competency sets show greater ability to succeed than other teams?" Efficient data mining normally relies extensively on data preparation. Thus, one must have thorough data preparation, which has a few stages. Some of the main stages are: (i) data integration, where multiple, heterogeneous data sources may be integrated into one; (ii) data selection, where data relevant to the analysis task are retrieved from the database; (iii) data cleaning, which handles noisy, erroneous, missing, or irrelevant data; (iv) data transformation, where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.
2
Mining the HR Data
Generally, data mining tasks can be classified into two categories: descriptive data mining and predictive data mining. The former describes the data set in a concise and summary manner and presents interesting general properties of the data, whereas the latter constructs one or a set of models, performs inference on the available set of data, and attempts to predict the behavior of new data sets. Clustering can be considered descriptive data mining, whereas predictive modeling and association rule mining are considered predictive data mining. (i) Clustering: Clustering analysis is used to identify clusters embedded in the data, where a cluster is a collection of data objects that are similar to one another. Similarity can be expressed by distance functions, specified by users or experts. A good clustering method produces high quality clusters, ensuring that the inter-cluster similarity is low while the intra-cluster similarity is high. This means one may be able to cluster the employee database based on certain specified criteria, for instance: (a) to study the profiles of high performing and low performing employees, whose performance can then easily be analyzed further; (b) to identify which cluster makes the best employees under specific conditions, and to focus on a particular cluster in order to fully evaluate and understand it; (c) to discover which particular groups of employees, for example, are surprisingly effective. (ii) Prediction: Prediction in general refers to either classification or regression. Classification analyses a set of training data (i.e., a set of objects whose class label is known) and constructs a model for each class based on the features in the data. A decision tree or a set of classification rules is generated by such a classification process, which can be used for better understanding of each class in the database and for classification of future data. A regression function predicts the possible values of some missing data or the value distribution of certain attributes in a set of objects. It involves finding the set of attributes relevant to the attribute of interest (e.g., by some statistical analysis) and predicting the value distribution based on the set of data similar to the selected objects. Usually, regression analysis, generalised linear models, correlation analysis and decision trees are useful tools in quality prediction. Genetic algorithms and neural network models are also popular in prediction. In respect of HR data, both techniques can be used effectively, for instance: (a) an employee can be classified as a well performing or poorly performing employee (classification); (b) whether an employee will stay with the company for long or leave (classification); (c) how long an employee can possibly stay with the company (regression); (d) what the loss is because of his leaving (regression); (e) whether a particular staff member will be successful in a new project (classification); (f) staffing for a project or for training (classification). (iii) Association: One of the reasons behind maintaining any database is to enable the user to find interesting patterns and trends in the data. For example, in a supermarket, the user can figure out which items are being sold most frequently. But this is not the only type of trend that one can possibly think of. The goal of database mining is to automate this process of finding interesting patterns and trends in any environment.
Once this information is available, we can perhaps get rid of the original database. The output of the data mining process should be a summary of the database. This goal is difficult to achieve due to the vagueness associated with the term 'interesting'. The solution is to define various types of trends and to look for only those trends in the database. One such type constitutes the association rule. This analysis is quite useful in the HR environment to unearth the patterns that exist therein. For instance, it can indicate: (a) whether an employee can achieve a certain designation based on other merits; (b) the characteristics that are associated with achieving a particular grade of work; (c) whether an employee has any diseases related to his other medical conditions; (d) rules that imply certain association relationships among a set of attributes.
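A worked toy example of the support, confidence and lift arithmetic behind such association rules is sketched below; the employee attributes and counts are invented for illustration.

```java
// Toy computation of support, confidence and lift for a rule A => B
// on hypothetical boolean employee attributes.
public class RuleStats {
    public static void main(String[] args) {
        // Each row: {hasPostgraduateDegree, reachedSeniorGrade}
        boolean[][] staff = {
            {true, true}, {true, true}, {true, false},
            {false, false}, {false, true}, {false, false}
        };
        int n = staff.length, a = 0, b = 0, ab = 0;
        for (boolean[] s : staff) {
            if (s[0]) a++;
            if (s[1]) b++;
            if (s[0] && s[1]) ab++;
        }
        double support = (double) ab / n;              // P(A and B)
        double confidence = (double) ab / a;           // P(B | A)
        double lift = confidence / ((double) b / n);   // P(B|A) / P(B)
        System.out.printf("support=%.2f confidence=%.2f lift=%.2f%n",
                          support, confidence, lift);
    }
}
```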
3
Data and Transformation
Most of the HR data can be classified into four major categories: personal, medical, educational and service data. For example, (i) Personal Information: employee ID, name, date of birth, gender, race, religion, marital status, etc. (ii) Medical Information: physical characteristics, height, weight, BMI, hearing, eyesight, mental state, principal disease, additional diseases, etc. (iii) Educational Information: academic qualifications, titles of degrees, subject majors, year of completion, extra-curricular activities, etc. (iv) Service Information (of previous and current jobs): date of joining, date of confirmation, number of years of experience, number of previous appointments, division, designation, occupation, salary, etc. Transformations of the data are also very important for the data mining process. For example: (a) aggregating multilevel categorical variables into fewer categories based on similarity between categories [e.g. educational degrees, subjects, etc.]; (b) disaggregating multivalued categorical variables into a set of binary variables that represent the presence (or absence) of each of the possible values of the original variable [e.g. disease, extra-curricular activity]; (c) binning numerical data into categories (such as high, medium, and low), particularly if the data is 'soft' (not measured with much precision); binning can be a convenient way to handle differences in magnitude or skew in numerical data [e.g. salary, height, etc.]. Other than the above, certain new variables should be derived, if necessary, from the available attributes. Similarly, outliers should be taken care of: outliers or spurious values should be identified, or they may lead to bias in the estimation. In addition, in the case of association analysis, the most common obstacle to a good association analysis is the presence of low-support variables. There are two ways to deal with this problem. One way is to create a support threshold: any combination with support below a certain percentage is dropped from the analysis. Unfortunately, the support threshold method has the major disadvantage that it eliminates some potentially valuable data from consideration. This brings us to the better way to deal with low-support variables: the creation of a taxonomy. A taxonomy is an orderly hierarchy of variables and variable categories, dividing things such that each variable put into the association analysis occurs with a similar level of support. This is done by aggregating low-support variables into groups (for example, combining all religions that have low frequencies) and then analysing them as a group, while breaking down the high-support variables into smaller units (for example, race can be subdivided into smaller groups by linguistic group). This eliminates the uneven support in the analysis, thereby ensuring the production of association rules whose confidence and lift can be meaningfully used for comparison. Attention should also be paid when mining the medical data. Understanding medical data in the HR environment can be a particularly important aspect, especially where it pertains to the disposition and overall health of the employees. As a matter of fact, a high percentage of illnesses and incapacitation may result not from physical injuries but from infections and other diseases. Planning for the treatment of these conditions ahead of time can facilitate medical response and treatment during critical situations.
For example, we may discover a relatively high incidence of chicken pox among young employees between the ages of 17 and 19. As we may know, chicken pox in adults can be quite a serious health matter, and the identification of problematic subgroups within a population can facilitate the establishment of policies and procedures aimed at minimising this health threat.
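Returning to the transformations listed above, a minimal sketch of the binning step (c) follows; the salary cut-offs are illustrative assumptions, not values from the paper.

```java
// Sketch: binning a numeric salary into three categories (cut-offs assumed).
public class SalaryBinning {
    static String bin(double salary) {
        if (salary < 2500) return "low";
        if (salary < 6000) return "medium";
        return "high";
    }
    public static void main(String[] args) {
        double[] salaries = {1800, 3200, 9500};
        for (double s : salaries) {
            System.out.println(s + " -> " + bin(s));
        }
    }
}
```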
4
Limitations
Data mining depends to a large extent on the availability of the relevant attributes and on the accuracy of the data. It requires exponentially more computational effort as the problem size grows. Efficient analyses require familiarity with the domain knowledge of the attributes and their relationships. Sometimes the presented results may actually be due to the success of previous strategic decisions; therefore one should pay close attention when interpreting the results. 5
Summary
Data mining on HR data can easily be implemented in any large or medium scale organisation, such as banking, ports, airlines, the police force, civil defence, the military, the public service and so on. The usefulness of data mining mainly lies in the areas of recruiting and retention, staffing, training, employee evaluation, payment, compensation, time and attendance, safety, fraud, claims and so on. The ultimate aim of this application is to save time and cost, and also to improve organisational effectiveness. As mentioned in Stewart et al. (2000), 'knowledge is a corporate asset that shall be managed'. Does data mining do this job more effectively? Of course, yes. References
Adriaans, P. and D. Zantinge, Data Mining, Harlow, Addison-Wesley, 1996.
Braha, D., Data Mining for Design and Manufacturing: Methods and Applications, Kluwer Academic Publishers, 2001.
Deslesie, L. and C. Croes, "Operations research and knowledge discovery: a data mining method applied to health care management", International Transactions in Operational Research, 7, 159-170, 2000.
Fayyad, U., D. Madigan, G. Piatetsky-Shapiro and P. Smyth, "From data mining to knowledge discovery in databases", AI Magazine, 17, 37-54, 1996.
Stewart, K., R. Baskerville, V. C. Storey, J. A. Senn, A. Raven and C. Long, "Confronting the assumptions underlying the management: An agenda for understanding and investigating knowledge management", The Data Base for Advances in Information Systems, 31, 41-50, 2000.
Yiming, M., B. Liu, C. K. Wong, P. S. Yu, and S. M. Lee, "Targeting the right students using data mining", ACM 2000, 457-464.
LINGUISTIC RULE EXTRACTION BY GA COMBINING DDR AND RBF NEURAL NETWORKS
XIUJU FU
Institute of High Performance Computing, Science Park II, Singapore 117528
E-mail: [email protected]
LIPO WANG
School of EEE, Nanyang Technological University, Singapore 639798
E-mail: [email protected]
We propose a novel method to extract rules by using a data dimensionality reduction (DDR) technique combined with genetic algorithms (GA), based on radial basis function (RBF) neural networks. Firstly, the data are preprocessed by removing irrelevant or redundant attributes based on attribute ranking results according to the separability-correlation measure (SCM). Secondly, the preprocessed data are classified by an RBF classifier. Initial conditions for the premises of rules are obtained from the trained RBF classifiers. The interval for each attribute in the condition part of each rule is tuned by the GA. The fitness of a chromosome is determined by the accuracy of the extracted rules. Our method leads to rules with hyper-rectangular decision boundaries directly, without the need for an intermediate step to transform continuous attributes into discrete ones, unlike some existing methods based on the multilayer perceptron (MLP). Simulations demonstrate that our approach results in more accurate and concise rules compared to some other related methods.
1
Introduction
As an important tool of data mining, neural networks are widely used in tasks such as classification, prediction and estimation. However, the black-box nature of neural networks impedes data miners' understanding of data concepts. Extracting rules from data sets has attracted more attention in recent years because it helps people break the black-box curse of neural networks and reveal data concepts. In a rule extraction task, compact rules with high accuracy are desirable. In order to extract compact rules, we can reduce the number of inputs or simplify the architecture of the neural network. Data dimensionality reduction (DDR) algorithms have been widely explored as data preprocessing in data mining tasks for removing irrelevant or redundant attributes. In this paper, we use a novel separability-correlation measure (SCM) [4] for determining the importance of the original attributes. Then different attribute subsets, chosen according to the attribute ranking queue, are input to RBF neural networks to select the best feature subset, i.e., the one that leads to the lowest classification error rate with the smallest subset size. Rules are then extracted based on the selected feature subsets. Due to their explicit form and perceptibility, hyper-rectangular decision boundaries are often employed in rule extraction, such as rules extracted from MLPs [1,6,7] and from RBF neural networks [8-10]. In order to obtain symbolic rules with hyper-rectangular decision boundaries, a special interpretable MLP (IMLP) was constructed in [1]. In an IMLP network, each hidden neuron receives a connection from only one input unit, and the activation function used for the first hidden layer
neurons is the threshold function. In [7], the range of each input attribute was divided into intervals. The attribute was then encoded as a binary string accordingly. Rules with hyper-rectangular decision boundaries were thus obtained. In [3], we proposed to extract rules using GA from RBF neural networks. However, irrelevant or redundant attributes may be included in the rule sets. The paper is organized as follows. In Section 2, the SCM measure is briefly reviewed first; feature subsets are then selected based on the SCM measure by removing irrelevant or redundant attributes. We introduce our rule extraction method in Section 3. The interval of each attribute in the premise part of each rule is encoded into a GA chromosome. Experimental results are shown in Section 4. Finally, Section 5 presents the conclusions of this paper. 2
Separability-Correlation Measure
Class separability and the correlation between attributes and class labels are used to measure the importance of each attribute. The probability of correct classification is large when the distances between different classes are large. Therefore, the subset of features which maximizes the separability between classes is a desirable objective of feature selection. Class separability may be measured by the intraclass distance S_w and the interclass distance S_b [2]. The greater S_b is and the smaller S_w is, the better the separability of the data set. The ratio of S_w and S_b is calculated and used to measure the separability of the classes: the smaller the ratio, the better the separability. If omitting attribute k1 from the data set leads to less class separability, i.e., a greater S_w/S_b, compared to the case where attribute k2 is removed, then attribute k1 is more important for classification of the data set than attribute k2, and vice versa. Hence the importance of the attributes can be ranked by computing the intraclass-to-interclass distance ratio with each attribute omitted in turn. In addition, we propose to use the correlation C_k between the changes in attributes and the corresponding changes in class labels as another indication of the importance of attribute k in classifying the patterns. Hence we propose the separability-correlation measure (SCM) [4] R_k as the sum of the class separability measure S_wk/S_bk and the correlation measure C_k (k refers to the k-th attribute), where S_wk and S_bk are the intraclass and interclass distances calculated with the k-th attribute omitted from each pattern, respectively. The importance of the attributes is ranked through the value of R_k: the greater the magnitude of R_k, the more important the k-th attribute. Based on the attribute ranking results, n subsets of attributes (n is the total number of attributes) are input to RBF classifiers. As the number of attributes used increases, the validation error first decreases, reaches a minimum when a certain attribute subset is used, and then increases; that attribute subset is selected.
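A minimal sketch of this SCM-style ranking loop is given below, under simplifying assumptions: Euclidean scatter around class means for S_w and S_b (the exact distance definitions of the paper's reference [2] may differ in detail) and a precomputed correlation term C_k supplied by the caller.

```java
// Sketch: rank attributes by R_k = S_wk/S_bk + C_k, with attribute k omitted.
public class ScmRanking {

    // Intraclass scatter: mean squared distance of samples to their class mean.
    static double intraclass(double[][] x, int[] y, int nClass) {
        double[][] mean = classMeans(x, y, nClass);
        double s = 0;
        for (int i = 0; i < x.length; i++) s += sqDist(x[i], mean[y[i]]);
        return s / x.length;
    }

    // Interclass scatter: mean squared distance between class means.
    static double interclass(double[][] x, int[] y, int nClass) {
        double[][] mean = classMeans(x, y, nClass);
        double s = 0; int pairs = 0;
        for (int a = 0; a < nClass; a++)
            for (int b = a + 1; b < nClass; b++) { s += sqDist(mean[a], mean[b]); pairs++; }
        return s / pairs;
    }

    static double[][] classMeans(double[][] x, int[] y, int nClass) {
        int d = x[0].length;
        double[][] m = new double[nClass][d];
        int[] cnt = new int[nClass];
        for (int i = 0; i < x.length; i++) {
            cnt[y[i]]++;
            for (int j = 0; j < d; j++) m[y[i]][j] += x[i][j];
        }
        for (int c = 0; c < nClass; c++)
            for (int j = 0; j < d; j++) m[c][j] /= cnt[c];
        return m;
    }

    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
        return s;
    }

    static double[][] dropColumn(double[][] x, int k) {
        double[][] r = new double[x.length][x[0].length - 1];
        for (int i = 0; i < x.length; i++)
            for (int j = 0, t = 0; j < x[0].length; j++)
                if (j != k) r[i][t++] = x[i][j];
        return r;
    }

    /** Larger R_k means attribute k matters more for separability. */
    public static double[] scm(double[][] x, int[] y, int nClass, double[] corr) {
        double[] rk = new double[x[0].length];
        for (int k = 0; k < rk.length; k++) {
            double[][] red = dropColumn(x, k);
            rk[k] = intraclass(red, y, nClass) / interclass(red, y, nClass) + corr[k];
        }
        return rk;
    }
}
```
3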
Rule Extraction
After RBF neural networks are trained based on selected feature subsets, rules are extracted by using GA.
3.1
Encoding Rule Premises Using GA
$U_{ji}$ and $L_{ji}$ are the upper limit and the lower limit of interval $j$ in rule $i$, respectively. $U_{ji}$ and $L_{ji}$ are set according to the trained RBF classifier. Initially, $U_{ji}$ is randomly generated according to the uniform distribution within the range $[\mu_{ji}, 1]$, and $L_{ji}$ is randomly generated according to the uniform distribution within the range $[0, \mu_{ji}]$. We encode the real value $p_{ji}$ ($p = U, L$) using $k$ binary bits:

$$G^{(p)}_{ji} = \{g_k, g_{k-1}, \ldots, g_i, \ldots, g_2, g_1\}, \quad g_i = 0, 1, \quad i = 1, 2, \ldots, k \qquad (1)$$

The relationship between $p_{ji}$ and $G^{(p)}_{ji}$ is as follows:

$$p_{ji} = B^{(p)}_{ji} / (2^k - 1) \qquad (2)$$

where $B^{(p)}_{ji}$ is the decimal value corresponding to $G^{(p)}_{ji}$:

$$B^{(p)}_{ji} = g_k \cdot 2^{k-1} + g_{k-1} \cdot 2^{k-2} + \cdots + g_2 \cdot 2^1 + g_1 \cdot 2^0 \qquad (3)$$

A chromosome in the population pool can be represented as a one-dimensional binary string:

$$(G^{(U)}_{11}, G^{(L)}_{11}, \ldots, G^{(U)}_{n1}, G^{(L)}_{n1}, \ldots, G^{(U)}_{1m}, G^{(L)}_{1m}, \ldots, G^{(U)}_{nm}, G^{(L)}_{nm}) \qquad (4)$$

3.2
Crossover and Mutation
The roulette wheel selection is used to select chromosomes in each generation. Each chromosome in the population pool corresponds to a rule set. The accuracy of the rule set is calculated to evaluate the fitness level of each chromosome. Two-point crossover is used in our algorithm. The probability of crossover is usually around 80%. Mutation can prevent fixation at particular loci. The mutation rate is 1/1000. "Elitism" is used to retain the best members in the population pool.
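A minimal sketch of the decoding in Eqs. (1)-(3), mapping a k-bit gene to a real value in [0, 1] (the subsequent scaling of U_ji and L_ji into [mu_ji, 1] and [0, mu_ji] is left to the caller):

```java
// Sketch: decode a k-bit gene via p = B / (2^k - 1), per Eqs. (2) and (3).
public class GeneDecoder {
    /** bits[0] is g_k (most significant), bits[k-1] is g_1. */
    static double decode(int[] bits) {
        long b = 0;
        for (int bit : bits) b = (b << 1) | bit;        // Eq. (3): binary to decimal
        return (double) b / ((1L << bits.length) - 1);  // Eq. (2)
    }
    public static void main(String[] args) {
        System.out.println(decode(new int[]{1, 0, 1})); // 5/7, roughly 0.714
    }
}
```
4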
Experimental Results
Iris and Thyroid data sets are used for testing our algorithm. There are 4 attributes in the Iris data set, and 5 attributes in the Thyroid data set. For the Iris and Thyroid data sets, the attribute ranking queues obtained according to the SCM are {4,3,1,2} and {2,3,5,4,1}, respectively. For the Thyroid data set, the classification error rates of RBF classifiers with different subsets of attributes, input in the order of importance, are calculated. As the number of attributes used increases, the validation error first decreases, reaches a minimum when attributes 2, 3 and 5 are used, and then increases. Hence all the other attributes are considered unimportant for the data concept and are removed, which decreases the classification error rate from 0.0465 to 0.0233 and the number of inputs from 5 to 3. For the Iris data set, attributes 1 and 2 are removed from the data set, which decreases the classification error rate from 0.0467 to 0.0333 and the number of inputs from 4 to 2. Based on the rule-extraction algorithm described in Section 3, 3 rules (each with only 2 premises) are extracted for the Iris data set. The accuracy of
the obtained rules is 97.33%, which is the same as the result in [3]. However, the rule set in this paper is more compact since there are only 2 premises in each rule. There are 5 rules with accuracy 88% in the rule set for the Thyroid data set; this rule accuracy is higher than the result (85%) in [3], with a smaller number of premises per rule. 5
Conclusion
In this paper, we propose to extract rules from RBF neural networks using GA based on the SCM method. Irrelevant attributes are deleted from the original attribute set according to the ranking results from the SCM. A genetic algorithm is used for tuning the premises of the rules. Experimental results show that the proposed method is effective in reducing the size of data sets, reducing the number of rules, and improving the accuracy of rules. References
1. G. Bologna and C. Pellegrini, "Constraining the MLP power of expression to facilitate symbolic rule extraction", Proc. IEEE World Congress on Computational Intelligence, Vol. 1, pp. 146-151, 1998.
2. P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall International, London, 1982.
3. X. J. Fu and L. P. Wang, "Rule extraction by genetic algorithms based on a simplified RBF neural network", Proceedings of the 2001 Congress on Evolutionary Computation, Vol. 2, pp. 753-758, 2001.
4. X. J. Fu and L. P. Wang, "Rule extraction using a novel gradient-based method and data dimensionality reduction", Proceedings of the 2002 International Joint Conference on Neural Networks, Vol. 2, pp. 1275-1280.
5. E. R. Hruschka and N. F. F. Ebecken, "Rule extraction from neural networks: modified RX algorithm", Proc. International Joint Conference on Neural Networks, Vol. 4, pp. 2504-2508, 1999.
6. H. Ishibuchi and M. Nii, "Generating fuzzy if-then rules from trained neural networks: linguistic analysis of neural networks", IEEE International Conference on Neural Networks, Vol. 2, pp. 1133-1138, 1996.
7. H. J. Lu, R. Setiono and H. Liu, "Effective data mining using neural networks", IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, Dec. 1996, pp. 957-961.
8. K. J. McGarry, S. Wermter and J. MacIntyre, "Knowledge extraction from radial basis function networks and multilayer perceptrons", Proc. International Joint Conference on Neural Networks, Vol. 4, pp. 2494-2497, 1999.
9. K. J. McGarry, J. Tait, S. Wermter and J. MacIntyre, "Rule-extraction from radial basis function networks", Proc. Ninth International Conference on Artificial Neural Networks, Vol. 2, pp. 613-618, 1999.
10. K. J. McGarry and J. MacIntyre, "Knowledge extraction and insertion from radial basis function networks", IEE Colloquium on Applied Statistical Pattern Recognition (Ref. No. 1999/063), pp. 15/1-15/6, 1999.
WEB-BASED CONFIGURATION AND CONTROL OF HLA-BASED DISTRIBUTED SIMULATIONS
NIRUPAM JULKA, DAN CHEN, BOON PING GAN
Production and Logistics Planning Group, Singapore Institute of Manufacturing Technology, 71 Nanyang Drive, SINGAPORE 638075
STEPHEN JOHN TURNER, WENTONG CAI
School of Computer Engineering, Nanyang Technological University, SINGAPORE 639798
The need for better understanding, control and optimization of supply chains is being recognized more than ever in the new economy. Simulation of a supply chain holds a great potential in providing the much needed visibility and control of the supply chain operations. The High Level Architecture (HLA), an IEEE standard for interoperable simulations, enables distributed simulations of this type. This paper discusses a framework to enable configuration and control of HLA-based distributed simulations through a web interface. The implementation of a prototype based on the framework is also presented.
1
Introduction
Simulation has become one of the standard technologies used for operational optimization by organizations. In the last decade enterprises have strived to unlock benefits from their supply chains using various cutting-edge simulation technologies. The lack of visibility into the execution details of their upstream and downstream partners has meant that the studies performed for operational improvement by these organizations have not given accurate results. Important reasons for this include the secrecy maintained regarding execution rules in a company, the ever-changing business environment and correspondingly changing operational conditions, and the inherent geographical distribution of the various nodes. The need for secure supply chain simulation in a distributed environment is felt more than ever. Web-based modeling and simulation obviously provide a solution to the above problem. But a close examination of the earlier work on the topic reveals that its objectives differ from those at hand and hence its solutions are not directly applicable. The solutions provided include exposing the model building and execution facility through a web interface [7] and providing simulation services to customers through an Application Service Provider (ASP) model [5,8]. Kuljis and Paul [4] provide a critical analysis of work in web-based simulation modeling and execution. The High Level Architecture (HLA) is an IEEE standard (1516-2000) for interoperability of distributed simulations. This holds great potential for supply chain study and optimization through simulation. A group of companies (supply chain partners) can build HLA-compliant simulation models (federates) and can jointly perform supply chain experiments through distributed simulations communicating through the web. 2
Motivation
In our earlier work we proposed alternative structures for HLA-based federation communities [1,2]. These structures include Hierarchical HLA (HHLA) and Set-based HLA (SHLA). The motivation for these alternative topologies is to enable selective
information shielding between different entities in the distributed simulation. The user thus has more flexibility to run configured simulations in a distributed environment with greater control over information sharing with other partners. The facility to configure and control simulations of various scenarios using the above-mentioned topologies, as well as a flat HLA, is presented in this paper. The framework allows users to register, authenticate, configure, modify, invoke and terminate supply chain simulations through a web browser. The choice of Java as the platform for the implementation of this framework allows portability, complements the interoperability promised by HLA, and provides a facility to critically compare supply chain scenarios by configuring, executing and analyzing the corresponding distributed simulations through a web interface. The overall motivation is to enable companies to provide the use of simulation models of their enterprises to their partners in a manner similar to web services. Each of the partners can then run customized supply chain scenarios in a grid (distributed computing) fashion. The framework is not specific to supply chains and can be ported to other domains with minimal effort. This provision of secure simulation over the internet bridges the gap between in-house simulations and collaborative simulations between trading partners based on legacy networks.
3
Overall Framework
The framework for web-based control of simulations is illustrated in Figure 1. There are two components in the software, the authentication server component and the company server component. The authentication server component (AS) has five modules handling various different services provided by the server and the company server component (CS) has two modules handling the services provided by the company. The prototype developed based on this framework uses the Model-View-Controller (MVC) architecture with beans providing the Model, Java Server Pages (JSPs) providing the View and servlets providing the Control. All information pertaining to the various entities in the simulation are stored in a central database on the AS. The specific modules, services offered by them and implementation details are as follows.
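As context for the module descriptions that follow, a minimal sketch of this Model-View-Controller arrangement is shown below; the servlet, bean and JSP names are illustrative, not taken from the prototype.

```java
// Sketch: a controller servlet updates a bean (Model) and forwards to a JSP (View).
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ConfigureScenarioServlet extends HttpServlet {
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Model: a bean holding the scenario configuration.
        ScenarioBean scenario = new ScenarioBean();
        scenario.setFederateName(req.getParameter("federate"));
        req.setAttribute("scenario", scenario);
        // View: forward to a JSP that renders the configuration page.
        req.getRequestDispatcher("/scenario.jsp").forward(req, resp);
    }
}

// Simple JavaBean playing the Model role.
class ScenarioBean {
    private String federateName;
    public String getFederateName() { return federateName; }
    public void setFederateName(String n) { federateName = n; }
}
```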
3.1
Federate Information and Management (FIM)
The Federate Information and Management (FIM) module provides the user with the facility to modify information regarding the various federates hosted at CS in the database on the AS. The right to modify each piece of information is linked to the role of the user (AS Administrator, CS Administrator, User, Guest). Signatures of the executable federate programs which are hosted on the various CS are also stored in the same database. These signatures are specific to the executable code of a federate. Upon modification of any federate code, the owner of the federate needs to log the changes made to the federate code and apply for a new signature through FIM. The information regarding the configurable parameters of the federates is also managed through the FIM. These parameters are used to configure different scenarios in a supply chain simulation. FIM is also used for the account management of the various users.
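As an aside, the sketch below shows one way such an executable signature could be computed; the paper does not specify the signature scheme, so the use of a SHA-1 digest over the executable bytes is an assumption.

```java
// Sketch: compute a code-specific "signature" for a federate executable.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FederateSignature {
    public static String sign(String executablePath)
            throws IOException, NoSuchAlgorithmException {
        byte[] code = Files.readAllBytes(Paths.get(executablePath));
        byte[] digest = MessageDigest.getInstance("SHA-1").digest(code);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString(); // stored on the AS, compared by AM at invocation
    }
}
```
3.2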
Authentication Module (AM)
There are three types of authentication in the framework: user authentication, company authentication and federate authentication. The need for federate authentication stems from the fact that all changes made by an organization to their federates need to be transparent and clear. Also, unauthorized or rogue federates should not be permitted to run in the federation community. This module is available on the AS (AM-S) and the CS (AM-C). Upon invocation of a simulation, the AM-S queries the AM-C to send the signature of the federate. This signature is compared with the one in the database and the federate is authenticated. 3.3
Simulation Configuration Module (SCM)
The user is able to configure a supply chain simulation through a series of JSPs displaying the available federates and the configurable parameters. The user may choose to configure a fresh scenario, browse through stored scenarios to select an appropriate scenario or modify an earlier configured scenario. The federation community structure is generated by SCM based on the information sharing requirements. Details of this will be made available in subsequent publications. 3.4
Invocation and Termination Module (IM)
This module enables invocation of the federates and gateways based on the user defined configuration, and the termination of the entire simulation. The federates provided by a particular company and the supporting federation execution run at CS. The gateways running at the AS link the user federations together to create a federation community. Once the user completes the configuration of the simulation, the IM on the AS (IM-S) sends invocation signals which include all configuration information to the respective IM modules on the CS (IM-C). The invocation signals are sent through the AM-S and AM-C. The initiation of the termination process is done by a 'Terminator' federate. The user invokes this federate when he wishes to terminate a particular simulation. The federate then joins all the higher level federations and sends out a termination message (modeled as an interaction) to all the gateways. Any federate that receives the interaction forwards the interaction to the lower level federations, resigns from all federations it is part of and then attempts to destroy them. The execution of the federation community is thus terminated by all federates resigning and all federations destroyed.
3.5
Simulation Information Module (SIM)
This module provides users and guests with the information on simulations of the various scenarios running in the distributed environment including their configurations and results. It also allows the user to view previously run scenarios and information pertaining to it. This module can also be connected to a visualization tool to observe an ongoing simulation [3]. 4
Conclusions and Future Work
A solution for the management and configuration of distributed simulations has been presented in this paper. With more and more adoption of HLA in the civil domain [6], the need for such tools will surely increase. Future work in the field includes the development of appropriate tools for the configuration of various input files for remote models, the gathering of data from the different hierarchical levels, and the analysis of simulations involving hierarchical federation communities. The adoption of industrial standards for data exchange like XML and information exchange like RosettaNet into simulation systems involving federation communities is also under investigation. Future work is also planned on the development of strategies and policies to enable such collaboration between supply chain partners. This is necessary to smoothen the transition of this novel methodology of decision-making, along with the enabling technology, into the industrial domain of supply chain management. REFERENCES
1. Cai, W., S. J. Turner, and B. P. Gan. 2001. Hierarchical Federations: An Architecture for Information Hiding. In Proceedings of the 15th International Workshop on Parallel and Distributed Simulation, pp. 67-74.
2. Gan, B. P., D. Chen, N. Julka, S. J. Turner, W. Cai and G. Li. Benchmarking Alternative Topologies for Multi-Level Federations. Submitted to Simulation Interoperability Workshop (SIW) 2003.
3. Julka, N., B. P. Gan, D. Chen, P. Lendermann, L. F. McGinnis and J. P. McGinnis. Framework for Distributed Supply Chain Simulation: Application as a Decision-Making Tool for the Semiconductors Industry. In Proceedings of the International Conference on Modeling and Analysis of Semiconductor Manufacturing (MASM) 2002, pp. 376-381.
4. Kuljis, J. and R. J. Paul. A Review of Web Based Simulation: Whither We Wander? In Proceedings of the 2000 Winter Simulation Conference, pp. 1872-1881.
5. Marr, C., C. Storey, W. E. Biles and J. P. C. Kleijnen. A Java-based Simulation Manager for Web-based Simulation. In Proceedings of the 2000 Winter Simulation Conference, pp. 1815-1822.
6. Straßburger, S. Distributed Simulation Based on the High Level Architecture in Civilian Application Domains. Doctoral Dissertation 2001. Otto-von-Guericke University, Magdeburg, Germany.
7. Wiedemann, T. VisualSLX - An Open User Shell for High-Performance Modeling and Simulation. In Proceedings of the 2000 Winter Simulation Conference, pp. 1865-1871.
8. Wiedemann, T. Simulation Application Service Providing (SIM-ASP). In Proceedings of the 2001 Winter Simulation Conference, pp. 623-628.
COMPETING RISKS WITH CENSORED DATA: A SIMULATION STUDY
IING LUKMAN 1, NOOR AKMA IBRAHIM 3, FAUZIAH MAAROF 2, ISA BIN DAUD 2, MOHD NASIR HASSAN 1
1 Dept. of Environmental Science, 2 Dept. of Mathematics, Faculty of Science and Environmental Studies
3 Institute for Mathematical Research
Universiti Putra Malaysia, Serdang, Selangor D.E.
E-mail: [email protected]
1 Introduction Competing risks is a survival model dealing with more than one possible causes of death or failure to a subject in the population [3]. Cox regression is a useful method for survival or failure data analysis [8], Lunn and McNeil [6] analyzed the competing risks in the survival model using Cox proportional hazards model with censored data as an effort to overcome the complexity in the comparison of parameter estimates corresponding to different failure types. Kalbfleisch and Prentice [4] method for competing risks, involved fitting models separately for each type of failure, treating other failure types as censored. We will propose that the modified LunnMcNeil based on duplicated data technique of Lunn and McNeil [6] can work properly. 2 Modification of Lunn-McNeil Method Data Handling. The assumption and data entries are the same as Lunn and McNeil [6]. Suppose that types I and II are given by 8 and 1-8 where 8= 0 or 1. If subject i fails at time t, and the first failure type Si (or 1- 5j) then the second failure type is 1- 8i (or 8;). By providing a column for the second failure type, two entries are made as follows
826
827
Table 1. Covariates
Type Subj Failure time Status 1st Fail
2nd Fail
i t ;
1-5;
i(rep)
ti
1
5i
0
l-8j
8;
1st Fail
2nd Fail
Xi,8iXi
x; , ( 1 - 8 ;
)XJ
Xi,(l-8i)xi xj,
8j Xj
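A minimal sketch of this duplication step in plain Java (the array layout is an illustrative assumption; in practice the augmented rows would be written to the data set passed to the Cox fit):

```java
// Sketch: Lunn-McNeil data duplication following Table 1. Each subject i
// contributes two rows; the covariates carry the type interaction delta*x.
public class LunnMcNeilAugment {

    /** One augmented row: failure time, status, first-fail type, x, delta*x. */
    static double[] row(double t, int status, int delta, double x) {
        return new double[] { t, status, delta, x, delta * x };
    }

    /** Duplicate subject i with failure time t, observed type delta, covariate x. */
    static double[][] duplicate(double t, int delta, double x) {
        return new double[][] {
            row(t, 1, delta, x),      // entry i: observed failure type
            row(t, 0, 1 - delta, x)   // entry i(rep): the other type, censored
        };
    }
}
```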
The augmentation of the second failure type in Table 1 is useful for seeking the joint estimation of the parameters.
Modification of Method. Within the competing risks framework, Kay [5] notes that for a patient with covariate values $x_1, x_2, \ldots, x_p$ the estimated cause-specific hazard for failure type $j$ is

$$\lambda_j(t; x) = \lambda_{0j}(t)\,\exp(\beta_{j1} x_1)\,\exp(\beta_{j2} x_2 + \cdots + \beta_{jp} x_p), \quad j = 1, 2, \ldots, m \qquad (1)$$

where $x_1$ is the binary treatment indicator and $x_2, \ldots, x_p$ are the background covariates. The parameters $\beta$ are estimated separately for each failure type $j$ by considering all failures of types other than $j$ as censored. The regression model introduced by Cox [1] specifies the hazard rate $\lambda_j(t; x)$ for every individual in terms of a vector of covariates $x_i$ and a vector of regression parameters $\beta$; then (1) can be written as

$$\lambda_j(t; x_i) = \lambda_j^*(t)\,\exp\Big(\sum_{r=1}^{p} \beta_r x_{ir}\Big) \qquad (2)$$

The contribution of (2) to the partial likelihood corresponding to the $i$-th risk set is

$$\exp\Big(\sum_{r=1}^{p} \beta_r x_{ir}\Big) \Big/ \sum_{j \in R_i} \exp\Big(\sum_{r=1}^{p} \beta_r x_{jr}\Big) \qquad (3)$$

By introducing the censoring indicator $\Omega_i$ [7], where $\Omega_i = 1$ if $t_i$ is a failure time and $\Omega_i = 0$ if $t_i$ is censored, (3) can be written as

$$\exp\Big(\Omega_i \sum_{r=1}^{p} \beta_r x_{ir}\Big) \Big/ \Big[\sum_{j \in R_i} \exp\Big(\sum_{r=1}^{p} \beta_r x_{jr}\Big)\Big]^{\Omega_i} \qquad (4)$$

Hence the required conditional log likelihood becomes

$$L(\beta) = \sum_{i=1}^{n} \Omega_i \Big\{ \sum_{r=1}^{p} \beta_r x_{ir} - \ln\Big[\sum_{j \in R_i} \exp\Big(\sum_{r=1}^{p} \beta_r x_{jr}\Big)\Big] \Big\} \qquad (5)$$
which is similar to the ordinary Cox regression.
3 Simulation Study
The study compares the two methods, namely the cause-specific hazards ordinary Cox model and the cause-specific hazards model based on the modified Lunn-McNeil method. The simulated data are based on [2], the Stanford Heart Transplant Data, with five covariates: type of failure, age, mismatch score, age by failure type, and mismatch score by failure type. The exponential distribution is used for the covariates age and mismatch score, while the binomial distribution is used for the indicator variable related to type of failure. The data for the interaction between type of failure (δ_i) and a covariate (x_i), such as age by failure type or mismatch score by failure type, are simply the product of δ_i and x_i, as indicated by Lunn and McNeil [6]. The censoring indicator variable is fixed to comply with the percentage of censoring that we imposed on the data set. In generating the failure times we imposed two values of λ: λ = 1.25 for the first failure type and λ = .75 for the second, in order to obtain proportionality. The generated data are simulated 1000 times for every sample size and designated percentage of censoring. The true (initial) parameters of the covariates are β1 = .99, β2 = .06605, β3 = .200, β4 = .0654 and β5 = .198.
4 Results
Table 2. Maximum likelihood estimates of the cause-specific hazards: Mean, Bias and RMSE of the estimates β̂1-β̂5 under each method.

Sample size 15, censoring 25% (Modified Lunn-McNeil method / Ordinary Cox method); Mean, Bias and RMSE values as printed:
-.008 -.038 .003 .019 -.286 1.405 .134 37.398 .318 .801 .601 36.408 -.252 -.351 1.207 -.856 -.074 -.238 -.062 -.179 .117 612.95 15.49 19.46 32.75 40.31 1.97 .104 .333 .375

Sample size 15, censoring 50% (Ordinary Cox method / Modified Lunn-McNeil method); Mean, Bias and RMSE values as printed:
15.586 -.831 .124 .622 -.058 -.161 .000 .049 .036 14.596 -.149 .059 -.065 -.149 -.368 -.124 -.361 -.102 222.48 3.034 10.16 .078 3.077 1.627 106.3 13.13 .859 -.195 -.393 11.27

Sample size 45, censoring 25% (Modified Lunn-McNeil method / Ordinary Cox method); Mean, Bias and RMSE values as printed:
0 2.942 .063 -2.94 -.005 .001 -.002 -.004 23.252 .005 -.198 -.070 2.744 -.927 -.068 -.204 -.065 22.262 -.061 -3.14 .22 .213 .072 47.23 1.105 .071 47.32 6.71 201.45 6.71

Sample size 45, censoring 50%:
Par                        Mean     Bias     RMSE
Ordinary Cox β̂1           19.58    18.59    138.06
Ordinary Cox β̂2           -.075    -.141    5.163
Ordinary Cox β̂3           -1.883   -2.083   38.207
Ordinary Cox β̂4           .073     .008     5.16
Ordinary Cox β̂5           1.881    1.683    38.19
Modified Lunn-McNeil β̂1   .087     -.903    1.197
Modified Lunn-McNeil β̂2   -.004    -.070    .071
Modified Lunn-McNeil β̂3   -.013    -.213    .23
Modified Lunn-McNeil β̂4   0        -.065    .078
Modified Lunn-McNeil β̂5   .004     -.194    .231
5 Conclusion

In the cause-specific proportional hazards model, the modified Lunn-McNeil method is better (its root mean square error is smaller) than ordinary Cox regression across the sample sizes and censoring percentages considered.

References
1. Cox, D. R. (1972). "Regression Models and Life Tables (with discussion)". J. R. Statist. Soc. B, 34, pp. 187-220.
2. Crowley, J. and Hu, M. (1977). "Covariance Analysis of Heart Transplant Survival Data". J. Amer. Statist. Assoc. 72, pp. 27-36.
3. David, H. A. and Moeschberger, M. L. (1978). The Theory of Competing Risks. London: Griffin.
4. Kalbfleisch, J. and Prentice, R. (1980). The Statistical Analysis of Failure Time Data. New York: Wiley.
5. Kay, R. (1986). "Treatment Effects in Competing Risks Analysis of Prostate Cancer Data". Biometrics 42, pp. 203-211.
6. Lunn, M. and McNeil, D. (1995). "Applying Cox Regression to Competing Risks". Biometrics 51, pp. 524-532.
7. Noor Akma Ibrahim and Isa Daud. (1995). "Estimating Parameters of Proportional Hazards Model with Censored Data Using SAS". Proceedings of the Annual SAS User's Group Malaysia, pp. 19-20.
8. Pettitt, A. N. and Bin Daud, I. (1990). "Investigating Time Dependence in Cox's Proportional Hazards Model". Applied Statistics 39, pp. 313-329.
COMBINING SUPPORT VECTOR REGRESSION (SVR) WITH GENETIC ALGORITHM (GA) TO OPTIMIZE THE INITIAL POSITIONS OF AGENTS IN THE LAND COMBAT SIMULATION

L.J. CAO, K.S. CHUA, W.K. CHONG, H.P. LEE AND L. QIAN
Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, 117528 Singapore. Email: [email protected]
This paper proposes a methodology combining support vector regression (SVR) with a genetic algorithm (GA) to optimize the initial positions of agents in a multi-agent based land combat simulation, a game in which two teams of agents compete to capture their opponents' flag. Specifically, given one team of agents fixed in terms of their initial positions, the proposed methodology is used to optimize the initial positions of the other team so that the optimized agents capture their opponents' flag as fast as possible. Firstly, a large data set is collected by running the land combat model to record the initial positions of the team of agents to be optimized and the corresponding time difference between the two teams of agents in capturing their opponents' flag. Then, SVR is used to estimate the relationship between the initial positions and the time difference. Finally, GA is used to search for the optimal initial positions, where the fitness function is evaluated using the developed SVR. The simulation shows that the team of agents using the optimal initial positions captures the opponents' flag much faster than the other team of agents without optimization. The result also demonstrates that the initial positions of agents play an important role in the land combat simulation.
1 Introduction
Recently, multi-agent based simulation (MAS), which uses a bottom-up approach to model complex systems, has been receiving increasing interest. The reason is that MAS can provide a more accurate model than traditional techniques through the use of a larger number of spatial degrees of freedom. MAS also emphasizes the adaptability of the elements in the model, so that human decision capability can be incorporated to make the model more realistic, unlike traditional methods, which treat the modeling as a deterministic process. There are also fewer assumptions in MAS compared to the traditional techniques. One of the successful applications of MAS is in the area of military affairs. Based on MAS, the land combat model called irreducible semi-autonomous adaptive combat (ISAAC) was developed by Ilachinski in 1997 [1]. As illustrated in Figure 1, in ISAAC there are two teams of agents of the same size, represented by red and blue respectively, along with a red flag and a blue flag belonging to the red team and the blue team. The two teams of agents, together with their flags, are initially located at diagonally opposite corners of the two-dimensional battlefield. The agents then compete to capture the opponents' flag, and the team whose agents first reach the opposing flag wins the combat. In the framework of MAS, learning and adaptation is one of the most important capabilities for agents because of the complexity and dynamics of the environments in the system. Artificial intelligence techniques appear to be the most attractive approaches. Recently, in the enhanced version of ISAAC named EINSTein [2], GA was used to evolve the personalities of agents. That is, given one team of agents fixed in terms of their personalities, GA is used to optimize the personalities of the other team so that it is "best able" to capture the opponents' flag, where "best" is measured in terms of either the time to capture the target flag or the casualties incurred in performing the task.
Motivated by Ilachinski's work, this paper proposes a methodology combining SVR with GA to optimize the initial positions of agents. In all the previous land combat models, the initial positions of both teams of agents were randomly generated. In this paper, given one team of agents fixed in terms of their initial positions, the proposed methodology is used to optimize the initial positions of the other team so that the optimized agents capture their opponents' flag as quickly as possible. Our simulation shows that the team of agents using the optimal initial positions captures the opponents' flag much faster than the other team without optimization. The result also demonstrates that the initial positions of agents play an important role in the land combat simulation.
Figure 1. A pictorial illustration of the land combat simulation (red agents with the red flag; blue agents with the blue flag).
2 The Proposed Methodology

Figure 2. A flowchart of the proposed methodology: data collection, support vector regression (SVR), genetic algorithm (GA) optimizer, then running the simulation using the optimal initial positions.
As illustrated in Figure 2, the proposed methodology consists of three major components: a data collection component, an SVR component, and a GA optimizer component.
1. Data collection component: by repeatedly running the land combat model, the data collection component records the initial positions of the team of agents to be optimized and the corresponding time difference between the team of agents without optimization and the team of agents to be optimized in capturing their opponents' flag; these records are later analyzed by the SVR component. For recording the initial positions, all possible initial positions that could be occupied by agents form one input vector, with a value of 1 denoting the presence of an agent and a value of 0 denoting its absence.

2. Support vector regression (SVR) component: based on the collected data set, SVR is used to estimate the relationship between the initial positions and the time difference between the two teams in capturing their opponents' flag. Compared to other neural network regressors, SVMs have three distinct characteristics when used to estimate the regression function. Firstly, SVMs estimate the regression by a set of linear functions defined in a high-dimensional space. Secondly, SVMs define the regression estimation as a problem of risk minimization, where the risk is measured using Vapnik's ε-insensitive loss function. Thirdly, SVMs use a risk function consisting of the empirical error and a regularization term, which is derived from the Structural Risk Minimization principle [3].

3. GA optimizer component: the basic idea of GA is to search for the optimal solution in a manner inspired by Darwin's theory of evolution, where potential solutions to a problem compete and mate with each other to produce increasingly stronger solutions [4]. The potential solutions are called chromosomes in GA. The solutions in one generation constitute a population, and the initial population is generated randomly. A fitness function is required to evaluate the quality of solutions, thus determining the probability of a solution surviving into the successive generations; usually the solutions with larger fitness values have a higher probability of surviving than those with smaller fitness values. The three basic operators are then repeated to update the old population until a predetermined number of generations is reached or a satisfactory solution is found: selection, which copies chromosomes from the current population for the next generation; crossover, where parts of paired solutions are combined to create more adaptive solutions; and mutation, where one or more components of a solution are randomly changed. The procedure of GA is outlined below:

t = 0
Initialize P(t)
Evaluate P(t)
While (t < MAXGEN)
    t = t + 1
    P(t) := Select P(t-1)
    P(t) := Crossover P(t)
    P(t) := Mutate P(t)
    Evaluate P(t)
End
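A self-contained Python sketch of this loop is given below (our own illustration; the population size, generation count and the stand-in fitness function are assumptions, and in the real system the fitness call would be replaced by the trained SVR's prediction):

import random

random.seed(0)
N_BITS, POP_SIZE, MAXGEN = 49, 20, 50   # 7x7 position grid; sizes assumed
P_CROSS, P_MUT = 0.1, 0.01              # probabilities reported in Section 3

def fitness(chrom):
    # Stand-in scoring function: the real system would return the
    # SVR-predicted time difference for this 49-bit position vector.
    return sum(b * (i % 7 + i // 7) for i, b in enumerate(chrom))

def select(pop):
    # Fitness-proportionate (roulette-wheel) selection; +1 avoids zero weights.
    weights = [fitness(c) + 1 for c in pop]
    return [random.choices(pop, weights=weights, k=1)[0][:] for _ in pop]

def crossover(pop):
    # One-point crossover on adjacent pairs with probability P_CROSS.
    for i in range(0, len(pop) - 1, 2):
        if random.random() < P_CROSS:
            cut = random.randrange(1, N_BITS)
            pop[i][cut:], pop[i + 1][cut:] = pop[i + 1][cut:], pop[i][cut:]
    return pop

def mutate(pop):
    # Flip each position bit independently with probability P_MUT.
    for chrom in pop:
        for j in range(N_BITS):
            if random.random() < P_MUT:
                chrom[j] ^= 1
    return pop

pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]
for t in range(MAXGEN):
    pop = mutate(crossover(select(pop)))
best = max(pop, key=fitness)
print("best fitness:", fitness(best))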
3 Experiment
The JACOB program, developed based on the RELATE architecture [5], is used in the data collection component. Each team is composed of 30 agents. Each agent in a team occupies a different position in its team's initial position field, a 7-by-7 square. The initial position fields for the red team and the blue team are located in the lower-left and upper-right regions, respectively, of the 80-by-80 battlefield. In the collection of data, the initial positions of the red agents are randomly generated in the first run and then kept fixed in the later runs.
The initial positions of the blue agents are randomly generated for each run. The JACOB program is run on the high-end computing resources of the Institute of High Performance Computing [6]. A total of 3291 data patterns are collected in the data collection component. Each data pattern consists of 50 values, with the first 49 bits corresponding to the initial positions of the blue agents and the last value corresponding to the time difference, in seconds, between the red agents and the blue agents in capturing their opponents' flag. The whole data set is randomly partitioned into two smaller sets: a training set used for training the SVR, and a testing set used for selecting the optimal parameters of the SVR. There are 2800 data patterns in the training set and 491 in the testing set. The Gaussian function is used as the kernel function of the SVR. The values of δ, C, and ε are chosen based on the testing set and are determined as 100, 1.0, and 0.5 respectively. Finally, GA is used to search for the optimal initial positions of the blue agents, where the fitness function is evaluated based on the SVR. The initial probability of 0.5, the crossover probability of 0.1, and the mutation probability of 0.01 are chosen because these values produce the largest fitness value in the converged chromosome. Finally, the JACOB program is run again, with the converged chromosome with the largest fitness value used as the initial positions of the blue agents and the red agents' initial positions kept at the same values as in the previous experiment. The time difference between the red agents and the blue agents using the optimal initial positions in capturing their opponents' flag is 6.53.
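For illustration, the SVR component could be set up as in the following sketch (our assumption, using scikit-learn's SVR with synthetic stand-in data; mapping the kernel width δ = 100 to scikit-learn's gamma as 1/(2δ²) is one common convention, not stated by the paper):

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Synthetic stand-in for the collected data: 49-bit initial-position vectors
# and a time difference (the real set has 3291 patterns: 2800 train, 491 test).
X = rng.integers(0, 2, size=(300, 49)).astype(float)
y = rng.normal(size=300)

# RBF ("Gaussian") kernel with C = 1.0 and epsilon = 0.5 as selected on the
# testing set; the gamma value encodes the assumed delta-to-gamma mapping.
model = SVR(kernel="rbf", gamma=1.0 / (2 * 100.0**2), C=1.0, epsilon=0.5)
model.fit(X[:250], y[:250])
print("held-out R^2:", model.score(X[250:], y[250:]))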
4 Conclusion
This paper proposes a methodology combining SVR with GA to optimize the initial positions of agents in the MAS-based land combat simulation. The simulation shows that the team of agents using the optimal initial positions captures the opponents' flag much faster than the other team without optimization. The result also demonstrates the effectiveness of the proposed methodology for optimizing the initial positions of agents.

References
1. Ilachinski A. Enhanced ISAAC neural simulation toolkit (EINSTein): an artificial-life laboratory for exploring self-organized emergence in land combat (U), Center for Naval Analyses, 1999.
2. Ilachinski A. Genetic algorithm evolutions of ISAACA personalities. In Irreducible Semi-Autonomous Adaptive Combat (ISAAC): An Artificial-Life Approach to Land Combat. Center for Naval Analyses Research Memorandum CRM 97-61.10, 1997.
3. Vapnik V. N. The Nature of Statistical Learning Theory, New York, Springer-Verlag, 1995.
4. Goldberg D. E. Genetic Algorithms in Search, Optimization, and Machine Learning, Reading, MA: Addison-Wesley, 1989.
5. Roddy, K. A. and Dickson, M. R., Modeling human and organizational behavior using a relation-centric multi-agent system design paradigm, Ph.D. thesis, Naval Postgraduate School, Monterey, California, 2000.
6. www.ihpc.a-star.edu.sg
ACQUISITION OF BACKGROUND COEFFICIENT

X. Y. QI AND C. LU
Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528
E-mail: qixy,luchun@ ihpc.nus.edu.sg

Z. G. LIU
Motorola Co Ltd, 3231 North Wilke Road, Arlington Heights, IL 60004, United States
E-mail: [email protected]
An important issue in the data mining research area is forecasting association rules: deriving a set of association rules corresponding to a future circumstance from the current dataset. For a new situation, the correct set of rules can be quite different from those derived from the dataset corresponding to the current situation. To derive the set of rules for a new situation, the existing technique of Combination Dataset and Foundation Groups can be used. In this paper, research is focused on the core of this technique: the acquisition of Background Coefficients by the information gain measure.
1 Introduction
Data mining techniques are well accepted for discovering previously unknown, potentially useful and interesting knowledge from past datasets. Association rule mining [1] is one of the most important data mining techniques. Much research has concentrated on improving the performance of rule generation [2]; however, little attention has been paid to predicting association rules for a future circumstance from the current dataset. For a future circumstance, because the data source is unavailable, the only rules available for decision making are those mined from an earlier dataset. For example, when a supermarket manager plans to set up a new store in a new location, decisions have to be made on, among other things, what kinds of goods are likely to be in greater demand and therefore need to be stocked beforehand. Unfortunately, the only information available to the manager is the past dataset from the existing, operational store that he manages. Is there any method to obtain a set of association rules for the new store before it starts to run?

1.1 The Concepts of Background Coefficient, Foundation Group and Construction Dataset

Consider the above supermarket example. When setting up a new store, because the new data is not yet obtainable, the manager has no choice but to use the association rules discovered from the past dataset to make decisions. But if the customer profiles of the two stores are quite different, many of the rules are likely to be inapplicable to the new store. Figure 1 shows the association rules discovered from two stores under one manager with the same supermarket environment. The rule soap => electric shaver is valid for the first store but not for the second, while soap => lipstick is valid only for the second store. The supermarket environment, such as all kinds of resources and managerial methodology, is the same for stores 1 and 2.
Further study shows that 85% of the customers of store 1 are men while 75% of the customers of store 2 are women. Generally speaking, men are more likely to be customers for electric shavers, and women for lipsticks. Hence, the above rules become understandable.
Confidence threshold = 20%.
STORE 1. Identified rule: soap => electric shaver (confidence = 21%).
STORE 2. Identified rule: soap => lipstick (confidence = 23%).
However, the rule soap => electric shaver is not true for store 2: 100 transactions contain soap, but only 14 of them contain an electric shaver, so the confidence is below the threshold.
Figure 1: An example showing the variation of rules in two stores
From the above example, it is seen that the gender of the customer plays an important role in mining the two rules, even though it does not directly appear in the rules as an item. Such a background attribute (one that does not appear in the rules) is called a Background Coefficient: it can influence the generation of the rules that relate the foreground attributes (the items that do appear in the rules). A set of Background Coefficients with associated values or value ranges identifies a Foundation Group, which has two properties regardless of the circumstance in which it resides. 1) The members of a Foundation Group share the same characteristics, which result in the same behavior. In the above example, for instance, the Background Coefficient "gender" with the associated value "male" identifies a male group whose members are more likely to be customers for electric shavers whichever store they go to.
2) Every Foundation Group corresponds to a set of association rules. Still in the above example, the rule that corresponds to the male group is: its members, in general, will buy the electric shaver when they buy soap. With N Background Coefficients, an N-dimensional space can be obtained for specifying the Foundation Groups. For example, if gender and degree are two Background Coefficients, a two-dimensional space is obtained. The Foundation Groups derive from the combinations of the values of these N Background Coefficients. If the values of the attribute "gender" are {male, female} and of the attribute "age" are {<25, 25-45, >45}, then six Foundation Groups need to be formed to address all of the possibilities: {1: Male <25} {2: Male 25-45} {3: Male >45} {4: Female <25} {5: Female 25-45} {6: Female >45}. Generally, if there are N Background Coefficients for a circumstance and there are n_i possible values for the i-th Background Coefficient, for i = 1, ..., N, then to cover all of the possibilities for a circumstance the total number of Foundation Groups is

N_c = n_1 × n_2 × ... × n_N
These N_c Foundation Groups constitute a set called the Complete Group Set. Each circumstance consists of the same Complete Group Set, but every individual Foundation Group in the Complete Group Set appears in a different proportion. In the above example, both the future store's customer group and the current store's customer group consist of those six types of customers. If the current store has more women aged less than 25, the proportion of Foundation Group 4 is larger. If, for the future circumstance, there are no men aged greater than 45, the proportion of Foundation Group 3 is simply zero. Thus, the dataset for each circumstance is the sum of all of the Foundation Groups in the same Complete Group Set, but with a different proportion for each individual Foundation Group. Let the number of Foundation Groups in the Complete Group Set for a circumstance be N, the number of tuples of Foundation Group i be F_i and the proportion of Foundation Group i be P_i, for i = 1, ..., N. The number of tuples of the dataset for this circumstance is

Σ_{i=1}^{N} F_i P_i   (1)

Since the dataset for each circumstance is the sum of all of the Foundation Groups in the same Complete Group Set, with a different proportion for each individual Foundation Group, this also holds for the future circumstance. Thus, if all of the Foundation Groups and their corresponding proportions (that is, the values of the variables on the right-hand side of formula (1)) can be obtained, the dataset for the future circumstance, called the Combination Dataset, can be constructed and the association rules for that circumstance can be mined.
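As an illustration of formula (1), the following sketch (our own, with made-up Foundation Groups and proportions) assembles a Combination Dataset by drawing from each Foundation Group in its assumed future proportion:

import random

random.seed(1)

def combination_dataset(groups, proportions, total):
    # groups: {name: list of tuples}, the Foundation Groups;
    # proportions: {name: P_i for the future circumstance}, summing to 1;
    # total: desired number of tuples, in the spirit of formula (1).
    dataset = []
    for name, tuples in groups.items():
        k = round(proportions[name] * total)  # this group's share of the tuples
        if tuples and k > 0:
            dataset += random.choices(tuples, k=k)
    return dataset

groups = {"male<25": [("soap", "shaver")], "female<25": [("soap", "lipstick")]}
future = combination_dataset(groups, {"male<25": 0.2, "female<25": 0.8}, total=10)
print(future)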
2 Acquisition of Background Coefficients
This section addresses how to find the Background Coefficients, which is our contribution. As defined in Section 1, a Background Coefficient is a background attribute that can influence the generation of the rules relating the foreground attributes (the items that appear in the rules); such a background attribute is therefore highly relevant to the foreground attributes that constitute the association rules. Intuitively, an attribute is highly relevant to a given class if the values of the attribute can be used to distinguish the class from others. For example, it is unlikely that the color of a computer can be used to distinguish expensive from cheap computers, but the brand, the speed of the hard disk, and the memory are likely to be more relevant attributes. Thus, the first step in finding the Background Coefficients is to construct the classes (to be distinguished) from the foreground attributes appearing in the association rules. To make the classes complete, they are all of the combinations of the items in the association rules. The attributes that are relevant enough to distinguish these classes are then the Background Coefficients. Take the rule for store 1 from Figure 1 as an example: from the rule soap => electric shaver, four classes can be obtained: (1) buy soap, buy electric shaver; (2) not buy soap, buy electric shaver; (3) buy soap, not buy electric shaver; (4) not buy soap, not buy electric shaver. After the classes are constructed, attribute relevance analysis can be carried out. The general idea is to compute the information gain, which quantifies the relevance of an attribute to a given class.
Let S be a dataset of s tuples. If there are m classes, S contains s_i tuples of class C_i, for i = 1, ..., m. An arbitrary tuple belongs to class C_i with probability s_i/s. The expected information needed to classify a given tuple is

I(s_1, s_2, ..., s_m) = − Σ_{i=1}^{m} (s_i/s) log_2(s_i/s)   (2)
An attribute A with values {a_1, ..., a_v} can be used to partition S into the subsets {S_1, S_2, ..., S_v}, where S_j contains those tuples that have value a_j of A. Let S_j contain s_{ij} tuples of class C_i. The expected information based on this partitioning by A is known as the entropy of A:

E(A) = Σ_{j=1}^{v} [ (s_{1j} + ... + s_{mj}) / s ] · I(s_{1j}, ..., s_{mj})   (3)
The information gain obtained by this partitioning on A is defined by

Gain(A) = I(s_1, s_2, ..., s_m) − E(A)   (4)

To use this approach, m classes are first obtained from the combinations of items in the association rules; then every attribute except the items in the association rules is tried as attribute A and its information gain is computed. By doing so, a ranking of attributes is obtained, and all of the attributes above the relevance threshold are taken as Background Coefficients.
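A compact sketch of this relevance ranking, implementing equations (2)-(4) on toy data (our own illustration):

import math
from collections import Counter

def info(labels):
    # I(s1, ..., sm) = -sum (s_i/s) log2(s_i/s), equation (2).
    s = len(labels)
    return -sum((c / s) * math.log2(c / s) for c in Counter(labels).values())

def gain(rows, labels, attr):
    # Gain(A) = I(...) - E(A), equations (3) and (4); attr indexes attribute A.
    s = len(rows)
    e_a = 0.0
    for value in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        e_a += (len(subset) / s) * info(subset)
    return info(labels) - e_a

# Toy transactions: attribute 0 = gender; the class is the item combination
# (buys soap?, buys electric shaver?).
rows = [("M",), ("M",), ("F",), ("F",)]
labels = [(1, 1), (1, 1), (1, 0), (0, 0)]
print("Gain(gender) =", gain(rows, labels, 0))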
Conclusions

It can be concluded that the approach introduced in this paper provides a simple but powerful means for acquiring Background Coefficients.

References
1. R. Agrawal and R. Srikant, "Fast algorithms for mining association rules," in Proceedings of the 20th International Conference on Very Large Databases, September 1994, Santiago, Chile, pp. 487-499.
2. R. Agrawal and J. Shafer, "Parallel mining of association rules: Design, implementation and experience," in IEEE Transactions on Knowledge and Data Engineering, 1996.
A FRAMEWORK FOR A REAL-TIME DISTRIBUTED RENDERING ENVIRONMENT

Zhu Huabing, Chan Kai Yun, Tony
Centre for Advanced Media Technology, SCE, Nanyang Technological University, Singapore 639798
E-mail: [email protected], [email protected]

This paper proposes a framework for a Distributed Rendering Environment (DRE). Our research focuses on the parallel rendering architecture and the scheme for transmitting images from a server to its clients. Compared with the object space sorting scheme, the image space sorting scheme requires lower communication bandwidth but is prone to load imbalance. To overcome this disadvantage, we propose a Dynamic Mesh-based Screen Partitioning (DMSP) algorithm that partitions a large task into several smaller, similar-sized tasks and assigns these tasks evenly across the rendering nodes. Another challenge of DREs is the transmission of image data between clients and the rendering engine. We propose a novel image transmission scheme using image-based rendering techniques to solve this problem. The goal of these efforts is to minimize the communication cost without compromising image quality or load balance.
1 Introduction
Computer Aided Design (CAD), scientific visualization and 3D games often need user-steered interactive displays of very complex environments which usually contain massive datasets. Ideally, interactive visualization needs to maintain a frame rate of at least 20 frames per second to avoid jerkiness [1]. However, in most cases, such massive datasets are too large to be rendered in real time even by high-end graphics systems such as the SGI Infinite Reality Engine. Most large-scale parallel computers do not have directly attached, high-resolution graphical displays with interactive input devices. It is more usual to access the parallel computers from a workstation connected via a LAN, such as Ethernet or Fiber Distributed Data Interface (FDDI), or even a WAN. This type of access supports shared use of an expensive central resource by multiple users. The situation whereby the user (and the user interface) is decoupled from the image synthesis engine is called the Distributed Rendering Environment (DRE). Recent research work in DRE has focused on real-time systems based on clusters of low-cost computers. An effective parallel rendering system should achieve the following important goals:
— It should balance the workload among the rendering nodes.
— It should minimize the overheads due to pixel and primitive redistribution.
— It should implement an effective control component to avoid limiting the rendering pipeline stage.
— It should minimize the communication between the rendering part and the GUI part of the system.

In order to meet these requirements, our approach focuses on the partition algorithm and the communication.
2 System Overview
Our system is divided into four subsystems: the Client Subsystem, the Data Management Subsystem, the Parallel Rendering Subsystem and the Image Composition Subsystem. The Client Subsystem is composed of a GUI and an image assembling component. The GUI provides the interaction with the user and displays the animation from the image assembling component, which composes the final image from the raw image generated by image-based rendering and the image difference received from the server side. The Data Management Subsystem partitions the geometry data into viewpoint cells, each associated with a cull box. According to the viewpoint, it updates the primary and secondary storage memory, and it manages the data structure of the scene data to optimize performance. There are two parts in the Parallel Rendering Subsystem: the controller and the rendering nodes. The controller partitions the screen space and assigns the tasks evenly across the rendering nodes; it also monitors the workload of every rendering node to avoid load imbalance. Each rendering node renders a region of the screen space and sends it to the Image Composition Subsystem. The Image Composition Subsystem receives the data from the rendering nodes and generates an exact image. It also generates a raw image by image-based rendering from the old image and the predictive image. Comparing the exact image with the raw image, the comparison manager obtains the difference and sends it to the client. The system works in the following steps. Each new viewpoint sent into the pipeline is passed to the Data Management Subsystem and the Parallel Rendering Subsystem. The viewpoint determines what geometry is retrieved from the database and loaded into primary and secondary storage memory. After cell-based culling, the rendering system runs view-frustum culling and screen space partitioning, and the tasks are assigned evenly to the rendering nodes. The Image Composition Subsystem receives the image fragments from the rendering nodes and generates an exact image. Comparing the exact image with a raw image generated by image-based rendering techniques, the system sends the resulting difference to the client. On the client, the system assembles an exact image with the help of the image difference from the server side and a raw image generated by the same method as in the Image Composition Subsystem.
Task Partition scheme
The taxonomy of parallel rendering was developed around 1994 by UNC (University of North Carolina) graphics researchers according to where the redistribution occurs in the rendering pipeline. It includes the classes "sort-first", "sort-middle" and "sort-last" 8 . It has been shown that communication overheads can be reduced by partitioning the workload in screen space so that pixels rendered by a rendering node can be sent to the display directly with little or no depth compositing and oversampling. That means sort-first scheme is a good choice for the real-time parallel-rendering system with PC clusters. However, sort-first is prone to load-imbalance because of the random distribution of the primitives in the screen space. The choice of strategy
Figure 1: Procedure of DMSP
for mapping screen regions to processors has a critical impact on the performance of a sort-first system. One of the main advantages of sort-first is its ability to take advantage of the coherence of on-screen primitive movement. In a real-time interactive system, the viewpoint usually changes very little from frame to frame, and thus the on-screen distribution of primitives does not change appreciably either. Using retained mode with sort-first means that the database distribution resulting from one frame can form the initial distribution for the next frame. Thus the primitives migrate between rendering nodes, and they only need to be communicated as they cross the boundaries between different rendering nodes' screen regions. Load balance, another important aspect of a parallel rendering system, centers around efficiency. The key idea of our approach is to cluster primitives into groups for rendering by each server dynamically, based on the overlaps of their projected bounding volumes in screen space: this is the Dynamic Mesh-based Screen Partition (DMSP). Each single partition step of DMSP proceeds as follows. 3D object bounding boxes are used instead of the objects in the scene. According to the number of polygons in an object, a weight value is set on its bounding box; e.g. an object with 4000 polygons has a bounding box weight of 4000. Then we split the current region along the longest axis; suppose we sweep vertical lines. We begin with a vertical line on the left, which moves to the right; the objects passed as the line moves belong to the left group. The line is moved until the total bounding box weight in the left group is almost half of the total weight in the screen space, and the other objects are assigned to the right group. If the partition line crosses a bounding box, that object is assigned to both groups. Then the right and left groups are partitioned following the same steps as above. This process recurses until exactly N tiles and groups are formed, one assigned to each of the N rendering nodes. Figure 1 shows the partition procedure. Because of frame-to-frame coherence, DMSP does not need to run every frame; it runs when load imbalance rises. In order to utilize frame-to-frame coherence, the last partition result can be the initial status of the next partition operation. At that time, the system calculates the number of primitives in each partition region. Following the last partition sequence, the system adjusts the partition lines according to the number of primitives in each partition region. Figure 2 shows the steps.
Figure 2: DMSP—Utilization of Frame-to-frame Coherence
When the workloads among the partition regions are distributed unevenly, the vertical partition line moves right or left from its initial position until the numbers of primitives on both sides are equal. Then the horizontal partition lines move up or down from their initial positions until the primitives are distributed evenly on both sides. This procedure repeats recursively until the workload is distributed evenly among the partition regions. In this way, the partition cost is reduced considerably because of the temporal and spatial coherence. Therefore, the average partition cost per frame will not be much higher in comparison with a static partition method.
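The single weighted sweep described above can be sketched as follows (our simplified one-axis Python illustration; the real algorithm alternates axes and recurses until N tiles are formed):

def split_by_weight(boxes):
    # One DMSP split along x. boxes: list of (x_min, x_max, weight), where
    # weight is the polygon count of the object. Sweep a vertical line
    # rightwards until roughly half the total weight lies to its left;
    # boxes straddling the line are assigned to both groups.
    total = sum(w for _, _, w in boxes)
    acc, line = 0.0, min(x0 for x0, _, _ in boxes)
    for x0, x1, w in sorted(boxes, key=lambda b: b[1]):
        if acc >= total / 2:
            break
        acc += w
        line = x1
    left = [b for b in boxes if b[0] < line]   # includes straddling boxes
    right = [b for b in boxes if b[1] > line]  # includes straddling boxes
    return line, left, right

# Example: three weighted bounding boxes; the middle one straddles the split.
line, left, right = split_by_weight([(0, 2, 4000), (1, 3, 1000), (4, 6, 3000)])
print("split at x =", line)
print("left:", left)
print("right:", right)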
4 Image Transmission Scheme
Reducing the communication overhead between the remote display nodes and the rendering nodes is one of the key problems in distributed rendering systems. Typically, a distributed rendering system deploys the display nodes remotely from the rendering nodes. The straightforward approach of transmitting and displaying rendered images results in a delay of one round trip between a viewpoint change and the corresponding change in the displayed image: the display nodes send the camera parameters to the rendering system, and the rendering system sends compressed image sequences back to the display nodes. This solution therefore requires high network bandwidth to display video at interactive frame rates. Temporal coherence, image space coherence and 3D geometry information are used to overcome the bandwidth limitation in our solution. This framework is designed for a cooperative client-server approach. Our approach combines 3D viewpoint prediction with more general image warping and image interpolation to increase the quality of the raw images and reduce the image data that needs to be sent from the rendering nodes. The scheme predicts the viewpoint that will be reached after several frames according to the camera's motion vector in the 3D scene. The predicted frame is rendered and sent to the clients. From the previous frame and the predicted frame, a raw current frame is generated using image-based rendering techniques by both the client and the server. At the same time, the rendering system renders the exact current frame, compares the raw frame and the exact frame, and sends the difference to the client side. Therefore, the server sends only incremental amounts of information for each frame, greatly reducing the bandwidth required for remote navigation.
Figure 3: Frame Generation in Client
As shown in Figure 3, in this way a whole image is transmitted, in most cases, once every 15 frames, and the other 14 frames are generated using image-based rendering and the difference information. Therefore, the bandwidth requirement can be reduced greatly. Because of the motion prediction, the network latency can be hidden, but the media generation latency is unavoidable. With image-based rendering techniques, one option for smoothly transforming one image into another is to warp the reference image to obtain the extrapolated images. However, this approach is resource-intensive, difficult to implement in a real-time system, and needs pre-processing computation. Therefore, linear interpolation based on the pixels' offset vectors is a better choice. Image interpolation is a technique that uses the pixel depth information of the images and the camera parameters to find the pixel-by-pixel correspondence between each pair of images. The procedure includes the following steps. First, with the image warping equation [7], we can get pairs of corresponding points in the reference images which are projections of the same world space point. Then, offset vectors can be defined for the pixels from the source image to the destination image in image space, and an 'offset map' can be created to specify the pixels' motion between the reference images. Each pixel in both the source and destination images is moved along its offset vector by the amount given by linearly interpolating the image coordinates. Figure 4 shows the procedure. When the camera moves linearly, the camera's local coordinate axes remain parallel, and we get the interpolation equations

u_3 = u_1 + β(u_2 − u_1) · Z(u_1,v_1) / [ Z(u_1,v_1) + β( Z(u_2,v_2) − Z(u_1,v_1) ) ]   (1)

v_3 = v_1 + β(v_2 − v_1) · Z(u_1,v_1) / [ Z(u_1,v_1) + β( Z(u_2,v_2) − Z(u_1,v_1) ) ]   (2)

where (u_1, v_1), (u_2, v_2) are the coordinates of the pixel in the reference images and (u_3, v_3) are the corresponding coordinates of that pixel in the interpolated image, β (0 ≤ β ≤ 1) is the linear coefficient and Z(u, v) is the depth of pixel (u, v).
Figure 4: Image Interpolation
Therefore, we can see that when the camera pans (the image planes remain parallel), Z(u_2,v_2) = Z(u_1,v_1), and the equations reduce to

u_3 = u_1 + β(u_2 − u_1)   (3)

v_3 = v_1 + β(v_2 − v_1)   (4)
Under this condition, the linear interpolation result precisely matches the images generated in the normal way; in other words, linear interpolation of the images produces valid interpolated views. In most cases Z(u_2,v_2) ≠ Z(u_1,v_1), but the difference between the two reference images is slight and imperceptible in our system because the camera does not move very fast. Therefore, the non-linear factor Z(u_1,v_1) / [ Z(u_1,v_1) + β( Z(u_2,v_2) − Z(u_1,v_1) ) ] is close to 1, and the result of linear interpolation is very close to the image generated in the normal way. The image interpolation method gives a pair of interpolated images; these can be composited, and using a pair of interpolated images in this way abates the holes problem. This scheme uses image-based rendering on two frames to generate the raw image of the intermediate frame. Therefore, the raw image is very close to the exact one, and the difference is so slight that the communication cost is low enough. Because of the 3D motion prediction, the system can render the frames before they are required for display; therefore, the network latency can be hidden.
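For concreteness, equations (1)-(4) can be coded directly (our own illustration):

def interpolate_pixel(p1, p2, z1, z2, beta):
    # Depth-weighted interpolation of one corresponding pixel pair, eqs. (1)-(2).
    # p1 = (u1, v1), p2 = (u2, v2): the pixel in the two reference images;
    # z1, z2: its depths there; 0 <= beta <= 1 is the linear coefficient.
    factor = z1 / (z1 + beta * (z2 - z1))  # the non-linear factor, ~1 when depths are close
    u3 = p1[0] + beta * (p2[0] - p1[0]) * factor
    v3 = p1[1] + beta * (p2[1] - p1[1]) * factor
    return u3, v3

# With z1 == z2 (camera pan) this reduces exactly to eqs. (3)-(4).
print(interpolate_pixel((10.0, 5.0), (14.0, 9.0), z1=2.0, z2=2.0, beta=0.5))
print(interpolate_pixel((10.0, 5.0), (14.0, 9.0), z1=2.0, z2=2.5, beta=0.5))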
5 Conclusion and Future Work
A load-balanced image partition scheme named Dynamic Mesh-based Screen Partition (DMSP) has been designed. This algorithm overcomes sort-first's susceptibility to load imbalance. Its basic idea is to partition the screen based on the distribution of the primitives in image space.
To simplify the computation when partitioning, bounding volumes are employed instead of primitive groups. To decrease the bandwidth requirement between the client and server, our research on image transmission results in a scheme based on image-based rendering technology. The main advantages of this system are: (1) low inter-communication cost within the cluster for primitive redistribution; (2) low communication cost for image transmission; (3) load balance across the rendering nodes; and (4) sorting performed before the whole rendering pipeline, enabling the system to be suitable for various shading approaches and models. In the next phase, a lot of research work still needs to be done before the proposed scheme becomes a robust system. We will investigate the geometry preprocessing step before sorting in detail and try to obtain an efficient algorithm. On the other hand, exploiting frame-to-frame coherence for sort-first is quite challenging and useful. We also wish to implement our system on a GRID platform.

References
1. Aliaga D., Cohen J., Wilson A., Baker E., Zhang H., Erikson C., Hoff K., Hudson T., Stuerzlinger W., Bastos R., Whitton M., Brooks F., Manocha D., MMR: An Integrated Massive Model Rendering System Using Geometric and Image-Based Acceleration, Proceedings of Symposium on Interactive 3D Graphics (I3D), pp. 199-206, April 1999.
2. Ellsworth D., "Polygon Rendering For Interactive Visualization On Multicomputers", Ph.D. Dissertation, Computer Science Department, University of North Carolina at Chapel Hill, 1996.
3. Eyles J., Molnar S., Poulton J., Greer T., Lastra A., England N., and Westover L., "PixelFlow: The Realization", Proceedings of the 1997 Siggraph/Eurographics Workshop on Graphics Hardware, pp. 57-68, Los Angeles, CA, Aug. 3-4, 1997.
4. Gautier L. and Diot C., Design and Evaluation of MiMaze, a Multi-player Game on the Internet, Proceedings of IEEE Multimedia Systems Conference, Austin, 1998.
5. Hancock D. J., Hubbold R. J., Distributed Parallel Volume Rendering on Shared Memory Systems, Proceedings of HPCN Europe, pp. 157-164, 1997.
6. Jarmasz J. and Georganas N. D., Designing a Distributed Multimedia Synchronization Scheduler, Proceedings of IEEE Multimedia Systems '97, Ottawa, June 1997.
7. McMillan L., An Image-based Approach To Three-Dimensional Computer Graphics, Ph.D. Dissertation, University of North Carolina, 1997.
8. Molnar S., Cox M., Ellsworth D., and Fuchs H., A Sorting Classification of Parallel Rendering, IEEE Computer Graphics and Applications: Special Issue on Rendering, Vol. 14 No. 4, pp. 23-32, 1994.
9. Molnar S., Eyles J. and Poulton J., PixelFlow: High-Speed Rendering Using Image Composition, Proceedings of Computer Graphics (SIGGRAPH '92), pp. 231-240, Chicago, 1992.
10. Mueller C. A., The Sort-First Architecture for Real-Time Image Generation, Ph.D. Dissertation, Computer Science Department, University of North Carolina at Chapel Hill, 2000.
11. Wilson A., Lin M., Manocha D., Yeo B. L., Yeung M., A Video-Based Rendering Acceleration Algorithm for Interactive Walkthroughs, Proceedings of ACM Multimedia 2000, Los Angeles, CA, 2000.
IMMERSIVE VISUALISATION OF NANO-INDENTATION SIMULATION OF CU

SHUHONG XU, JU LI†, CHONGHE LI, FRANK CHAN
Institute of High Performance Computing (IHPC), Singapore 117528
E-mail: [xush,lich,chancj]@ihpc.a-star.edu.sg
† MIT, Massachusetts Avenue, Cambridge, MA 02139, USA
E-mail: [email protected]
This paper introduces the immersive visualisation of a nano-indentation simulation of Cu. The molecular dynamics simulation is performed on a system consisting of 326,592 atoms of size 18.6 x 17.0 x 18.4 nm, under periodic boundary conditions. Based on the coordination number calculation, a practical method for the extraction of "extraordinary" atoms is developed. The simulation has been visualised in the CAVE™, an advanced fully immersive virtual environment.
1 Introduction
With the advent of high performance computing, materials research now embraces another approach: computer simulation. Increasingly, materials modelling has taken on the meaning of theory and simulation of materials properties and behaviours. Indeed, in the near future it is not far-fetched for new materials to be created with enhanced performance, extended service life, acceptable environmental impact and reduced cost, by exploiting advanced materials modelling and visualisation techniques. Scientists became aware of the importance of molecular graphics in the mid-1960s [1]. However, due to the limitations of computer hardware and numerical technology, molecular modelling and visualisation research did not prosper until the 1990s. In 1992, Richardson [2] described the kinemage and the supporting programs MAGE and PREKIN; this was the first program that brought molecular visualisation to a wide audience. In 1993, Roger Sayle [3] developed a much more complete molecular visualisation system, RasMol, and placed the C language source code in the public domain. This allowed others to adapt the program to additional types of computers, and to incorporate RasMol's excellent user interface and renderings into derivative programs, notably MDL Chime [4] and WebLab [5]. RasMol is widely used throughout the world; however, its file format, like the PDB format, is not very convenient for large-scale atomistic simulations. So Ju Li [6] developed his own visualisation program, atomEye, in 1999. More molecular visualisation tools can be found at [7]. One notable limitation is that all these systems confine the user to a 2D environment when visualising and interacting with 3D molecular structures. This can be very limiting in that the spatial relationships between atoms may be unclear. Virtual reality systems such as the CAVE™ make 3D spatial interaction possible. Being fully immersed in a CAVE™ environment, the user can interact visually, aurally, and tactilely with molecular models in the most natural way. This helps users efficiently extract intrinsic physical properties and gain mechanistic insights from atomistic modelling. Interactive and immersive visualisation of Molecular Dynamics (MD) simulations is still in its infancy, especially for large data. How to efficiently extract useful features from within a large bulk of uninteresting atoms and how to achieve real-time walkthrough speed remain two of the top challenges. This paper introduces the immersive visualisation of a nano-indentation simulation of Cu. A practical method for feature extraction is presented and an integrated approach is employed to achieve real-time walkthrough speed.
The system has been implemented in a CAVE™ virtual environment. The paper is structured as follows. Section 2 briefly introduces the MD simulation of nano-indentation of Cu. The method for "extraordinary" atom extraction is presented in Section 3. Section 4 introduces the techniques for real-time visualisation. Conclusions and future work are drawn in Section 5.

2 Nano-indentation Simulation
The MD simulation of nano-indentation of Cu is performed on a system consisting of 326,592 atoms of size 18.6 x 17.0 x 18.4 nm, under periodic boundary conditions. The (111) surface of the system faces the indentor, which is a blunt cylinder 2.5 nm in diameter composed of immobile Cu atoms, as shown in Fig. 1. The simulation is carried out at T = 0 K and T = 300 K. The EMT potential [8] is used. Fig. 2 shows one configuration [9].
Fig. 1 Nano-indentation simulation setup
Fig. 2 Nano-indentation of Cu
The MD simulation shows successive stress-relief events varying with indentor depth, thus providing an atomistic link between the heterogeneous deformation response and the theoretical strength of the bulk material. For details, see [9].
Extraction of "Extraordinary" Atoms
One of the problems of 3D materials simulations is that we may only be interested in a small subset of the data, such as dislocations or defects. Often, a feature of interest is embedded within a large bulk of uninteresting atoms. As a typical example, Fig. 2 shows the surface slip lines caused by nano-indentation, but it cannot reveal internal dislocation information, and a bulk view without treatment is not helpful due to the large quantity of atoms. To extract interesting features, a practical method is to make use of the coordination number. In a perfect crystal lattice, every atom has the same number of neighbours since every atom is equivalent. However, in a configuration other than a perfect crystal lattice, the above scheme yields some atoms with a different number of nearest neighbours from the others. For perfect crystal Cu at equilibrium, the crystal structure is FCC and the lattice constant is a = 0.36078 nm. Each inner atom initially has twelve nearest neighbours, with the atomic radius r satisfying 4r = √2·a, as illustrated in Fig. 3. After indentation, the nearest neighbours of each atom can be found using a cut-off radius R_cut: if the distance d_ij between two atoms P_i and P_j is less than R_cut, then P_j is one of P_i's nearest neighbours. Obviously 2r < R_cut < a, i.e. 0.255110 nm < R_cut < 0.36078 nm. An enclosing box {[x_i-Δx, x_i+Δx], [y_i-Δy, y_i+Δy], [z_i-Δz, z_i+Δz]} can be used to look for the potential nearest neighbours and speed up the distance calculations, where (x_i, y_i, z_i) are the Cartesian coordinates of P_i and Δx, Δy, Δz are pre-defined non-negative values.
After "ordinary" atom deletion, the "extraordinary" atoms are shown in Fig. 4.
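A minimal sketch of the coordination-number computation follows (our own Python illustration; a production implementation would use the enclosing-box pre-filter described above rather than brute force):

import math

A = 0.36078        # Cu lattice constant, nm
R_CUT = 0.30       # cut-off radius, chosen so that 2r < R_CUT < a

def coordination_numbers(atoms, r_cut=R_CUT):
    # atoms: list of (x, y, z) positions in nm. Returns, for each atom, the
    # number of neighbours within r_cut (brute-force O(n^2) for clarity).
    counts = [0] * len(atoms)
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if math.dist(atoms[i], atoms[j]) < r_cut:
                counts[i] += 1
                counts[j] += 1
    return counts

# In perfect FCC Cu every inner atom has coordination number 12, so atoms
# whose count differs from 12 are the "extraordinary" ones.
atoms = [(0.0, 0.0, 0.0), (0.255, 0.0, 0.0), (1.0, 1.0, 1.0)]
print(coordination_numbers(atoms))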
Fig. 3 Crystal structure of perfect Cu ((001) plane)
Fig. 4 "Extraordinary" atom extraction

4 Immersive Visualisation
Interactive visualisation in an immersive environment has a crucial requirement on frame rate (normally it should exceed 15 fps). To achieve real-time walkthrough, A. Nakano and R. K. Kalia [10] used a multiresolution MD algorithm and the associated data structures to visualise large quantities of atoms. In another paper, A. Nakano [11] employed an octree data structure for visibility culling and level-of-detail control for fast rendering. Theoretically, these techniques decrease the number of polygons to be displayed; however, accounting for the user perspective requires additional calculations. In our opinion, solving the large-scale MD visualisation problem requires an integrated systems approach. At a high level, the following four issues are worth mentioning:
1. Data pre-processing and feature extraction. Generally, a direct rendering of all atoms is not helpful. We are interested in extracting the physically based features that we specify. We can represent these features, if successfully extracted, in an economical way for interactive visualisation.
2. Scalable parallel visualisation. To visualise large-scale MD simulations at the highest possible resolution, we need the processing power and memory of a parallel computer. PC clusters, which have become increasingly popular and affordable, make parallel visualisation an even more attractive approach [12].
3. Optimisation of basic graphics functionality. For molecular visualisation, only very few types of objects in massive quantities, such as spheres (as atoms), cylinders (as bonds), and points (as charge density), need to be rendered. Therefore, very efficient graphics routines can be written and optimised.
4. Visualisation management. There exist some useful techniques for large-scale visualisation, such as visibility culling using an octree data structure [13], Level of Detail (LOD) and database paging [14], etc.
We adopt an integrated approach. Before visualisation, the MD simulation results are pre-processed and the coordination number of each atom is calculated. According to the purpose of the MD simulation, atoms can be divided into three groups according to their significance: important, normal and less important. For important atoms, we use high quality spheres (more polygons) to represent them. We also extend the LOD technique for navigation: during fast navigation, atoms are rendered at a lower resolution. The system has been implemented in the CAVE™ located at IHPC.
The host machine is an SGI Onyx2 with four InfiniteReality2 graphics pipelines, sixteen R10000 CPUs and eight Gbytes of system RAM. Fig. 5 shows one example.
5 Conclusion
The techniques for immersive visualisation of large-scale MD simulations have been introduced. Based on a brief introduction to the MD simulation of nano-indentation of Cu, a practical feature extraction method is presented. This helps materials researchers identify the atoms of interest and gain mechanistic insights from atomistic modelling, while at the same time reducing the rendering burden. To achieve real-time walkthrough in a fully immersive virtual environment, we adopt an integrated approach.

Fig. 5 Immersive visualisation in the CAVE

Visualisation of MD simulations in the CAVE™ environment provides users a very intuitive way to understand, explore and interact with the microworld. The feedback from our users is very positive. Future work will focus on exploring collaborative visualisation of large-scale MD simulations in tele-immersive environments over broadband networks.

References
1. Levinthal, Cyrus, "Molecular Model-Building by Computer", Scientific American 214(6):42-52 (1966).
2. D. C. Richardson, J. S. Richardson, "The kinemage: a tool for scientific communication", Protein Science 1, 3-9 (1992).
3. http://www.umass.edu/microbio/rasmol
4. http://www.umass.edu/microbio/chime/
5. http://www.accelrvs.com/about/msi.html
6. http://lonp-march.mit.edu/liju99/Graphics/A/
7. http://molvis.sdsc.edu/visres/index.html
8. K. W. Jacobsen, P. Stoltze, J. K. Norskov, "A semi-empirical effective medium theory for metals and alloys", Surface Science 366, 394 (1996).
9. Ju Li, "Modeling Microstructural Effects on Deformation Resistance and Thermal Conductivity", Ph.D. thesis, MIT, Dept. of Nuclear Engineering (2000).
10. A. Nakano, R. K. Kalia, and P. Vashishta, "Scalable Molecular-Dynamics, Visualisation, and Data-Management Algorithms for Materials Simulations", Computing in Science & Engineering 1(5), 39-47 (1999).
11. A. Nakano, M. E. Bachlechner, et al., "Multiscale Simulation of Nanosystems", Computing in Science & Engineering 3(4), 56-66 (2001).
12. B. Wylie, C. Pavlakos, et al., "Scalable Rendering on PC Clusters", IEEE Computer Graphics and Applications 21(4), 62-70 (2001).
13. http://www.flipcode.com/tutorials/tut octrees.htm
14. J. Hartman and P. Creek, IRIS Performer Programming Guide, (Silicon Graphics, Inc., 1998).
Distributed processing and visualization of MEG data

DATE SUSUMU, Graduate School of Information Science and Technology, Osaka University, Japan. Email: [email protected]
PROF. SHIMOJO SHINJI, CyberMedia Center, Osaka University, Japan. Email: [email protected]
MIZUNO-MATSUMOTO, YUKO, Graduate School of Engineering, Osaka University, Japan. Email: [email protected]
SONG JIE, A/PROF. LEE BU SUNG, A/PROF. CAI WENTONG AND WANG LIZHE, School of Computer Engineering, Nanyang Technological University, Singapore. E-mail: [email protected]
Magnetoencephalography (MEG) is a non-intrusive method slowly gaining acceptance for use in brain wave analysis. It is an important medical tool for the early detection of brain-related disease. A major problem is the amount of data generated by such imaging systems and its analysis. In this paper we present the use of Grid technology to analyse the data across Linux clusters located in Japan and Singapore. The data is then visualized, and tools for detecting abnormality in the brain signals are highlighted.
1. Introduction
Medical health care over the years has seen the use of new technology to help medical personnel perform their tasks more effectively. Highly sophisticated medical technologies such as magnetoencephalography (MEG), positron emission tomography (PET), and functional magnetic resonance imaging (fMRI) facilitate the early detection and diagnosis of medical problems. All these technologies produce large quantities of data that need to be stored and processed. Medical imaging has emerged as an important practical application that requires a huge amount of storage as well as computation, which fits nicely into the class of applications that can be supported by Grid technology. Understanding brain function has always been a major area of interest. With the advances in technology, especially sensor technology, researchers now have the opportunity to understand brain function much better than before. In a society with an aging population, it is important to detect brain disease early. MEG is an important tool that is gaining wider usage for such purposes.
Our research focuses on the deployment of a distributed collaborative environment for the analysis of MEG data. In this paper we describe the research and practical work carried out by the participants in this project. Section 2 introduces the MEG system [1]. Section 3 describes the infrastructure set up for the SC2002 demonstration. The paper concludes with Section 4.
2.
MEG system
Brain imaging technology has advanced tremendously over the past few years. The MEG system is a highly sophisticated medical instrumentation system that is used in the early detection of brain disorders, e.g. cerebrovascular disease and dementia. It can measure brain signals from multiple measurement points on the head mount. MEG measurement is characterized by a high degree of accuracy and non-invasiveness. The amount of data collected with MEG is very large: for example, in the case of a 1-hour measurement at a sampling frequency of 250 Hz using 64 sensors, the amount of data collected reaches 0.9 GB, as the arithmetic below illustrates. The signal is processed using a signal source localization method, i.e. the data from the various sensors are matched so that the origin of the signal can be found. To date, there are a number of source localization methods, such as SAM (synthetic aperture magnetometry) and wavelet cross-correlation analysis [3]. The wavelet cross-correlation analysis method was used in our study. The computational requirements for processing the data are very high. The computational model adopted for the analysis is Single-Program Multiple-Data (SPMD). The processed data are mapped onto a graphic model of a head with the locations of the sensors indicated, as shown in Figure 1. Different colors are used to indicate the status of each region; red is used to color a sensor that has detected an abnormal signal. The display of the data signal is time-synchronized.
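As a rough sanity check of the quoted figure, the raw data volume follows directly from the measurement parameters; the per-sample storage size below is our own assumption (roughly 16 bytes per sample, covering the raw value plus channel metadata), since the paper does not give the record format:

```c
#include <stdio.h>

int main(void)
{
    /* 1-hour measurement: 3600 s x 250 Hz x 64 sensors. */
    long long samples = 3600LL * 250 * 64;      /* 57,600,000 samples      */
    long long bytes   = samples * 16;           /* assumed 16 B per sample */
    printf("%lld samples, %.2f GB\n", samples, bytes / 1e9);  /* ~0.92 GB */
    return 0;
}
```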
Figure 1: Visualisation of MEG data
3.
Distributed MEG Infrastructure
The infrastructure to support the distributed computing environment is shown in Figure 2. This set-up is used for the SC'2002 demonstration. There is a set of PC clusters at the School of Computer Engineering (NTU), Singapore, and at the Cybermedia Center of Osaka University in Japan. The experimental environment at the Cybermedia Center consists of a PC cluster system composed of 12 compute nodes, each containing a single 1266 MHz Pentium III processor, and a PC cluster with 40 nodes. Access to the MEG machine is through a PC running a daemon process. The PC cluster in Singapore consists of 8 Linux machines, each of which has two 500 MHz Pentium III processors. User access to the system is through a portal running servlets to control the system functions. The capture of data from the MEG machine is done live through the activation of the daemon on the capturing machine. The user defines the duration of the capture period. The MEG data are stored in a directory according to the record of the patient. An LDAP database also stores the patient details and the date and time of capture.
Figure 2: Set-up for SC2002 (Osaka University, Japan: 40-node and 12-node clusters; Nanyang Technological University, Singapore: 8-node cluster)
Once the data capture is complete, the user can then initiate the analysis of the data. The distribution of the data and the analysis requests is handled automatically by the analysis servlet. It communicates with the proto daemons of the three clusters in Osaka and Singapore. The proto daemon pulls the raw data to be analyzed within its cluster from the data server, monitors the progress of the analysis, and finally sends the results back to the data server. The signal processing of the wavelet and cross-correlation analysis is implemented by an MPICH-enabled parallel program named "meg". Each "meg" process analyzes the data of one sensor, which is distributed by the proto daemon; a sketch of this per-sensor decomposition is given below.
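The internals of the "meg" program are not listed in the paper; the following is a minimal sketch of the per-sensor SPMD pattern it describes, with hypothetical helper functions (load_sensor_data, analyze_sensor) standing in for the data transfer and the wavelet cross-correlation kernel:

```c
#include <mpi.h>
#include <stdio.h>

#define NSENSORS 64
#define NSAMPLES 900000                /* 1 hour at 250 Hz */

/* Hypothetical helpers: fetch one sensor's trace and analyze it. */
extern void   load_sensor_data(int sensor, double *buf, int n);
extern double analyze_sensor(const double *buf, int n);

int main(int argc, char **argv)
{
    int rank, size;
    static double trace[NSAMPLES];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* SPMD: each process takes the sensors assigned to it. */
    for (int sensor = rank; sensor < NSENSORS; sensor += size) {
        load_sensor_data(sensor, trace, NSAMPLES);
        double score = analyze_sensor(trace, NSAMPLES);
        printf("rank %d: sensor %d, score %g\n", rank, sensor, score);
    }

    MPI_Finalize();
    return 0;
}
```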
When the user in Baltimore sends the visualization request, the visualization servlet transfers results from the data server to the client, and reorganizes the data structure as required by the visualization program. Then the user can observe the animation of brain signals of the patient. 4.
Conclusion
In this paper, a distributed infrastructure for real-time capture and analysis of the brain signals obtained from MEG has been explored. To this end, the MEG capturing system was made online at Osaka University, and MEG data are analyzed across the distributed environment at Osaka University (Japan) and the School of Computer Engineering at Nanyang Technological University (Singapore). The project will be demonstrated live at the coming SC'2002. Although we are presently working mainly with wavelet cross-correlation as the localization tool, other localization techniques will be explored. In addition, we will be working towards optimization of the processing to improve the performance. Certainly, Grid technology is well suited for this application.
Acknowledgement
We would like to express our gratitude to SingAREN and the Asia Pacific Advanced Network for the use of the network.
References
1. S. Sato, ed., Advances in Neurology Vol. 54: Magnetoencephalography. Raven Press, NY, 1990.
2. http://www.globus.org
3. H. Li and T. Nozaki, "Wavelet cross-correlation analysis applied to a plane turbulent jet", JSME International Journal, Vol. 40, No. 1, pp. 58-66, 1997.
4. I. Foster, C. Kesselman and S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations", International Journal of Supercomputer Applications, Vol. 15, No. 3, 2001.
5. Yuko Mizuno-Matsumoto et al., "Telemedicine for Evaluation of Brain Function by a Metacomputer", IEEE Transactions on Information Technology in Biomedicine, Vol. 4, No. 2, June 2000.
A FAST ALGORITHM OF LEVEL SET METHOD FOR 3D PROSTATE SURFACE DETECTION
SHAO FAN
School of EEE, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
E-mail: [email protected]
LING KECK VOON
School of EEE, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
E-mail: [email protected]
NG WAN SING
School of MPE, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
E-mail: [email protected]
Prostate surface detection from ultrasound images plays a key role in prostate disease diagnoses and treatments, and the level set method can be employed to fulfill this task. This paper presents a fast algorithm of the level set method to automatically detect the prostate surface from 3D transrectal ultrasound images. To reduce the computational load, a so-called "narrow band" solution is used to implement the level set method. However, the computation expense increases rapidly with the number of voxels residing in the narrow band, and the narrow band rebuilding at each iteration takes up most of the processing time. To speed up the algorithm further, a straightforward solution is to reduce the size of the narrow band, the number of iterations, or both. In this work, we first reduce the image size by sampling it every other slice along the x, y and z directions respectively; then we apply the narrow band solution to it to start the surface detection procedure. We applied this fast algorithm of the level set method to eight 3D transrectal ultrasound images, and the results have shown its effectiveness.
1
Introduction
Prostate boundary detection from ultrasound (US) images plays a key role in prostate disease diagnoses and treatments [2]. Currently, boundary detection and volume measurement are performed manually, which is arduous and heavily user-dependent. A possible solution is to improve efficiency by automating the boundary detection and volume estimation process with minimal manual involvement. In fact, there have been a number of works so far on automatic segmentation of the prostate from ultrasound images [1, 2, 4]. However, most of them are interactive (or semi-automatic) and applied to 2D images only. In our work, we developed a new approach based on the level set method [3] to automatically detect the prostate surface from 3D transrectal ultrasound (TRUS) images. However, the heavy computational load hinders its practical use. This paper presents a fast algorithm for our proposed method, obtained by first reducing the 3D TRUS image size and then applying the narrow band solution [3]. The algorithm is detailed in the following section.
2
Methods
In the level set formulation, the 3D surface (S) detection problem can be expressed as the computation of a 4D function ψ satisfying

∂ψ/∂t + F |∇ψ| = 0

where X ∈ R³, F is the evolving speed and the deformable surface S(t) (the zero level set) is embedded in ψ(X, t). The key task of this level set method is to design an appropriate speed function F which can drive the evolving surface to the desired boundary. In this work, F = (1 − 0.5P − 0.5R) − εK is designed, where K is the mean curvature of the surface, which keeps it smooth, and P is a designation function of intensity probability defined as

P(X) = 0 if the probability of I(X) > 0.01, and 1 otherwise.

The intensity probability at voxel X is calculated according to a Gaussian mixture model (GMM), which models the intensity distribution of the prostate. After that, we pass a combination of the standard deviations of the GMM as R_T to the region discrimination function R to roughly extract the prostate region:

R(X) = 0 if R_T/2 < [max(I_n) − min(I_n)] < R_T, and 1 otherwise
where max(I_n) and min(I_n) stand for the maximal intensity and the minimal intensity, respectively, in a cubic (n³) sliding window. In this work, we built a five-layer narrow band to implement the level set method. Its 2D projection is shown in Fig. 1a, where the active set contains the grid points which lie adjacent to the zero level set, the immediate neighbors constitute the actual boundaries of the narrow band, and the next neighbors are used to compute the second-order derivatives of ψ for calculating the mean curvature K. The idea behind the "narrow band" solution [3] is that it only affects points close to the region where the evolving surface is located, so that the computation time is reduced significantly. Fig. 1b shows an example of this narrow band solution applied to a typical 3D prostate scan. However, the computation time increases rapidly with the number of grid points which reside in the narrow band, and the narrow band rebuilding at each iteration takes up most of the processing time. As seen in Fig. 1, every run to completely detect the 3D prostate surface takes more than 60 minutes.
Figure 1. The narrow band construction and its application. (a) 2D projection of a five-layer narrow band construction; (b) narrow band application on a typical 3D prostate scan: image size 256x256x256, iteration number 2538, elapsed time 68 min.
One way to speed up the algorithm further is to reduce the number of voxels in the narrow band, the number of iterations, or both. In this work, we reduce the narrow band size simply by compressing the original image to 1/8 of its size, that is, rearranging the image data by sampling them every other slice along the x, y and z directions, respectively. Fig. 2b illustrates the data sampling process.
Figure 2. The illustration of image size reduction. (a) voxels concerned with the computation in the original image; (b) image data sampling by every other slice along the x, y and z directions, respectively.
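A minimal sketch of this 2x-per-axis subsampling is given below; the linear voxel indexing and the function signature are our own assumptions, as the paper provides no code:

```c
#include <stddef.h>

/* Keep every other voxel along x, y and z, shrinking the volume to
 * 1/8 of its original size (e.g. 256x256x256 -> 128x128x128). */
void subsample_volume(const unsigned char *src, unsigned char *dst,
                      size_t nx, size_t ny, size_t nz)
{
    size_t mx = nx / 2, my = ny / 2, mz = nz / 2;
    for (size_t z = 0; z < mz; z++)
        for (size_t y = 0; y < my; y++)
            for (size_t x = 0; x < mx; x++)
                dst[(z * my + y) * mx + x] =
                    src[((2 * z) * ny + 2 * y) * nx + 2 * x];
}
```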
After the image size reduction, we apply the narrow band solution to start the surface detection procedure. Obviously, in this way, the number of voxels concerned with the computation is reduced dramatically. At the same time, the number of iterations necessary for deforming the evolving surface to the desired prostate surface is reduced accordingly.
3
Results
Fig. 3 demonstrates the experimental results before and after the image size reduction, where the green curves are manually outlined contours (the reference). It is very clear that the two detected results are almost the same compared with the reference, but the run time is reduced from 68 minutes to 4.5 minutes. In this work, we applied this fast algorithm to seven other 3D TRUS images to test its effectiveness, and the detected results are similarly satisfactory. As we expected, this fast algorithm of the level set method reduces the computation time to at most 1/8 of that needed by the narrow band solution alone.
Figure 3. Comparison of the experimental results before and after image size reduction. (a) the detected result before image size reduction: iteration number 2538, elapsed time 68' 04"; (b) the detected result after image size reduction: iteration number 927, elapsed time 4' 34".
4
Acknowledgements
The authors are very grateful to Wu R.Y. for his help in data acquisition. They would like to thank Dr. Kwoh C.K. for his knowledgeable comments and suggestions. Thanks to the members of CIMIL for their support.
References
1. Liu Y.J., Ng W.S., Teo M.Y. and Lim H.C., Computerised prostate boundary estimation in ultrasound images using the radial bas-relief method. Medical and Biological Engineering and Computing 35 (1997) pp. 445-454
2. Pathak S.D., Chalana V., Haynor D.R. and Kim Y., Edge-guided boundary delineation in prostate ultrasound images. IEEE Transactions on Medical Imaging 19 (2000) pp. 1211-1219
3. Sethian J.A., Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision and Materials Science. (Cambridge University Press, Cambridge, UK, 1999)
4. Wu R.Y., Ling K.V. and Ng W.S., Automatic prostate boundary recognition in sonographic images using Feature Model and Genetic Algorithm. Journal of Ultrasound in Medicine 19 (2000) pp. 771-782
A PATHOLOGICAL DIAGNOSIS SYSTEM FOR BRAIN WHITE MATTER LESIONS
HAN SHUIHUA, LI FAN
Department of Computer Science, Huazhong University of Science & Technology, 430074, P.R. China
E-mail: [email protected]
An automatic quantitative analysis system for brain white matter lesions is discussed. In general, brain white matter lesions are caused by head trauma, cerebral infarcts and so on. It has been proved that properties of the lesions are related to cognitive impairment, but it is a nontrivial task to build reliable tools that relate MR images to pathological findings. Here we present a novel algorithm for the segmentation of brain tissue by mapping the red, green and blue intensity values of the imaged specimens into L*u*v* color space and utilizing a fast nonparametric clustering method. The x, y coordinates corresponding to the outer boundary of the delineated white matter lesion structures within the segmented images serve as input for shape analysis. We then propose an image-content-based algorithm to determine the amount of white matter lesions, to find the location of pathological findings, and to detect the speed of progression of the white matter lesions. Using Bayesian probabilistic learning, an efficient brain diagnosis expert system is created; it can be used for classification and retrieval of brain white matter lesions. Experimental results demonstrate that our approach can be very useful for pathologists.
1
Introduction
In general, brain white matter lesions are caused by head trauma, cerebral infarcts and so on. It has been proved that properties of the lesions are related to cognitive impairment. The physician uses images to aid in diagnoses, to make measurements of cerebral volume, to assess brain development, to detect the difference between normal brains and those in pathological states, to look for the severity of disease, and as a medical record. It is a natural human desire to find ways to avoid repetitive or routine work and be left with interesting and challenging work. Also, it is advantageous to make use of outside expertise at the moment it is needed. There is a need for an imaging system to provide physician assistance at any time and to relieve the physician of drudgery or repetitive work [4]. Here we introduce the PDSB computer vision system, which seeks to reproduce the capabilities of the human expert, who can extract useful information and make decisions about diagnosis or treatment from medical images, even if the images are degraded. The PDSB system extracts objects of interest (lesions and anatomical structures) from the rest of the image of the cerebral volume, identifies and localizes the objects, and infers the presence and location of abnormalities to make diagnoses or look for changes in sequential images. Using a neuroimage database composed of clinical volumetric CT image sets of hemorrhage (blood), bland infarct (stroke) and normal brains, a framework of our approach is shown in Figure 1. The three major components in this scheme are (1) feature extraction, which maps each volumetric image into a multi-dimensional image feature space; (2) feature selection, which determines the relative scale of the feature space and the best metric for image comparison; and (3) pathological diagnosis, which gives the user the most suitable classification result.
Figure 1. Architecture of the PDSB system
2
Image Preprocessing
The PDSB segmentation is based on nonparametric analysis of the L*u*v* color vectors obtained from the input image. The algorithm detects color clusters and delineates their borders based on the gradient-ascent mean shift procedure, as shown in Figure 2. It randomly tessellates the space with search windows and moves the windows until convergence at the nearest mode of the underlying probability distribution. The nonparametric, robust nature of the color histogram analysis allows accurate and stable recovery of the main homogeneous regions in the image.
[Flowchart: input image → map into L*u*v* color space → define sample set → apply mean shift procedure → derive cluster center candidates → prune cluster centers → delineate clusters]
Figure 2. The processing flowchart of the segmentation algorithm
First, the RGB input vectors are converted into L*u*v* vectors following a nonlinear transformation. A set of m points x_1, ..., x_m, called the sample set, is then randomly selected from the data. Distance and density constraints are imposed on the points retained in the sample set, automatically fixing its cardinality. The distance between any two neighbors should not be smaller than h, the radius of a searching sphere S_h(x), and the sample points should not lie in sparsely populated regions. A region is sparsely populated whenever the number of points inside the sphere is below a threshold T_1. Next, the mean shift procedure is applied to each point in the sample set. The mean shift vector at the point x is defined as

M_h(x) = (1/n_x) Σ_{x_i ∈ S_h(x)} (x_i − x)   (1)

where n_x is the number of data points contained in the searching sphere S_h(x). It can be shown that the vector (Eq. 1) has the direction of the gradient density estimate when this estimate is obtained with the Epanechnikov kernel. The m points of convergence resulting from applying the mean shift to each point in the sample set are called cluster center candidates. Since a local plateau in the color space can prematurely stop the mean shift iterations, each cluster center candidate is perturbed by a random vector of small norm and the mean shift procedure is let converge again. The computation of the mean shift vectors is based on the entire data set; therefore, the quality of the density gradient estimate is not diminished by the use of sampling. Finally, spatial constraints are enforced to validate homogeneous regions in the image. Small connected components containing fewer than T_2 pixels are removed, and region growing is performed to allocate the unclassified pixels.
Figure 3. Segmentation results for variations of lesions
3
Feature Extraction
The inherent categorization of pathologies in medicine provides a way to classify medical images (cases). This content-based organization of medical images naturally forms a hierarchical structure, with its bottom leaves corresponding to sets of specific images (cases) and higher nodes corresponding to subcategories of pathological cases. Each image can be viewed as a data point in some multidimensional feature space. The feature vector, the coordinates of a 3D image, functions in turn as the image index. With the help of neuroradiologists, the following salient visual features, which have significant semantic meanings and medical implications in interpreting brain images, have been identified [1]:
• mass effect: asymmetry with respect to the ideal center line, due to structural/density imbalance
• anatomical location: where the lesion resides in terms of the brain's 3D anatomical structure
• density: relative brightness and darkness of the lesion
• contrast enhancement: lesion sensitivity to contrast enhancement
• boundary: the region between the lesion and its surroundings
• shape: a characterization of the 3D volume the lesion occupies
• edema: lucent area around a lesion, usually caused by excessive liquid
• texture: the texture of the lesion
• size: the dimension of the lesion
• age: some visual features vary with the patient's age.
Lesion detector: Using the results from the symmetry detector, we have further developed a lesion detector which aims at automatically locating possible lesions (bleeds, strokes, tumors) by detecting asymmetrical regions with respect to the extracted central symmetry plane. The goal is to make this process adaptive and robust to different image densities; for example, acute blood appears white on a CT image while an acute infarct (stroke) appears dark.
Mass effect detector: The extracted symmetry axis is used as the initial position of an open snake. The final resting position of the snake indicates how much the brain has shifted from its ideal centered position due to a tumor. The difference between the deformed midline of a pathological brain and the ideal midline, i.e. the ratio of the maximum distance between the two curves over their vertical length, is used as a quantified measurement of mass effect.
Anatomical location of the lesion detector: In order to determine the anatomical location of a detected lesion in a 3D image, the atlas is deformably registered onto the pathological brain. Figure 10 shows the result of a deformable registration (affine warping) from a 3D digital atlas (MRI T1 image) to a brain with a lesion (MRI T2 image). Since the atlas is completely labeled, the general anatomical location of the lesion can be identified from the labeled voxels. While we are working on more sophisticated automatic deformable registration of a brain atlas to a pathological 3D brain image for exact matching, an interactive tool has been used that allows the user to identify corresponding anatomical points on the images, and a simple warping algorithm warps one image to the other.
4
Feature Selection
In this work, feature selection is designed as a mapping from a potential feature set F1 = {f1, f2, ..., fn} to another feature set F2 = {w1 f1, w2 f2, ..., wn fn}, where 0 ≤ wi ≤ 1. Since some of the wi may take the value 0, |F2| ≤ |F1|. Two types of features are expected to be removed from F1: irrelevant features and redundant features. As a result, for any image semantic class ci the posterior probabilities P(ci|F1) and P(ci|F2) are equivalent. The values of the wi are the result of our learning algorithm. When many wi are zeros (as is the case in this work), F2 becomes a much lower-dimensional space than F1, and the computation cost is reduced greatly.
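The effect of the weights can be illustrated with a simple weighted distance; the Euclidean form below is our own assumption, since the paper does not fix the comparison metric:

```c
#include <math.h>

/* Distance between two cases in the learned feature space F2: features
 * with weight w[i] = 0 (irrelevant or redundant) drop out entirely. */
double weighted_distance(const double *a, const double *b,
                         const double *w, int n)
{
    double d2 = 0.0;
    for (int i = 0; i < n; i++) {
        double d = w[i] * (a[i] - b[i]);
        d2 += d * d;
    }
    return sqrt(d2);
}
```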
[Screenshot: case retrieval interface showing a target case and ranked retrieved cases, with feature fields (modality, voxel size, number of lesions, lesion size, density, lesion side, lesion location, lesion shape, lesion boundary, mass effect, symptom, diagnosis), feature weights and similarity scores]
Figure 4. Feature selection by the learning algorithm
5
Pathological Diagnosis
We consider pathological diagnosis as a process of image classification; here we present the format of the inferencing system as it is designed to be completed. Knowledge engineering: The evolution of the expert system involves selecting a set of diagnoses and features, defining a causal probabilistic structure over the set of diagnoses and features, quantizing the features into discrete values, assembling a representative set of images, annotating the images, teaching the network, filling out the feature set with beliefs and frequency values, improving performance, and validating the results of the process. The value of each feature (image manifestation) is correlated to each disease with conditional probabilities p[Mi | Dj]. Such features include mass effect, anatomical location, density, contrast enhancement, boundary, shape, edema, texture, size, age, etc. Figure 5 is an example of how relevant cases are grouped in the database. The key question is how we can reach the right branch at the pathological level by starting from the visual feature level.
Knowledge acquisition: We insert the relative incidence of each disease in four age groups: 0 to 6 months, 6 months to 2 years, 2 years to 60 years, and greater than 60 years. We use annotated examples of each disease to propagate probabilities in the Bayesian network. The computational cost of obtaining each manifestation from an image is entered to help in deriving the utility of the next best feature. If necessary, probabilities are adjusted to optimize classification accuracy. Embedded expert system: Hypothetico-deductive reasoning is applied. For any disease group, the values of specified features are always presented to the expert system. Based on the ranking of the diagnoses with the current set of features entered, the feature with the maximum utility from the remaining set is used to update the disease ranking. When the probability of a diagnosis reaches a threshold probability, the diagnosis is accepted.
[Diagram: a set of medical images is divided into intra-axial (inside brain) and extra-axial (outside brain) pathological subcategories at the diagnosis level, with visual features (mass effect, density, contrast enhancement, shape, boundary) at the imagery level]
Figure 5: Pathological cases are classified by causes, anatomical locations and then visual features.
6
Discussion
In this work, we have demonstrated quantitatively the discriminating power of statistical measurements of human brain asymmetry. One novelty of our approach, in comparison to others in the medical image retrieval domain, is to let the computer learn an image similarity metric suited to the given image semantics, instead of having such a metric imposed subjectively by a human system designer. The main computational tools used in our study include memory-based learning and Bayesian classification.
References
1. Liu Y., Rothfus W.E., Kanade T., Content-based 3D neuroradiologic image retrieval: preliminary results. International Conference on Computer Vision (ICCV98), Bombay, India, January 1998.
2. David J. Foran, Dorin Comaniciu, Peter Meer et al., Computer-assisted discrimination among lymphomas and leukemia using immunophenotyping, intelligent image repositories. IEEE Transactions on Information Technology in Biomedicine, 4(4), 265-271, 2000.
3. Dorin Comaniciu, Peter Meer, Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 2002.
4. Goldbaum M., Moezzi S., Taylor A. et al., Automated diagnosis and image understanding with object extraction, object classification, and inferencing in retinal images. IEEE International Conference on Image Processing, Proceedings, 695-698, 1996.
5. H. G. Schnack, H. E. Hulshoff Pol, W. F. C. Baare, W. G. Staal, M. E. Viergever and R. S. Kahn, "Automated separation of gray and white matter from MR images of the human brain", NeuroImage, vol. 13, 230-237, 2001.
6. D. H. Laidlaw, K. W. Fleischer and A. H. Barr, "Partial-volume Bayesian classification of material mixtures in MR volume data using voxel histograms", IEEE Trans. Med. Imag., vol. 17, 74-86, 1998.
USING STREAMING SIMD EXTENSION ON HIGH LEVEL IMAGE PROCESSING
M. FIKRET ERCAN
School of Electrical Engineering, Singapore Polytechnic, 500 Dover Rd, Singapore
E-mail: [email protected]
YU-FAI FUNG
Department of Electrical Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong S.A.R.
E-mail: [email protected]
Streaming SIMD Extensions (SSE) is a unique feature embedded in the Pentium III and Pentium IV classes of microprocessors. By fully exploiting SSE, parallel algorithms can be implemented on a standard personal computer and a significant speedup can be achieved. In this paper, we study the performance of SSE in higher-level image processing algorithms. The Hough transform and geometric hashing are commonly used algorithms of this class, and their implementations using SSE are presented.
1
Introduction
Recent microprocessors include special architectural features in order to boost their computing performance. Many of them employ a set of SIMD registers in which data-parallel operations can be performed simultaneously within the processor [1, 2], so that the overall performance of an algorithm can be improved significantly. The performance of Intel's SSE in common image and signal processing algorithms has been studied extensively in the literature. Nevertheless, most of these studies are concerned with low-level image processing algorithms, which involve pixel-in, pixel-out operations. In this paper, we exploit SSE technology in higher-level algorithms, where recognized features are the output of the operation. The Hough transform and geometric hashing are the most commonly used algorithms of this type. The SSE registers are 128 bits wide and can store packed values of various types (characters, integers, etc.). There are eight SSE registers and they can be directly addressed using their register names [2]. Utilization of the registers is straightforward with a suitable programming tool.
2
Hough transform
The Hough transform is commonly used for detecting lines or circular objects in an image. In general, the Hough transform has two phases: a voting process, where the result is accumulated in a parameter space, and candidate selection.
Line detection: The voting phase involves calculating the candidate lines, which are represented in terms of parameters by the following equation:

r = x cos θ + y sin θ

One method to utilize the SSE registers for this computation is to pack four consecutive cos θ and sin θ values and compute four possible r values for a given (x, y) coordinate. Row and column values are copied to all four words of a pair of data packs, and the computation is done for four values of the angle θ simultaneously. This method is named angle grouping (AG). Another method is to pack the x, y coordinates of four image pixels into the SSE registers; we call this method pixel grouping (PG). Although the number of packing and unpacking operations is similar, the performance achieved with this method was slightly better. The PG method also enabled further optimisation by the loop unrolling technique; this variant is named SSE-optimised in the results section. A sketch of the PG voting loop is given after this section's description of circle detection.
Circle detection: The Hough transform technique can be extended to circles and any other curves that can be represented with parameters. If a point (x, y) is positioned on a circle, then the gradient at (x, y) points to the centre of that circle. For a given radius d, the direction of the vector from the point (x, y) can be computed, and the coordinates of the centre can then be found. Thus, circles are represented by the following equations, where a and b indicate the coordinates of the centre point:

a = x − d cos θ and b = y − d sin θ

The first method that we employed packs edge pixels and gradient angles, and the calculation is performed for four of them simultaneously. This method is called gradient grouping (GG). The second method deals with the computation of the values of a and b; this time the coordinates x, y and the gradient angle are copied into all four words of the data packs. This second method is named centre point grouping (CG). The timings given in our results are measured only for the time-consuming accumulator-filling step.
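The sketch below illustrates the PG voting loop with SSE intrinsics; the function signature, the accumulator layout and the r-to-bin quantization are our own assumptions (the paper describes the packing scheme but gives no code):

```c
#include <xmmintrin.h>   /* SSE intrinsics (Pentium III and later) */

/* Pixel grouping (PG): for a fixed angle theta, compute
 * r = x*cos(theta) + y*sin(theta) for four edge pixels at once. */
void vote_theta(const float *xs, const float *ys, int npix,
                float c, float s, int *acc, int nbins, float rscale)
{
    __m128 vc = _mm_set1_ps(c);
    __m128 vs = _mm_set1_ps(s);
    for (int i = 0; i + 4 <= npix; i += 4) {
        __m128 vx = _mm_loadu_ps(xs + i);
        __m128 vy = _mm_loadu_ps(ys + i);
        __m128 vr = _mm_add_ps(_mm_mul_ps(vx, vc), _mm_mul_ps(vy, vs));
        float r[4];
        _mm_storeu_ps(r, vr);
        for (int k = 0; k < 4; k++) {        /* unpack and vote */
            int bin = (int)(r[k] * rscale);
            if (bin >= 0 && bin < nbins)
                acc[bin]++;
        }
    }
}
```

Note that the accumulator update itself stays scalar, which is exactly the packing/unpacking overhead discussed in the results section.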
3
Geometric hashing
In geometric hashing, a set of models is specified by their feature points and a hash table data structure is established. All the possible feature pairs in a given model are designated as the basis set. The feature point coordinates of each model are computed relative to each of its bases. These coordinates are then used to hash into a hash table. During the recognition phase, an arbitrary pair of feature points is chosen from the scene as a basis, and the coordinates of the other feature points in the scene are computed relative to the coordinate frame associated with this basis. The new coordinates are used to hash into the hash table. In our application, the tactic used in the parallel implementation was to explore the data-parallel segments of the algorithm and utilize the appropriate SSE facilities. The time-consuming operation is the computation of the coordinates of all other feature points with respect to the selected basis and the voting for the entries in the hash table; this is also where the data parallelism can be exploited using the SSE registers, as the sketch below illustrates.
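The basis-relative coordinate computation at the core of both phases can be sketched as follows; the similarity-frame convention (basis point 1 at the origin, basis point 2 at (1, 0)) is a standard choice rather than something the paper spells out:

```c
/* Express point p in the frame defined by the basis pair (b1, b2):
 * b1 maps to (0,0), b2 maps to (1,0).  The (u,v) pair is then
 * quantized to index the hash table. */
void basis_coords(const float p[2], const float b1[2], const float b2[2],
                  float *u, float *v)
{
    float ex = b2[0] - b1[0], ey = b2[1] - b1[1];   /* basis x-axis        */
    float len2 = ex * ex + ey * ey;                 /* its squared length  */
    float dx = p[0] - b1[0], dy = p[1] - b1[1];
    *u = (dx * ex + dy * ey) / len2;                /* parallel component  */
    *v = (dy * ex - dx * ey) / len2;                /* perpendicular part  */
}
```

Four such points can be transformed at once by packing their coordinates into SSE registers, in the same spirit as the PG scheme above.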
4
Experimental results
For line detection, we used three different image sizes. For each image size, we used three images with different percentages of edge-pixel content. Table 1 shows the execution times for all the test images. From the experimental study we can observe that each method provided a steady performance improvement regardless of image size. On average, PG provided better performance than the AG method; loop unrolling, however, provided a significant further improvement. With an increasing percentage of edge pixels in the image, we observed a slight decline in performance. This deterioration, which is due to the increasing number of data packing/unpacking operations resulting from the larger number of edge pixels, did not affect the overall performance dramatically. For circle detection we used two different image sizes. For each image size, we used three images with different percentages of edge-pixel content, where the smallest percentage corresponds to 4 and the highest to 10 circular objects. The results obtained for the two techniques are combined in Table 2. We observed a tendency towards better speedups with increasing edge pixels in the circular Hough transform. CG resulted in slightly better performance than the GG technique. Scanning the image and packing data into the SSE registers generate a considerable overhead in the above implementations and make a major speedup difficult.
Table 1. Performance of different approaches for the Hough transform (in msec).

            | Image size 256x256 | Image size 640x480 | Image size 1024x1024
            |  5%   10%   15%    |  5%   10%    15%   |   5%    10%    15%
 Non-SSE    |  80   160   240    | 420   721   1092   | 1392   2744   3996
 AG         |  60   130   191    | 340   591    892   | 1101   2274   3315
 PG         |  60   120   180    | 320   550    841   | 1042   2133   3054
 SSE Opt.   |  30    70   110    | 191   320    490   |  611   1212   1792
Table 2. Performance of different approaches for the circular Hough transform (in msec).

            | Image size 256x256 | Image size 640x480
            |  10%   15%   20%   |  10%   15%    20%
 Non-SSE    |  280   360   440   |  720   912   1592
 GG         |  175   220   259   |  480   507    885
 CG         |  185   212   254   |  450   536    838

For the geometric hashing algorithm, we used 1024 randomly generated models, each consisting of 8 points. The scene used in the experiments was also created synthetically, and the total number of points in the picture was 128. The probe time, that is, the calculation of coordinates for the remaining 126 points for a selected basis, was 14.8 seconds for the sequential case. SSE utilization reduced this timing to 7.9 seconds. By utilizing SSE we could save time in the computation of coordinates, though the voting process has to be sequential, which is a considerable bottleneck on the speed-up.
5
Conclusions
In this paper, we have examined the application of SSE to object recognition algorithms. According to our results, a minimum speedup ratio of 1.6 can be obtained without difficulty for all the algorithms we have experimented with. In order to utilize the SSE features, only minor modifications to the original program are needed. Most importantly, additional hardware is not required.
References
1. AltiVec Programming Environments Manual (Motorola, 2001).
2. The Complete Guide to MMX Technology (Intel Corporation, McGraw-Hill, 1997).
AN APPROACH FOR OPTIMIZATION OF IMAGING PARAMETERS FOR GROUND SURFACE INSPECTION USING MACHINE VISION
V. SIVASANKARAN, A. JOTHILINGAM, B. RAJMOHAN AND G.S. KANDASAMI
Department of Production Technology, M.I.T., Anna University, Chennai-600044
E-mail: [email protected], [email protected]
The paper presents a method for the adaptive control of imaging parameters in automated machine vision systems. The imaging parameters are focus, zoom, stop and illumination. With this approach, for ground surfaces that show specular reflections, or polished surfaces, the illumination parameter is adjusted optimally without any prior knowledge about the surface characteristics. As a result, the generated image can be free of irrelevant information. An image series is generated, and between each image capture only the angle of incidence of the illuminating light is slightly changed. The image series comprises about 10 to 20 images, so that between each image the position of the illumination is stepped forward by about 20° to 35°. The criterion applied is that edges or contours from shadows and reflections change their location within the image when the illumination parameters are changed, whereas true edges belonging to features really existing on the object's surface do not change their location; the only change that can be observed with true edges is a change of the contrast with which these contours appear in the image. An algorithm is developed which measures a contour's tendency to change its location when the position of the illuminating light source is changed. By comparing the images in the series, an optimized image is obtained which is free of contours originating from reflections and shadows.
1
Introduction
Many measurement and inspection tasks which are not automated so far can be automated by means of machine vision systems. Solving these problems contributes to enhancing the stability of many manufacturing processes and the quality of the related products. In cases where these tasks are already executed by human inspectors using their own vision, a considerable number of non-detected errors can be assumed. This is mainly due to the fast decline of concentration that goes along with such inspection tasks, high inspection rates, subjective decisions of the inspectors and a lack of training. Nevertheless, human inspectors are distinguished by their ability to adapt very quickly to new situations and inspection tasks; they are equipped with an extraordinarily powerful image processor, the brain, and have access to a huge knowledge base trained over decades.
2
Requirements for modern vision systems
At present, vision systems are rather inflexible in use, since they are very much restricted to the inspection tasks for which they have specially been designed. This applies to the type and arrangement of illumination, the choice of cameras and optical components, as well as to the software for image processing. Since vision systems lack flexibility, from an economic point of view their investment can only be justified if inspection tasks for large batches are automated.
In many areas of manufacturing a huge number of components and parts can be found which have metal or polished surfaces showing specular reflections. The realization of an automated inspection for this class of objects by machine vision demands much effort and skill for the optimized adjustment of imaging parameters like focus, zoom and especially the illumination, which is a crucial parameter for the image's quality. Even if the illumination is optimized, reflections still have much influence on the image, so that in most cases the vision systems are not robust. Besides the aspect of information loss, illumination is also an important means to enhance image features like tiny scratches or engravings on polished surfaces, which can only be observed when they appear with sufficient contrast in the image. This requires illumination from special directions.
3
The "optimized image"
The methods presented here focus on 2D inspection tasks using machine vision with top-down illumination directly onto metal or polished reflective surfaces. The optimized control of the illumination is one of the key points in this discussion, since this parameter is decisive for the image's quality. The developed techniques for image optimization provide as a result an image which contains only the edges and contours actually found on the object's surface. This image is almost free of edges originating from undesired artifacts like reflections and shadows. Furthermore, the contours in the optimized images are entire and fully embrace the related feature on the surface. The optimized image is achieved by automatically identifying and enhancing true edges and by damping false edges in intensity. It must be emphasized here that the resulting optimized image is not a grey-scale image, but an image containing only contours, as in a Sobel or any first-order high-pass filtered image.
4
Methods for controlling imaging parameters
Imaging parameters like focus, zoom, stop and illumination can be split into two groups. Parameters like stop or focus can be optimally adjusted by well-known standard feedback control loops. This is due to the fact that the influence of any change in one of these parameters can be predicted, at least qualitatively. It is different with respect to variations of the illumination: here the exact changes in the image cannot be predicted anymore. This especially applies to metal, polished or reflective surfaces with complex shapes. The reason for this is that much of their behaviour is determined by the micro-topography of their surface, which in principle is unknown in detail. For example, ground or turned surfaces locally behave like gratings, which show strong reflections when illuminated from very special directions.
5
Active exploration of the scene by analyzing image series
In many cases the scenery first has to be explored actively by the vision system before an optimized adjustment of the imaging parameters can be achieved. This exploration has to take place without any knowledge about the scenery. In order to study the scenery's behaviour under variations of the illumination, in the proposed vision system a special procedure initiates the automated capture of an image series, where between each image capture only the angle of incidence of the illuminating light is slightly changed. For this, light sources located in a circular arrangement around the object are successively switched, and an image is captured for each light position. The object and the camera stay in the same position all the time. This makes a comparative evaluation of the images much easier, since no coordinate transformations have to be applied to the images before combining or comparing them. The generation of image series is of special interest, since in many situations it cannot be assumed that an interesting feature of an object's surface can be viewed entirely and in full contrast in just one image. Also, the analysis of an image series implies an aspect of active learning: only by investigating the different behaviour of image contours under variations of the illumination can so-called true edges be distinguished from false edges.
6
Generation of optimized images by means of image series
An image series comprises about 10 to 20 images, so that between each image the position of the illumination is stepped forward by about 20° to 35°. Due to the characteristic behaviour of true edges and false edges under variations of the illumination, the contours in an image can be classified. A general criterion has been formulated: edges or contours from shadows and reflections change their location within the image when the illumination parameters are changed. True edges, belonging to features really existing on the object's surface, do not change their location; the only change that can be observed with true edges is a change of the contrast with which these contours appear in the image. Applying these general rules does not require any prior knowledge about details of the underlying scenery such as surface characteristics, features etc.
7
Algorithmic implementation of an image optimization by analyzing image series
A method is developed which measures a contour's tendency to change its location when the position of the illuminating light source is changed. For this, the contour images G_{n,c} and G_{n-1,c} of the captured images, say G_n(x,y) and G_{n-1}(x,y), are generated by high-pass filtering. In the contour images, due to the variation of the illumination, the shadow and reflex contours change their location. The binarized contour images are then multiplied pixel-wise, so that only those areas appear in the resulting multiplication image where a contour pixel C_{k,G(n)} in G_{n,c} has overlapped with a contour pixel C_{l,G(n-1)} in G_{n-1,c}; there a large overlap area OV appears for the contours. Each contour C_{k,G(n)} is then assigned a degree of overlap (DOV). The DOV value of a contour C_{k,G(n)} resulting from the comparison with image G_m is given by

DOV(C_{k,G(n)}) = Σ_l Σ_{(x,y)} OV_{k,l}(G_n(x,y), G_m(x,y))

After all contours have been ascribed a DOV, the contours are plotted into a resulting image, where each contour pixel is given DOV_{k,l,G(n),G(n-1)} as an intensity value. Repeated application of the procedure to all possible combinations of two images in the series will lead to the completion of the contours of true edges, while contours of shadows are dominated. The intensity of a pixel in the final optimized resulting image G_res(x,y) is computed by

G_res(x,y) = Σ_l Σ_k DOV(C_{k,G(l)}) δ(C_{k,G(l)})

where δ(C_{k,G(l)}) is 1 at the pixels of contour C_{k,G(l)} and 0 elsewhere. The resulting optimized image is free of contours originating from reflections and shadows.
References
1. Pfeifer T. and Wiegers L., Adaptive control for the optimized adjustment of imaging parameters for surface inspection using machine vision. Annals of the CIRP, Vol. 47/1/1998.
2. de Figueiredo R. I. P., Illumination control as a means of enhancing image features. IEEE Trans. on Image Processing, Vol. 4, No. 11.
3. Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, Pearson Education Asia Pte Ltd. (2000).
MODEL DEVELOPMENT AND BEHAVIOR SIMULATION OF pH-STIMULUS-RESPONSIVE HYDROGELS
HUA LI, T. Y. NG AND Y. K. YEW
Computational MEMS Division, Institute of High Performance Computing, 1 Science Park Road, #01-01, The Capricorn, Singapore Science Park II, Singapore 117528
One of the most intriguing features of pH-stimulus-responsive hydrogels is their ability to perform as actuators/sensors in BioMEMS devices. A classical example is a closed-loop insulin device constructed from glucose-sensitive swellable hydrogels. This paper presents the development of mathematical models and the behavior simulation of pH-sensitive hydrogels when immersed in a bathing solution with varying pH. As preliminary work, one-dimensional models are developed. The diffusion mechanisms of the different ionic species into the fluid-filled region of the polymer-based porous hydrogels from the external bathing solution are described by the Nernst-Planck equations. The Poisson equation describes the variation of the electric potential distribution with the diffusing species. The mechanical equilibrium equation is used to describe the deformation of the skeletal solid-phase long-polymer-molecule network of the hydrogel, in which the osmotic pressure generated by the ionic concentration differences between the interior and the exterior of the hydrogel acts as the driving force for deformation. The relation between the concentrations of the fixed charge and hydrogen ions is obtained based on the Langmuir isothermal adsorption. The response of a pH-sensitive poly-HEMA (2-hydroxyethyl methacrylate) hydrogel immersed in a NaCl solution with a simple buffer generating various pH environments is simulated. Numerical results are presented for validation against experimental data. They prove the presently developed models to be satisfactory in predicting the swelling/shrinking trend of the hydrogel behavior in various pH environments.
1
Introduction
Based on the Poisson-Nernst-Planck (PNP) equations, mathematical models including the chemo-electro-mechanical multi-field effects are developed for the first time, known as the Multi-Effect-Coupling (MECpH) model for pH-responsive hydrogels. The presently developed models are able to simulate the behaviors of responsive hydrogels stimulated by pH. Usually the Nernst-Planck flux system is used to describe the transport mechanisms of ionic species in solution. However, the Nernst-Planck system alone is insufficient, as it includes only the gradient effects of the ionic concentrations and the electrical potential. Therefore, a more rigorous model is required to include the variation of the electric potential with the spatial distribution of the electric charges; this requires coupling in the Poisson equation to form the PNP system. Further, this model couples the mechanical equilibrium equations with the PNP equations for the deformation simulation of hydrogels. One of the important contributions of the presently developed MECpH model is a relation, based on the Langmuir absorption isotherm, between the fixed charge bound to the long-molecule chain network and the diffusive hydrogen ions for hydrogels stimulated by varying pH of the surrounding solution. The MECpH model for pH-stimuli-responsive hydrogels is able to simulate the concentration distributions of the diffusive ionic species, the electric potential distribution and the mechanical deformation of the hydrogels when immersed in a bathing solution with varying pH. In order to validate the MECpH model, a one-dimensional steady-state simulation is conducted numerically by a newly developed meshless technique, the Hermite-Cloud method [1, 2]. After comparison with experimental results [3-5], it is observed that the present model is accurate and stable.
2
Presently Developed MECpH Model
If the convective transport of the ionic species is neglected, the Nernst-Planck equation describing the flux of ionic species k in solution is derived based on mass conservation:

J_k = -[D_k] (grad(c_k) + (z_k F c_k / RT) grad(ψ) + c_k grad(ln γ_k))   (k = 1, 2, 3, ..., N)   (1)

where J_k is the flux of the kth species, D_k the diffusivity tensor, c_k the kth ionic concentration, z_k the kth ionic valence number, ψ the electrostatic potential and γ_k the chemical activity coefficient. F, R and T are the Faraday constant, universal gas constant and absolute temperature, respectively. The Poisson equation is used to describe the spatial distribution of the electric potential in the domain:

∇²ψ = -(F/εε₀) (Σ_k z_k c_k + z_f c_f)   (2)
where c_f is the density of the fixed charge groups in the hydrogel, ε the relative dielectric constant of the surrounding medium and ε₀ the vacuum permittivity or dielectric constant. According to the Langmuir absorption isotherm, a relation between the fixed charge and the diffusive hydrogen ion is developed as

z_f c_f = -(c^s_m0 K) / (H (K + c_H))   (3)

where c^s_m0 is the xerostate concentration of the fixed charge, c_H the concentration of hydrogen ions H+ within the hydrogel, K the dissociation constant of the carboxylic acid groups, and H the local hydration of the hydrogel. The mechanical equilibrium equation that describes the hydrogel deformation is written as

σ_,x = ((2μ + λ) u_,x - RT Σ_k (c_k - c̄_k))_,x = 0   (4)

where c̄_k is the concentration of the kth ion species in the stress-free state and c_k the concentration of the kth ion species within the hydrogel. λ and μ are the Lamé coefficients of the solid matrix.
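Equation (3) is straightforward to evaluate once the local pH is known; the helper below assumes c_H = 10^(-pH) (in the same concentration units as c^s_m0 and K), which is our own simplification of the unit bookkeeping:

```c
#include <math.h>

/* Fixed charge density of Eq. (3): z_f c_f = -c_sm0 * K / (H * (K + c_H)). */
double fixed_charge(double c_sm0, double K, double H, double pH)
{
    double c_H = pow(10.0, -pH);   /* hydrogen ion concentration from pH */
    return -(c_sm0 * K) / (H * (K + c_H));
}
```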
3
Meshless Hermite Cloud Method
Based on the classical RKPM [6], the approximation f^(x, y) of a function f(x, y) is given as

f^(x, y) = ∫ C(x, y, p, q) K(x_k - p, y_k - q) f(p, q) dp dq   (5)

where C(x, y, p, q) is the correction function and K(x - p, y - q) the kernel function, which is centered at the points (x_k, y_k) and constructed from different weighted window functions depending on the PDBV problem. A cubic spline function is considered here:

K(x_k - p, y_k - q) = 1/(ΔxΔy) W*[(x_k - p)/Δx] W*[(y_k - q)/Δy]   (6)

where W*(z) is a cubic-spline window function. The correction function C(x, y, p, q) is expressed as a linear combination of independent basis functions, i.e. as a product of a β_p-order row basis function vector B(p, q) and a β_p-order column coefficient vector C^T(x, y):

C(x, y, p, q) = B(p, q) C^T(x, y),   B(p) = {b_1(p), b_2(p), ..., b_βp(p)} = {1, p, p²}   (β_p = 3)   (7)

The correction function coefficient is provided by

C^T(x, y) = A^(-1)(x_k, y_k) B^T(x, y)   (8)

where A(x_k, y_k) is a symmetric matrix. By employing the point collocation technique for discretization in combination with the Hermite interpolation theorem, a true meshless approximation f^(x, y) of the unknown real function f(x, y) is constructed as follows:

f^(x, y) = Σ_n N_n f_n + Σ_m (x - Σ_n N_n x_n) M_m g_xm + Σ_m (y - Σ_n N_n y_n) M_m g_ym   (9)

in which N_n = N_n(x, y) = B(p_n, q_n) A^(-1)(x_k, y_k) B^T(x, y) K(x_k - p_n, y_k - q_n) ΔS_n are defined as the shape functions corresponding to f^(x, y), and in a similar manner M_m = M_m(x, y) are the shape functions corresponding to the first-order derivatives f_,x(x, y) and f_,y(x, y).
The Hermite-Cloud method is able to directly compute the approximate solutions of both the unknown function and its first-order derivatives. Further, the results at discrete points in the domain are much more refined when compared with the classical RKPM. 4
Numerical Results and Discussions
0.01
0.C05 x[m]
(a)
(b)
Figure 1. Comparison of electrical potential in the gel and bathing solution due to applied external electric field betwen (a) stabilized space-time FEM [5] and (b) Hermite Cloud meshless method [1,2].
bathing sojytion
^ hydrogel
(a)
(b)
Figure 2 (a) Geometrical narration of the hydrogel and its bathing solution domain, (b) Comparison between experiment and numerical results for equilibrium swelling of hydrogels as a function of bath pH at 25°C.
4.1
Effect of Externally Applied Electric field
Let us consider a stimuli-responsive hydrogel immersed in a NaCl bath solution with simple buffer (15x15mm2). The concentration of fix-charge groups within the hydrogel is Cf - lOmM. The boundary conditions of Na+ and CV ions in the solution are set to c + = c =lmM. An external electric field is applied with a time-constant electric potential 0.1V next to the anode and -0.1V to the cathode. Figure 1 shows the comparison
878
of numerical simulations of the MECpH model with those of FEM, where the maximum relative error is less than 10%. Curve 1 depicts the linear variation of electrical potential in the solution before the hydrogel is immersed. Curve 2 shows the superposition of both the electrical potentials after immersing the hydrogel without applied electric field and the linear curve 1. Curve 3 represents the simulated electric potential in both the hydrogel and exterior solution when external electric field is applied. 4.2
Equilibrium swelling patterns of pH-stimuli-responsive hydrogel
As shown in Figure 2(a), a cylindrical hydrogel with 400um in diameter at dry-state is immersed in a bathing solution with ionic strength of 300mM, where, as a pH stimulus, the pH of surrounding solution is subjected to step change. The 1-D steady-state stimulation is conducted, in which the computational domain covers only half of the problem domain due to symmetry. Figure 2(b) presents the comparison between the present simulation results shown by solid line and equilibrium experimental data [3-4] of the diameter of cylindrical hydrogel by circular markers. As visualized from the figure, the simulation results were comparable well with the experimental works. 5
Conclusion
This paper develops a chemo-electro-mechanical multi-field coupling (MECpH) model which is able to make qualitative comparison with experimental swelling measurements of poly-HEMA hydrogel. Despite the complexity of the swelling behavior with steps changes in bath solution pH, plenty of physical insight can be obtained by systematically varying the hydrogels and outer solution composition, based on computer-based numerical simulation. References 1. Li Hua, Ng T. Y., Cheng J. Q. and Lam K. Y., Hermite-cloud: a novel true meshless method. Comoutational Mechanics (Submitted). 2. Ng T.Y., Li H., Cheng J.Q., Lam K.Y., A new hybrid meshless-differential order reduction (hM-DOR) method with applications to shape control of smart structures via distributed sensors/actuators, Engineering Structures, (in press). 3. David J. Beebe, Jeffrey S. Moore, Joseph M. Bauer, Qing Yu, Robin H.Liu, Chelladurai Devadoss, Byung-Ho Jo, Functional hydrogel structures for autonomous flow control inside microfluidic channels, Nature 404 (2000) pp.588-590. 4. B. Johnson, D. J. Niedermaier, W. C. Crone, J. Moorthy, D. J. Beebe, Mechanical Properties of a pH Sensitive Hydrogel, Society for Experimental Mechanics, 2002 SEM Annual Conference Proceedings, Milwaukee. (2002) 5. Gulch R. W., J. Holdenried, A. Weible, T. Wallmersperger, B. Kroplin, Polyelectrolyte gels in electric fields. A theoretical and experimental approach, Smart Structures and Materials 2000, Electroactive Polymer Actuators and Devices, Proc. SPIE 3987 (2000) pp. 192-202. 6. Liu W.K., Jun S., Zhang Y.F., Reproducing kernel particle methods. Int. J. Numer. Meth. Engng 20 (1995) 1081-1106.
FRINGE-FIELD AND GROUND PLATE EFFECTS FOR ELECTROSTATIC MEMS SIMULATIONS ANDOJO ONGKODJOJO1 AND FRANCIS E.H. T A Y 1 2 'Micro- & Nano- Systems Cluster, Institute of Materials Research and Engineering, Link, Singapore 117602 E-mail: [email protected] 2
Department
3 Research
of Mechanical Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260 E-mail: [email protected]
This paper considers inherent non-idealities such as fringe field and ground plate effects that are currently limiting wide commercialization of certain MEMS devices. In this case, electromechanical derivations including fringing fields tend to significantly change the electrostatic force and the capacitance detected. The analytical derivations are compared with an electromechanical analysis of IntelliSuite™ for verifications, and the generated FEA results are in good agreement with the analytical computation results. In addition, it is clear that there is a fringe field correction factor of 0.046 and a ground plate factor of 0.5 affecting the resulting displacements. Thus, this research work certainly has an important impact on the design of MEMS by solving the difficulties associated with the realistic problem formulations.
1
Introduction
Research and development efforts in MEMS have been directed at design, fabrication and testing of MEMS devices. Nevertheless, analysis and computer simulation of MEMS devices are also active fields of research and development [1]. There are many commercially available MEMS simulation tools. CAD tools based on the finite element method and the boundary element method such as Oyster™, MEMCAD™, CAEMEMS™, SESES™ and IntelliSuite™ have been developing rapidly since the late 1990's [2]. These CAD systems applicable to MEMS are primarily aimed at simulating fabrication processes and the electromechanical behavior of a given design. The determination of dynamic behavior via full 3-D simulation is computationally very expensive and time consuming. To overcome these shortcomings, we have reported the novel backpropagation approximation approach based macromodeling technique [3]. In this paper, we propose analytical modelings of MEMS devices by deriving their mathematical equations for design problems of actual MEMS devices. We focus on more realistic cases such as fringe field and ground plate effects for electrostatic simulations, which are required for the analysis of micro electromechanical systems. The mathematical model and the FEA results are compared for yielding suitable theoretical design considerations for practical measuring structures. This process is applied to our single mass micro-gyroscope as shown in Fig. 1.
879
880
2
Mathematical Modeling of Electrostatic Force and Capacitance
In order to construct a model in the electrostatic energy domain, an analytical model of the capacitance of the system must be developed. The electrostatic forces are found by computing the spatial derivative of the electrostatic co-energy T51: Vibrating Direction Beam
V^P
/
Proof Mass Resonator (Movable Part) z
,r y
Stationary Comb-Drive Figure 1. Schematic diagram of the comb-drive single mass microgyroscope [4]
F, = -
2
V2
(1) where Fe, V and C are the electrostatic force, the applied voltage and the capacitance between the conductors respectively. As the applied voltage is independent of motion, the gradient is applicable to the capacitance only. The electrostatic forces in the 'push' direction (per unit length) are expressed by:
F
V( 2
--i£ '>
(2) where V(t) is the applied voltage for step actuation input voltage (V). The capacitance between the comb fingers in the 'push' direction is given by equation (3), with the assumption that the capacitance due to fringing fields between the interdigitated fingers and a presence of a ground plate can be neglected [6]: c = ( N + 2)eoh(x 0 + x) g (3) where N, Co, Xo, x, h and g are the number of comb-drive fingers (one side), the permittivity of air (F/m), the overlap displacement (initial displacement) in the x-direction (m), the deformed displacement (m), the height of the comb finger (m) and the finger gap (m) respectively. The derivative of the above equation with respect to x is given by: 3 C _ ( N + 2)e0h 9x
g
(4) If we consider the more realistic case where there are fringing field effects and a ground plate beneath the fingers of our device, equations (10), (11), and (12) of [7] have been reported. However, we need to modify and apply the equations for our MEMS device application as equation (12) of [7] is per unit length and per movable finger. In this paper, parameters such as c, d, I, g, Xo, and h are replaced by Wfmgen h, lfmger, g, XQ, and d respectively. The equation has to be multiplied by the number of comb-drive fingers (one
881 side), N and the overlap finger length, x0 as shown in equation (5). The global problem including fixed and movable fingers and their interactions, which is also influenced by the presence of the ground plate, has been considered to obtain the true results for our problem. The global problem has been modeled by placing the magnetic line currents, and more descriptions have been reported [7]. Thus, the corrected electrostatic forces with the ground plate effect are given by: c
_knger+gL
•!Nt0-
2TC
4>, (<|>2 V
^ ^ 2 v(t) 2V
(5) where CQrmger is the width of the comb-drive finger (m), if\ is the potential above the engaged fixed-movable comb finger regions (Volt), expressed by equation (14) of [4], and §2 is the potential above the unengaged fixed comb finger regions (Volt), expressed by equation (15) of [4], The equation is similar to equation (12) of [7] with some modifications as presented before, and is suitable for our MEMS device application, which considers the fringe field and ground plate effects. In addition, the applied input voltage for step input actuation is included explicitly in equation (5). If we adopt equation (2), the corrected electrostatic forces including the fringe field effect (per unit length) will be expressed as: Fe_c= e c
I^LV(t)2 w 2 3x
(6) where dCJdx is the corrected capacitance gradient due to the fringing fields between the interdigitated fingers as shown in equation (7). dx
e
g
(7) where o^, is the fringe field correction factor for our MEMS device application as given by: N g(w finger +g) '
Numerical Results and Discussion
In this section, we discuss our analytical simulation results based on the equations as expressed in the previous section. These results are compared with those for the full 3dimensional coupled simulation using IntelliSuite™ as shown in Table 1. The analytical results are obtained using the Simulated Annealing algorithm, which has been reported in [4]. This table clearly shows good agreement between the analytical simulation results and the FEA results. Fig. 2(a) and (b) show the deformed MEMS structure in the lateral direction generated by IntelliSuite™ without and with the ground plate respectively.
882 Table 1. Comparison of the Maximum Lateral Displacements between the SA-based Analytical Simulation and the FEA Simulation Generated by IntelliSuite™
Structure
No Ground With Ground
SA without Fringe Field Effect (urn) 5.98 x 10'2
SA with Fringe Field Effect (urn) 5.54 x 10 3 2.77 x 10"3
FEA (um) 5.39 x 10"3 2.83 x 10'3
Error (%) 2.78 2.12
Error*' (%) 9.1x10'
This error is a discrepancy between the SA and the FEA without fringe field effect.
It is also clear that a ground plate effect, having a factor of 0.5 has been inserted into equation (5). Thus, the presence of the ground plate weakens the resulting displacement by exactly 50%. For the fringe field effect, the displacement is clearly weakened by roughly 90%. By using equation (8), a fringe field correction factor (a,.) of 0.046 has been easily obtained. The presence of the ground plate without the fringe field effect has been reported [6]. However, the results show the ground plate effect, which weakens the capacitance gradient by roughly 30%, is only for a gap distance of 2 u.m and a structure height of 2 urn These are really different from our design values as reported in [4].
(a)
(b)
Figure 2. (a) The Deformed Microgyroscope without the Ground Plate; and (b) The Deformed Microgyroscope with the Ground Plate generated by IntelliSuite™ [4]
4
Conclusion
This research work certainly has an important impact on design of MEMS by solving the difficulties associated with the practical and realistic problem formulations. We can directly use the mathematical modelings, which consider the non-idealities, for determining dynamic behaviors of actual MEMS devices without using the full 3-D simulation. The results, which have been shown and demonstrated, validate our mathematical modelings; and our mathematical models in turn, are valid for electromechanical and electrostatic simulations. 5
Acknowledgement
This work was supported by the DSO Gyroscope Project (DSO/C/98063/L), Singapore Defence Science Organisation, Singapore.
883
References 1. Ye W. and Mukherjee S., Optimal shape design of three-dimensional MEMS with applications to electrostatic comb drives. Int. J. Numer. Methods Eng. 45 (1999) pp. 175-194. 2. Madou M. Fundamentals of microfabrication (Boca Raton, FL: CRC Press, 1997), pp 375-380. 3. Tay F. E. H., Ongkodjojo A. and Liang Y. C , Backpropagation approximation approach based generation of macromodels for static and dynamic simulations. Microsyst. Technol. 1 (2001) pp. 274-282. 4. Andojo Ongkodjojo and Francis E. H. Tay, Global optimization and design for microelectromechanical systems devices based on simulated annealing. J. Micromech. Microeng. 12 (2002) pp. 878 - 897. 5. Gabbay L. D., Computer aided macromodeling for MEMS (PhD Thesis, Massachusetts Institute of Technology, Cambridge, MA, 1998). 6. Tang W. C. Electrostatic comb-drive for resonant sensor and actuator applications (PhD Thesis, University of California at Berkeley, 1990). 7. Johnson W. A. and Warne L. K., Electrophysics of micromechanical comb actuators. J. Microlectromech. Syst. 4 (1995) pp. 49 - 59.
A COUPLED MULTI-FIELD FORMULATION FOR STIMULI-RESPONSIVE HYDROGEL SUBJECT TO ELECTRIC FIELD
ZHEN YUAN, HUA LI, T. Y. NG AND JUN CHEN Computational MEMS Division, Institute of High Performance Computing, 117528, Singapore E-mail: lihua @ ihpc.a-star. edu. sg Based on a multi-phasic mixture theory and convection-diffusion equations for ion concentrations, a multi-field formulation including the effect of chemo-electro-mechanical coupling is presented to simulate the response behaviors of stimuli-sensitive hydrogel immersed into a bath solution applied to an external electric field. The presently developed mathematical models consist of the continuity equations for the solid, interstitial water and ion phases, the convection-diffusion equations for the distributions of diffusive ion concentrations, Poisson equation for the electric field and the momentum equation for the mixture phase. To solve the multi-field coupling nonlinear governing equations, a hierarchical iteration procedure is conducted and the steady-state response of a hydrogel strip subject to the external electricfieldis numerically simulated by a developed meshless HermiteCloud method. The ionic concentrations, electric potential in both interior and exterior hydrogel and the strip deformation are studied. The simulating results validate the presently developed model.
1
Introduction
Over the past decades, the actuators/sensors based on stimuli-responsive polymer hydrogels have attracted much attention for wide-range biological applications such as artificial muscles and Bio-MEMS. Their reversible volume changes can be induced by external bio-stimuli including pH, temperature and electric field. Usually hydrogels are composed of the solid, interstitial water and ion phases. If an external electric field is applied, the ions flow, osmose and redistribute. This results in the hydrogels swelling, shrinking and bending with the multi-field coupling response. The triphasic mixture theory was early developed by Lai et al. [1] for the swelling and deformation behaviors of articular cartilage. Based on this theory, a new mathematical model is developed with consideration of chemo-electro-mechanical coupling effect, in which the modified continuity equations include the influence of electric potential, and the solid-phase displacement is explicitly computed easily. Further, the present computational domain covers both the hydrogels and surrounding solution. In the developed mathematical model, the continuity equations describe the solid, water and ion phases, the convection-diffusion equations are for the distributions of diffusive ion concentrations, Poisson equation is for the electric field and the momentum equation for the mixture phase. By a novel meshless numerical technique called Hermite-Cloud method [2], the coupled nonlinear governing equations are solved for simulation of responses of stimuliresponsive hydrogels applied to an external electric field, including the distributions of ionic concentrations and electric potential as well as swelling deformation of hydrogels.
2
Developed mathematical models
If the body and the inertial forces are neglected, the governing equations of the multiphasic model are briefly summarized as follows, Momentum equations: V • a = 0 and paV{ia - II" = 0 (or = w,+,-)
884
885
Continuity equations -2— + V(>ava) = 0(a = + -) and V - ( 0 V +
Constitutive equations ff = - P / + 4 s fr(£)/ + 2//,£ w
M =Mo-Rm(c+ + c-)/pT + ^ r Ma=Mg + RT\n[ya(ca)]/Ma + zaFyf/Ma (a = + -) Poisson equation for electric field p
n +n
f i>
For infinitesimal deformation, if assuming 0s +
cF=cg(l + e/tf),
^ = l - ( ^ / ( l + e))
According to the constitutive and momentum equations, the continuity equation of the mixture phase is derived as V ( « s , ) = V - { ( ( ^ ) 2 / / w s ) [ V p - / ? r ( a ) - l ) V ( c + + c _ ) + F c (z + c + + z_c_)V^]} A diffusion analysis is coupled with the Poisson equation to describe the chemical and electric response of hydrogel immersed into a bathing solution. The present diffusion equations for ion concentrations are given as, Ca,,=(.Dacaj + (F/RT)zacaDa^ti)J-(cavi)j + ra(ca) ( a = +,-) In the governing equations above, a is the elastic stress of solid matrix, us displacement, c ion concentrations, y/ electrical potential and p fluid pressure, they are solved by the meshless Hermite-Cloud method. The fixed charge concentration cF and fluid-phase volume fraction (j>w are also computed iteratively. For steady-state analysis, the continuity equations can be rewritten, V • { ( ( f ) 2 / fws )[Vp - RT(.0 - l)V(c+ + c_) + Fc (z+c+ + z_c_ )V ¥]} =0 V • {DaVca)+ F(zaDaCa¥,), /RT=0 where,
886
3
Numerical results and conclusions 0.005
I < j(mm) -7.5 solution
_ _ _ •»~ — '"' ' — .. ; — \ ,•" — '•^
hydrogel
+ + + + + + + +
0.006
0.007
0.008
0.008
-•"
0.00065-
15 x<mm
0.0
,.'
0.00080-
«_
•
•
0.00045-
_••'
0.00040-
•"
0.006
•
0.006
0.007
0.O0B
0.000
0.010
X(rr»
Figure 1. A hydrogel strip in a bathing solution applied to an external electric field.
Figure 4 (c). Strain distribution.
Figure 2. Distributions of ionic concentrations (right) and electric potential (left) without external electric field.
ooDO
ocffi
C U M awe
QOIO
ooiz
oow
Figure 3. Distributions of ion concentrations (left) and electric potential (right) with external electric field. O.0O5
0.006
0.005
0.008
0.007
0.006
0 009
0.010
0006
0 006
0 007
OOP.
0O1O
Figure 4. Mechanical behaviors with external electric field: (a) Displacement, (b) Pressure and (c) Strain.
887
For a numerical validation, let us immerse a hydrogel strip into a bathing NaCl solution with 2 electrodes for applying an external electric field, as shown in Figure 1. Three simulating cases are considered here, where the 1-D computational domain is always set at y=Q, also the boundary conditions of the ionic concentrations are ImM at x=0 and 15. In Case 1, c / = 1 0 m M and the external electric field is not considered. The corresponding simulation distributions of diffusive ionic concentrations and electrical potential are shown in Figure 2, where the electroneutrality is observed directly in the bathing solution and also examined in the hydrogel domain after considering c{ = 10 mM. Further, Case 2 takes the external electric field (+0.1V, see Figure 1) and c/ = 10 mM without consideration of mechanical deformation. Corresponding computed distributions of diffusive ionic concentrations and electrical potential are shown in Figure 3. It is seen from the left figure that, in the bathing solution, the ionic concentration increases near cathode side and decreases near anode side. It is also clear according to the right figure that the variation of electric potential of interior hydrogel is smaller when compared with that in the exterior solution, due to the higher conductivity of the hydrogel strip for mobile ions. The present simulating electric-potential distribution agrees well with Wallmersperger's FEM results [3] and is satisfactory if compared with his experimental results [3]. Finally, Case 3 applies the external electric field (±0.1V) and considers the mechanical deformation of the hydrogel strip by taking 2"=293K, /?=8.314J/mol.K,
NUMERICAL SIMULATION OF ELECTROMECHANICAL BEHAVIOR FOR MEMS OPTICAL SWITCH F. WANG, C. LU, Z. S. LIU Computational Mechanics Division, Institute of High Performance Computing, 1 Science Park Road, #01-01 The Capricorn, Singapore Science Park II, Singapore 117528 E-mail: wangf@ ihpc. a-star. edu. sg This paper presents finite element (FE) dynamic and theoretical analysis of a new MEMS optical switch. The switch is composed of a skew plate, one drawing beam, a mirror and a substrate. The plate is restrained translationally at one end. The drawing beam provides additional translational restrain to the plate. The mirror is mounted on the plate. Two identical bending beams are inserted inside the plate to adjust the bending rigidity of the switch. The switch is actuated by electrostatic attraction applied on the skew plate. Finite element dynamic simulation is performed to predict the mechanical behaviours of the switch. The minimum natural frequencies and the corresponding mode shapes, maximum stress distributions, dynamic responses under different levels of electrostatic attraction loads are derived. A theoretical dynamic model is further applied to validate the finite element simulation results. Good agreement is found between the finite element simulation and the analytical results with regard to the pull-in voltages, and the time to pull in at different voltages. The current study is part of the research work for development of a novel MEMS optical switch with optimal electromechanical characteristics. The new MEMS optical switch presented in this paper is the second version of the novel switch. Design and experiments are carried out with the help of numerical simulation tools.
1. Introduction In a joint research program by School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore and Division of Computational Mechanics, Institute of High Performance Computing, Singapore, novel MEMS fiber-optical switches are developed. Design and experiments are carried out with the help of finite element analysis and theoretical study. Scenario for the numerical study of the primitive designs of the MEMS fiber-optical switches is seen in Reference [1,2]. According to the numerical analysis for the primitive design described in [1,2], the minimum driving voltages are higher than 250 V and the switching times are longer than 170 [xs. Experiments are then carried out for the primitive designs. It fails to derive any useful data because of the high driving voltages. On the other hand, FE parametric study of the primitive designs reveals that the dimensions of the bending beams play a critical role in determining the driving voltages and the switching times. Slender bending beams yield lower driving voltages and shorter switching times. Based on these numerical results, a final design for the switch is derived with the footprint of 1 x 0,6 mm and skew angel of 3.7°. The number of the bending beams is reduced from 3 to 2 and the dimensions are also marginally reduced. Efforts of the current study focus on FE simulation and theoretical analysis of this revised or final design.
2.
Finite element analysis
Figure 1 gives the schematic diagram for the revised or the final design of the switch. It is composed of a skew plate (1 x 0.6 x 0.0021 mm), two bending beams (0.04 x 0.01 x 0.002 mm), a mirror and a substrate. The plate is restrained translationally at one
888
889
end. The drawing beams provide additional restrain to the plate. The mirror is mounted on the plate.
Figure 1. Schematic Diagram of the Final Design of the Fibre-Optical Switch
In the FE Dynamic analysis, the switch structure is modelled as that shown in Figure 2, in which the boundary conditions are also illustrated. For the restraining condition of the drawing beams to the plate, it is assumed that the translations of the edge of the plate connected to the drawing beam are restrained, while their rotations are independent. When the plate moves to the substrate, impact/contact between the plate and the substrate takes place. To simulate this phenomenon of impact/contact, the contacts between the plate and substrate are introduced. The surface to surface contact algorithm between each part is established.
Figure 2. Finite Element Mesh for the Switch
The material for the whole structure is polysilicon except that a 0.5 (im thick gold layer is directly deposited on the mirror frame, which is composed of 1.5 u.m thick polysilicon. Properties for these two materials are shown in Table 1. The plate and the substrate are charged with opposite static electricity, so that the plate is driven by the distributed electrostatic attraction. The magnitude of the attraction is the inverse to the square vertical distance of specific loading point to the substrate, which has the following expression [3], v &.&5xW-nxVolt2
, (,
'~
(1)
2y(X,,f
where x represents the longitudinal variable, t is time variable. y(x,t) is the vertical distance of the loading point to the substrate, which varies with the position of the loading point and time. In the present dynamic analysis, user-subroutine is used to apply the
890 variable distribution load. The switching on and off times refer to the time duration that the plate moves from the original position (plate has a skew angle of 3.7° with the substrate) to touch the substrate and vice versa. The derived pull-in voltage is 8.27 V and the corresponding pull-in time is 38 ms. Table 1. Material Properties. Material
Polysilicon
Gold
Young's modulus (MPa)
150,000
77,000
Poisson's ratio
0.3
0.3
Density (Kg/ m3)
2,340
19,300
It is seen from Table 2 and Figure 3 that the switching on times vary with the switching voltages, where a larger switching voltage yields shorter switching on time. On the other hand, marginally different voltages have insignificant effects on the switching off times (see Table 2). The switch off voltages from the FE analysis is about 3.7 V regardless of the different switching voltage. The rise-up times of the switch under different switching voltages fall in a narrow range of 39-41.2 ms. The reason of the almost constant rise-up times is that switch-off speed depends on the restoring force only. It may vary due to the capacitance resistance (CR) time constant of the electrostatic actuator. In the present study, the CR time constant is not taken into account. FE mode analysis is also performed to derive the natural frequencies and the corresponding mode shapes. The derived first ten natural frequencies are tabulated in Table 3, and the first three mode shapes are shown in Figure 4. Voltage=8.387V
Voltage=8.8V
Tlme(ms)
Time(ms)
0 •5
•10
I -16
-Theoretical study | "FE analysis I
1
-20
j
S 2
FE analysis
- Theoretical study
S -25 -30
a
L _
_J
-35 -40 -45
Voltage=7.6V
Voltage=8.359V
T)me(ms) 150 /
/
E 3. -is
1 -*H J -25
-Theoretteal study ~- FE analyst
Theoretical study ™***- FE simulation
Figure 3. Displacement at Mid-point of Mirror Versus Time Calculated from FE Simulation and Theoretical Study.
Tlme(ms)
891 Table 2. Switching On and Off Times.
Switching Voltage (V) Switching Times (ms) FE Simulation Results
On 60.03
Off 41.00
8.387 On Off 43.68 39.79
On 9.51
Off 39.79
Theoretical Results
57.00
NA*
45.18
NA*
11.58
NA*
Percentage Errors
5.32%
NA*
-3.32%
NA*
-17.88%
NA*
8.359
8.8
Note: NA* —The data is not available. Table 3. The First Ten Natural Frequencies for the Switch Structure from FE Simulation.
Natural frequencies (Hz) 1st 1,425
nd
2
1,898
3rd
4th
5th
6*
7th
gu,
9*
10th
2,581
6,310
8,100
9,168
9,485
13,752
13,982
16,201
1st mode shape
Figure 4. The First Three Mode Shapes Derived from FE Simulation.
3.
Theoretical Analysis
To verify the FE dynamic simulation results, an analytical beam model is derived. The formulation of the analytical beam model is based on the observations that the structure, the boundary and loading conditions of the switch are symmetric about the central line of the plate and only vertical or transverse loads are involved. Therefore, the switch is firstly simplified as a Bernoulli-Euler beam assembly. Furthermore, the beam model is simplified as a cantilever beam according to FE simulation results. Observation on the dynamic response reveals that the restraining of the drawing beam on the plate is rather rigid so that the displacement of the near end of the bending beam is almost zero even under a high level of attraction load. Therefore, the beam model can be further simplified as cantilever beam clamped at the near end of the bending beam. In solving the analytical model, Rayleigh-Ritz method is used to calculate the natural frequencies and mode shapes. Dynamic response is derived using the mode-superposition method. Computer codes in FORTRAN is developed to solve the equations involved in the derivation of the coefficients. The simplified Bernoulli-Euler beam model consists of
892
a number of beam segments where the properties of each segment match the transverse properties of the original structure. In particular, the transverse properties of the mirror are combined into a specific segment of the beam. The dynamic response of the beam under electrostatic attraction load is then derived with the mode-superposition method. In the present study, only the first three normal modes are used for calculation of the displacement. FORTRAN computer codes are developed to implement the calculation. The derived displacement versus time curves under different levels of attraction for the far mid-point of the mirror are shown in Figure 3. The switching on times at different electrostatic voltages are tabulated in Table 2. Percentage errors of switching on times between the FE model and analytical model are also listed in Table 2. The analytically derived pull-in voltage is 8.18 V. 4.
Discussions and Conclusions
Comparison of the results from the analytical and FE dynamic model shows that the switching on times at different voltages (see Figure 3 and Table 3) are close to each other. The percentage errors of switching on times between the FE simulation and theoretical study at different switching voltages of 8.359 V, 8.387 V and 8.8 V are 5.32% and 3.32% and -17.88% of the analytical derivations. It is seen that the percentage error of the switching on time between the FE simulation and the analytical results is a little bit high for the switching voltage of 8.8 V. It is because the absolute value of the switching on time at switching voltage of 8.8 V is very small compared with the values at voltages of 8.359 V and 8.387 V (see Table 2). Still it can be stated that the FE simulation and the analytical results agree with each other. Especially, very good agreement is found between the pull-in voltages. The FE simulation result is 8.27 V, while the analytical calculation is 8.18 V, which yields a percentage error of 1.1% of the analytical result. The switching off voltages from the FE analysis is about 3.7 V regardless of the different switching on voltages. Therefore, conclusions can be drawn that a dynamic analytical and FE models for novel MEMS fibre-optical switch have been successfully established for the MEMS fibreoptical switch. Very useful results are obtained from FE simulation and theoretical study with regard to the mechanical properties of the switch, such as the natural frequencies, the corresponding mode shapes, the pull-in, lift-up voltages, the switching on and off times, and the maximum stress distributions. References 1. F. Wang, C. Lu, Z. S. Liu, A. Q. Liu, X. M. Zhang, "Modeling of Optical Mirror and Electromechanical Behavior," Proceeding of SPIE, Vol 4582, APOC 2001, Beijing, China, pp. 95-105, 2001. 2. F. Wang, C. Lu, Z. S. Liu, J. Li, A. Q. Liu and X. M. Zhang, "Finite element simulation and theoretical analysis of fiber-optical switches," Sensors and Actuators A Physical, Vol. 96, pp. 167-178, 2002. 3. G. C. Wetsel, Jr. and K. J. Strozewski, "Dynamical model of microscale electromechanical spatial light modulator," Journal of Applied Physics, Vol. 73, No. 11, pp. 7120-7124, 1993.
DESIGN AND MODELLING HIGH-EFFICIENCY ACCELEROMETERS A. T. NG, W. H. LI, H. DU AND N. Q. GUO School of Mechanical
& Production Engineering, Nanyang Technological 50 Nanyang Avenue, Singapore 639798 E-mail:mwhli@ntu. edu. sg
University
The paper presents design and modelling of a new uniaxial silicon-based micro-accelerometer. This design utilizes the piezoresistive sensing concept to detect the mechanical strain induced by the acceleration of the proof mass. Unlike the conventional common cantilevered proof mass type of design that relies solely on bending strains, this design adopts the combination of both bending and axial deformation. Basically, the deformation of the bending structure rotates and displaces the proof mass, which consequently induces axial strains that will be sensed by the piezoresistors. The level of sensitivity and natural frequency can be controlled and adjusted to suit different requirements depending on the application intended. During the design and analysis stage, theoretical models were constructed to predict the behaviour and performance of the new design. Subsequently, finite element simulations were carried out to verify the predictions from theoretical models, including sensitivity and natural frequency. It was observed that the sensitivity of the proposed new design was significantly higher when compared with other commercial accelerometers.
1
Introduction
Micro-machined accelerometers, based on a variety of working principles had been developed over the years, striving for improvement in performance. Regardless of the operating principle, all micro-machined accelerometers (and in fact the conventional accelerometers) require a transduction mechanism to transform a mechanical input (such as displacement, stress and strains) induced by the applied acceleration into a measurable electrical signal [1]. The common micro-machined devices include those utilizing piezoresistive sensing, piezoelectric sensing, capacitive sensing, thermal sensing and etc. [2]. Each design has its set of unique characteristics that depicts their advantages and disadvantages. Comparing among the different modes of sensing, the piezoresistive type can be constructed easily using the wafer fabrication techniques. Piezoresistance of a material is the fractional change in bulk resistivity induced by small mechanical stresses applied to the material. The piezoresistive effect can be normally measured with a simple Wheatstone bridge with very minimum signal conditioning [3]. In this paper, a new piezoresistive micro-accelerometer design is proposed and developed. The behavior and performance of the new design are theoretically analyzed and evaluated with FEM modeling.
893
894
2
Conceptual Design
2.1 Working Principle of conventional accelerometers Accelerometers are typically specified by their sensitivity, maximum operation range, frequency response, resolution, full-scale nonlinearity, and cross-axis [2]. The most common piezoresistive micro-accelerometer design utilizes the cantilever concept. In this design, a beam with a seismic mass (proof mass) attached to one end is cantilevered onto the supporting frame of the housing for the sensor. When an external acceleration is applied to the supporting frame, it moves in relative to the proof mass due to the latter's inertial effect. This causes the cantilevered beam to deflect under the inertial force, which induces bending stresses / strains on the beam. In micro-machined accelerometers, the inertial forces that are detected as the measurement of acceleration are usually very small because of their tiny masses. Consequently, the value of induced strain values will also be limited, affecting ultimately the sensitivity and resolution of the sensor. For the conventional cantilever beam design, increasing the proof mass or reducing the stiffness of the beam allow a larger displacement under the same applied acceleration. Although such remedies increase the inertial force required for improving sensitivity, they inevitably result in lower natural frequency of the sensor. Moreover, it might cause the overall structure to be less robust and reliable. The useful bandwidth of the microaccelerometer will be narrowed, hence yielding a poorer frequency response. 2.2 New Design The new design takes advantage of the fact that structure experiencing bending deformation yields a larger displacement with low strain/displacement ratio, while structure experiencing axial deformation produces high strain/displacement ratio but limited range of displacement values. Ideally, these two modes of deformation can be integrated into a combined system to achieve high strain values, thereby increasing the sensitivity. The main idea in the integration is to obtain an initial large displacement from the bending structure, and translating this displacement into an amplified, focused axial strain through the axial structure. Figure 1 (a) and (c) show the proposed new design before and after deformation, respectively. The new design encompasses basically four main
895
components, namely the bending element, proof masses, sensing elements, and the supporting frame.
lement
(b) Section A -A
(c)
Acceleration
Figure 1. Schematic of the proposed microaccelerometer before and after deformation.
3
Mathematical Modeling
With reference to Figure 1, the inertial force exerted by the proof mass can be represented as F. the bending element itself is also experiencing a distributed inertial force, Q. Fs represents the axial spring forces from the sensing elements. The moment acting at the upper end of the element is M. The rotation angle of the bending element is 0. The governing equation for the free body at y = 0 and the approximated axial strain are given by: FL; EIb6 = MLb + ». + M (1)
9 Jk e=- 21 4
(2)
Analytical Results based on FEM
As listed in Table 1, four models of the new design, model 1 to model 4, are devised for the analytical purpose. Each model differs from one another in terms of the width of the bending elements, the size of the proof masses, and the gap between each pair of sensing elements, i.e. Hg. By manipulating these parameters, different sensitivity values and their corresponding frequencies are achieved.
896 Table 1. Dimensions the proposed models Models Dimension (mm) H„
L, Hb U Hs L, H„ T„
Tb Ts
1
2
1.5 2.0 0.12 0.08 0.003 0.01 0.16 0.35 0.03 0.03
1.5 2.0 0.2 0.08 0.003 0.01 0.25 0.35 0.03 0.03
3 1.5 2.0 0.34 0.08 0.008 0.01 0.46 0.35 0.03 0.03
4 3.0 4.0 0.4 0.16 0.006 0.02 0.5 0.35 0.03 0.03
The performance comparison in terms of sensitivity and natural frequency is summarized in Table 2. It can be seen that the new designs surpass the commercial products significantly with respect to their sensitivity levels at the corresponding nature frequency levels. Table 2. Dimensions the proposed models
5
Models
Sensitivity (mV/g)
Natural frequency (kHz)
New design 1
6.49
16.1
New design 2
3.37
26.9
New design 3
0.69
56
New design 4
7.58
12.8
Entran EGAX-2500
0.1
6
E ndevco 7264B-2000
(1.25
28
Conclusions
In this paper, a new uniaxial micro-accelerometer was proposed and designed. The analytical results demonstrate that the new design is capable of achieving both high sensitivity and natural frequency, as compared with the contemporary commercialized products' performances. References 1. Maluf N., An introduction to micro-electro-mechanical systems engineering (Artech House, Boston, 2000). 2. Navid Y., Farrokh A. and Khalil, N., Micromachined inertial sensor. Proceeding of the IEEE, 86 (1998), pp. 1640-1659. 3. Chen H., Shen S. Q. and Bao, M. H., Over-range capacity of a piezoresistive microaccelerometer. Sensor Actuators A: Phys 58 (1997) pp. 197-201.
A FINITE ELEMENT ANALYSIS FOR PIEZOELECTRIC SMART PLATES INCLUDING PEEL STRESSES QUANTIAN LUO AND LIYONG TONG School of Aerospace, Mechanical and Metrachonic Engineering, University of Sydney NSW 2006 Australia E-mail: [email protected] This paper presents a novel finite element analysis (FEA) formulation for piezoelectric (PZT) smart plates taking into consideration peel stresses. To model shear and peel stresses at the interface between the PZT patches and the parent plate structure, a finite thickness adhesive layer with a lower elastic and shear moduli, as compared with those of the PZT and host structure, is considered. This layer is modeled as a continuous spring system with a constant shear and peel stiffness. It is then sandwiched between two collocated 4-node Reissner-Mindlin plate elements to form laminatedelements for composite plates. This FEA framework can consider independent rotational angles, and is applicable to thin or moderately thick plates with debonded PZT actuators and sensors. Numerical results are presented to validate the present formulation.
1
Introduction
Crawley and de Luis ' developed an analytical model for smart structures, in which, a finite thickness adhesive layer was assumed to experience pure shear strain. This classic shear lag model has been widely used for piezoelectric smart structures and bimorph applications. In our recent studies |21[31, it has been shown that peel stresses in adhesive layer can significantly affect the mechanical behavior and dynamic response. This is especially true for flexible smart structures and in the presence of debondings. To date, analytical solutions for smart structures are very limited or too complicated for practical applications. Finite element analysis has been widely used in the area of smart structures as it effectively deals with complicated geometric shapes, loadings and boundary conditions. Allik and Hughes [4] presented a finite element method for piezoelectric or electroelastic structures. They derived the FEA formulation by applying a variational principle to the virtual work density of mechanical strains and electric fluxes, obtaining the following dynamic equations for a piezoelectric element: [m]{ Ui}+ VUM+UcM^m+ifsHVp) [k
1 J
(1)
In equation (1), [m], [kuu], [ku(^ and [k^ are defined as the kinematically consistent mass matrix, structural, coupling piezoelectric and dielectric stiffness matrices respectively. \fB], [fs] and {//»} are the body force, surface force and concentrated force vectors respectively. {qB}, {qs) and {/>} are the body charge, surface charge and point charge vectors respectively. {«,} and {$} are the displacement and electric potential vectors respectively. Once the nodal values of the displacement and electric potential for a PZT element have been found, the stresses and electric flux density at any point in the element are given by: {T} = [c][Bu]{u,} + [e][B^]{(/,i}
{D} = [e]T[Bu]{u,}-{JtHBJ{6}
"I
\
897
(2)
898
where, {T} and {D} are stress and electric displacement vectors; [Bu] and [B^ are strain and electric field matrices; [c], [e] and [$ are elastic, coupling piezoelectric and dielectric matrices respectively. In smart structures, PZT patches are normally bonded to or embedded in the host structure to implement self-monitoring and controlling functions. Therefore, the global mass and stiffness matrices of smart structures are comprised of PZT patch structural element matrices, coupling piezoelectric and dielectric stiffness matrices, and the normal host structural element matrices. PZT patches in smart plates can be used as actuators and sensors. This paper presents actuated performance of PZT patches in smart plates only. 2
A FEA framework for smart plates including peel stress
To implement FEA formulation for a host plate with the bonded PZT patches, we use 4-node Reissner-Mindlin plate elements for the host plate only and the pseudo-elements derived by Tong and Sun [5] in the area with the bonded PZT patches. The 4-node element is based on the first order Reinssner-Mindlin theory. Displacement fields of the local element are: U{x, y, z) = u0(x, y) + zOy (x, y) -s V(x,y,z) = v0(x,y)-ZeAx,y) L W(x, y, z) = w(x, y) J
(3)
where, u0, v0 and w are the translational displacements in the mid-plane; 6y and 0X are the rotations about coordinate axes y and x respectively. The stiffness matrix of the pseudoelement is: -\ kp + k„u ka]2 [*»] = kan kh + ka22
r [ka] =
kail
K12
ka21
ka22
(4)
where, [ka] is a stiffness matrix of the adhesive element derived on the basis of the adhesive model developed by Goland and Reinssne^ and shape functions defined in the 4-node plate element. By assembling conventional plate elements and the laminated-elements, the structural dynamic equations can be obtained. When the thin PZT actuators are used in plate or shell structures, the electric field is only poled in the direction perpendicular to the structural plane. Actuated equations can then be expressed in the form of: [M]{d}+[C]{d}+[K]{d}={Fv}+{Fp} (5) where, [Af], [C] and [K] are structural mass, damping and stiffness matrices; {d} is global nodal displacements; {Fv} and {FP} are electric force and loading matrices respectively. 3
Numerical results
By implementing the present FEA framework for smart plates, two examples of a static analysis are presented here.
899
Example J Verification of the present FEA for PZT smart beams For a smart beam shown in Figure 1, a PZT actuator is bonded a distance of 20 mm from the clamped end of the 0.24 m long host beam whose thickness is 10 mm. The PZT actuator has a thickness of 1 mm and a length of 10 mm. The adhesive layer is 0.1 mm thick. The elastic moduli of the PZT actuator, host beam and adhesive layer are: Ep = Eh = 70 GPa, Ea = 3 GPa respectively, with the adhesive shear module, Ga = 1.07 GPa. The coupling PZT constant is e31 = -5.2 Nl(m v) and the applied voltage is assumed to be -100 (v).
Figure 1 A cantilever beam with the bonded PZT patch To verify the present FEA for smart plates, we set e32 be zero and the plate width as 2 cm, and thus its deformation is equivalent to that of a smart beam whose exact solution was obtained [2] I31. Four equal elements are used along the width direction. Small elements of 1 mm long are used near the PZT edges along x direction, whereas large element of 1 and 2 cm long are utilised in the rest. Implementing the FEA program for this PZT smart plate, we find that the non-dimensional tip displacements are: «„ = u/h = 7.92xl0"3, w„ = win = 1.09, and 8y = -5.20xl0"6, whose errors are all less than 3%; therefore, the accuracy of the present FEA for smart plates is validated. Example 2 comparison with higher order theory For a smart pate shown in Figure 2, two PZT patches with the thickness of 0.5 mm are bonded to a 0.2x0.2 m2 and 0.5 mm thick host plate. The other material are: Ep - 76 Gpa, vp = 0.36, Eh = 72.4 Gpa, vh = 0.33 respectively. The coupling PZT constants for actuator 1 and actuator 2 are: e3I' = e32' = -15.56 N/(m v), e3I2 = e322 = -17.58 N/(m v) respectively. The same adhesive as that used in example 1 is used here. 4
I Y (cm)
Actuator 1 is 2 cm to the clamped end and 3.5 cm to the top edge. Actuator 2 is 2 and 3.5 cm to the left and bottom edges.
/ / / /
Al
P. (15,113.5)
A2
P2 (15,13.5)
A size of both actuators is 3x4 cm2. >
x (cm)
Figure 2 A thin plate with the distributed PZTs When a voltage of-100 (v) is applied to actuators 1 and 2, the deflections at points P! and P2 are: wPi = 0.125 mm and wP2 = 0.123 mm respectively. The deflections computed
900
by the FEA based on a higher order theory (HOT) |71 [81 are 0.156 and 0.152 mm respectively, which are 24.8% and 23.8% higher than the present results. In the exact static solutions[31 to smart beams, we showed that the difference between the shear stress model and the present model might be up to 20% for the thin host beam. It can be seen that the deflection difference between the present FEA and the FEA based on HOT is also in this range, as the adhesive layer and the peel stresses are not modelled in the HOT. 4
Discussion and conclusion
This paper presents a new finite element formulation for piezoelectric plates, in which, an adhesive layer sandwiched between the host structure and the piezoelectric patch is assumed to transfer both constant shear and peel strain along the thickness direction. The numerical results show that the present FEA is effective for analyzing smart plates, also, that peel effects on PZT smart plates may be significant for flexible plates and debonding analysis. Acknowledgements The authors are grateful to the support of the Australian Research Council through a Large Grant Scheme (Grant No. A10009074). References 1. Crawley E. F. and de Luis J., "Use of Piezoelectric Actuators as Elements of Intelligent Structures", AIAA Journal, Vol. 25, No. 10, 1987, pp. 1373-1385. 2. Luo Q. and Tong L., "Exact Static Solutions to Piezoelectric Smart Beams Including Peel Stresses, Part I: Theoretical Formulation", International Journal of Solids and Structures, Vol. 39, No.18, 2002, pp.4677-4695. 3. Luo Q. and Tong L., "Exact Static Solutions to Piezoelectric Smart Beams Including Peel Stresses, Part II: Numerical Results, Comparison and Discussion", International Journal of Solids and Structures, Vol. 39, No.18, 2002, pp.4697-4722. 4. Allik H. and Hughes T. J. R., "Finite Element Method for Piezoelectric Vibration", International Journal For Numerical Methods in Engineering, Vol.2, 1970, pp.151157. 5. Tong L. and Sun X., "Stresses in Bonded Repair to Cylindrical Curved Shell Structures", Research Report, Department of Aeronautical Engineering, The University of Sydney, 2000. 6. Goland M. and Reissner E., "The Stresses in Cemented Joints", Journal of Applied Mechanics, March 1944, A-17 - A-27. 7. Chee C. Y. K., Tong L. and Steven G. P., "A mixed model for beams with piezoelectric actuators and sensors", Smart materials and Structures, Vol. 8, 1999, pp.417-432. 8. Nguyen Q. and Tong L., "Shape Control of Smart Composite Plate Structures with Non-rectangular Shaped PZT Actuators", Proceedings of the Third Australian Congress on Applied Mechanics, 2002, pp.421-426.
A STUDY OF THREE-DIMENSIONAL MESH GENERATION FOR COAL MINING MODELLING S.G. CHEN, S. CRAIG, D.P. ADHIKARY AND H. GUO CSIRO, Po Box 883, Kenmore QLD 4069, Email: [email protected]
Australia
An automatic mesh generator is developed for coal mining modelling. The mesh generator is incorporated into the preprocessor that is an accompanying part of a finite element package COSFLOW. An example is given showing that, in most cases, the mesh generator meets the requirements for coal mining modelling.
1
Introduction
In finite element modelling, the problem domain is discretised into a mesh, possibly consisting of more than one type of element, e.g. segments, triangles, quadrilaterals, tetrahedral, pentahedra and hexahedra [1, 2]. These elements must be connected, cover the domain and not overlap. This paper reports on a user-friendly automatic mesh generator used for underground coal mining applications. The mesh generator is incorporated into the preprocessor developed for the finite element program COSFLOW [3]. COSFLOW is used to simulate deformation and two phase fluid flow in rock. An example is presented to show the application of the mesh generator. 2
Mesh generation
2.1
Mesh requirement for COSFLOW
Coal forms in seams in sedimentary, layered, relatively soft rock. When surface mining is uneconomic, coal is mined by longwall methods where coal is extracted and roof rocks are allowed to cave behind the supported mining face. The selected regions for extraction are called panels (rectangular in plan) separated by pillars that are not mined. To gain access for machinery and transport of mined coal, roadways are first driven in the coal near the outer perimeter of the panels. The finite element model COSFLOW is used to simulate the rockmass deformations, stress changes and the flow of water and gas. The mesh used for COSFLOW modeling needs to be aligned with the boundaries between rock layers so that appropriate material properties can be given to the elements and it also needs to be aligned with the coal seam to be
901
902
extracted in each notional step so that the appropriate elements can be removed to simulate mining. Thus the domain of interest is divided into a number of subzones, each of which must be meshed, with the meshes of neighbouring subzones matching on their boundaries. The design of the numerical mesh is a compromise between accuracy and solution time. The accuracy of an analysis is usually greater with smaller elements, but, especially in three dimensions, the number of elements must be limited to enable feasible computer run times that may still be in the order of hours or days. Finer meshes are required near the excavation where the gradients of displacement and the pore pressure etc. are greatest, while the mesh at some distance from the excavation may be coarser. For this reason, a graded mesh may be required within each subzone. COSFLOW uses quadrilaterals elements for two-dimensional analyses and hexahedra elements for three-dimensional problems. 2.2
Meshing in a subzone
For many applications in underground coal mining, the subzones are hexahedral. In this case, the meshing in a subzone consists of three steps. Firstly, the subzone with six faces is converted to a cube with side length of 2 in the manner similar to the concept of transforming a hexahedral element to a cube element used in the finite element method. The local coordinate origin of the cube is located at the center of the cube and thus x, y and z-coordinate in the cube ranges from - 1 to +1. Secondly, by giving seed numbers and ratios, in x-, y- and z-axes, the mesh is generated in the cube and the node coordinates are calculated. The nodes and elements are numbered sequentially along the x-axis first, then the y-axis and finally the z-axis. Thirdly, the node coordinate in the cube is transformed to the subzone according to the function: x = A, + a2x0 + a3y0 + a4z0 + a5x0y0 + a6y0z0 + anzax0 + agx0y0z0 y = bx +b2x0+biy0
+b4z0+b5x0y0+b6y0z0
+b7z0x0+b&x0y0z0
Z = cx + c2x0 + C^Q + c4z0 + c5xQy0 + c6y0z0 + CTZQXQ + csx0y0z0 where the subscript 0 refers to the coordinates in the cube. Substituting the coordinates of the eight vertexes in the subzone (x, y, z) and the cube (x0, yo, Zo) in the above equation and solving them, the parameters (a„ £,, c„ / = 1 ~ 8) could be determined.
903
23
Mesh connection
The meshes generated in the subzones are independent from each other and need to be connected together to form a complete mesh. To do this, the nodes are merged at the interfaces. This is done by including the nodes of the previous subzones on common boundaries in the list of nodes for the current subzone. The node transferring from the previous subzone to the current subzone is closely related to the sequence of numbering of the nodes and the elements in the subzones. 3
A roadway example
A typical example using this mesh generator is illustrated in Figure 1, for simulating the gas emission during the roadway excavation. The model involves seven strata including a coal seam of 3 m thickness (the black layer between the red and green layers in the model). The domain of interest has a size of 250, 100 and 90m in x, y and z directions, respectively. The roadway with the cross section of 5.2 m in- width and 3.0 m in height is constructed in the coal seam advancing along the x-axis. The generated mesh consisting of 5200 elements and 6237 nodes is also shown in the figure. Typical longwall meshes are much larger, but constructed in the same way.
Figure 1, A three-dimensional mesh of a coal mine roadway generated by the mesh generator.
904
The model is divided into 1, 3, 7 segments in x, y and z directions, respectively, and thus a total of 21 subzones. Only one segment is given in x-direction because no grading is required as the roadway advances at a constant rate of about 40m per day. From the figure, it can be seen that the interfaces between different strata are not horizontal and slightly oblique to the horizontal plane. As the roadway is almost located in the middle of the model in the y-z plane, the mesh is finer at the area around the roadway and coarser further from the roadway in both the y and z directions. 4
Discussion and conclusions
The mesh generator is specially developed for coal mining modelling and is incorporated into the COSFLOW preprocessor. An example using the mesh generator indicates that models with oblique strata interfaces or excavations could be generated. The restriction of the mesh generation is that the subzones must have six faces and the nodes at the interfaces of adjacent subzones must be matched. This may produce a mesh with poor quality when the area of one face is significantly different from that of its opposite face in the subzone, as the two faces must have the same number of elements with this mapped meshing approach. More general meshing schemes [2] could be incorporated, but are rarely required for underground coal mining applications. 5
Acknowledgements
The authors would like to thank NEDO, JCOAL and CSIRO for providing the funds for conducting this research work. The authors also wish to express their thanks to Dr Baotang Shen and Mr Brett Poulsen for their comments on the paper. Reference 1. Pande, G.N., Beer, G. & Williams, J.R., Numerical methods in rock mechanics. Wiley, New York, 1990, pp223. 2. George P.L., Automatic mesh generation, application to finite element methods. John Willey & sons, 1991,pp333. 3. Chen S.G., Craig S., Guo H. and Adhikary D.P., A pre/post processor for finite element modeling for coal mining applications. Proceedings of IC-SEC 2002, Singapore, December 2002.
LINEAR AND TORSION SPRING ANALOGIES FOR DYNAMIC UNSTRUCTURED MESHES IN FLUID STRUCTURE INTERACTION PROBLEMS - A COMPARATIVE STUDY R. AJAYKUMAR t , N.M.SUDHARSAN*, K.MURALf, K. KUMAR + AND B.C.KHOO* t
Institute of High Performance Computing, Singapore - 117528, *Singapore-MIT E-mail: ajay@ ihpc.a-star. edu.sg
Alliance
Dynamic mesh adaptation techniques are an important aspect to be considered in fluid structure interaction (FSI) problems. In these problems, the computational mesh has to be adapted at every time step to the new boundary position dictated by the structural response. In most cases, this adaptation can be achieved by moving the mesh points rigidly in response to the motion of the structure. However, this approach is no longer applicable for large structural displacements. The same holds if the outer boundaries of the mesh are fixed multiblock boundaries or if more complex deformations are considered. To tackle such problems, efficient grid regeneration and/or grid deformation techniques are required. One such approach is the linear spring analogy. In this procedure, the dynamic unstructured grids are usually represented by a network of fictitious linear springs. However this procedure fails for very large displacements. Two other methods to overcome these difficulties are implemented and compared in this paper. They are, namely: the "Modified Linear spring analogy" and "Torsional spring analogy". Their advantages, disadvantages and improvements for FSI problem's are discussed. Comparison is also made with other grid deformation schemes.
1
Introduction
Fluid-Structure Interaction (FSI) problems are described by fluid and structural field equations. Solutions to these problems require the coupling of fluid and structural solvers. This coupled problem can be viewed as a three-field problem by treating the moving mesh as a system with its own dynamics. One way of coupling the Computational Fluid Dynamics (CFD) and the Computational Structural Dynamics (CSD) codes is to use a partitioned analysis (also called a staggered procedure). This procedure is utilized here to study the phenomenon of a nearly incompressible fluid interacting with a solid plate. Herein, we address the mesh movement part of the FSI simulation. Linear and torsion spring analogies are used for updating the fluid grid point positions. They are compared on the basis of the quality of the mesh obtained after the dynamic meshing. Normalized equiangular skewness is used to assess the quality of the resulting mesh. Comparison is also made with other mesh movement algorithms such as Laplacian smoothing.

2 Governing Equations and Finite Element Formulation

2.1 Fluid Equations
An inviscid-compressible fluid model is often sufficient for analysis of hydrodynamic structures [5]. Considering an irrotational, isentropic inviscid fluid in small displacements and assuming constant density, the governing equations of the fluid can be written as:
\nabla^2 \phi = \frac{\rho}{\beta}\,\ddot{\phi}, \quad \text{in } S^0 \qquad (1)

where β is the bulk modulus, ρ is the density of the fluid, φ is the velocity potential, S⁰ is the fluid domain and the wave speed is given by c = √(β/ρ).
The boundary conditions for this fluid flow can be written as:

\frac{\partial \phi}{\partial n} = 0, \quad \text{on the fixed boundary} \qquad (2)

\frac{\partial \phi}{\partial n} = u_n, \quad \text{on } \partial S_b \text{ and } \partial S_f \qquad (3)

P_0 - \rho\dot{\phi} = p_b, \quad \text{on } \partial S_b \qquad (4)

The corresponding Galerkin weak form of equation (1) is

\int_{S^0} \frac{\rho}{\beta}\,\delta\phi\,\ddot{\phi}\,dS^0 + \int_{S^0} (\nabla\delta\phi)\cdot(\nabla\phi)\,dS^0 = 0 \qquad (6)
Linear triangular elements are used to discretize the fluid domain, and the resulting dynamic equation is marched in time using the Newmark time integration scheme.

2.2 Structural Equation - Two-Dimensional Plane Strain
By the principle of virtual displacements we have:
\int_{S^1} \delta\varepsilon^T C_s\,\varepsilon\,dS^1 + \int_{S^1} \rho_s\,\delta u^T \ddot{u}\,dS^1 = \int_{\partial S_b} \delta u^T f\, d(\partial S_b) \qquad (7)
where S¹ stands for the structural domain, C_s is the material stress-strain matrix, ρ_s is the density of the solid, u is the displacement vector, ε is the strain tensor and f is the interface force vector, which is calculated as

f = -n\,p \qquad (8)

where n is the outward normal from the solid. Again, linear triangular elements are used to discretize the structural domain, and the resulting dynamic equation is marched in time using the Newmark integration scheme.

2.3 Modified Linear Spring Analogy to Describe the Mesh Movement
In this approach, a fictitious spring is attached to each edge connecting two adjacent vertices i and j of a triangle, and the stiffness is chosen as the inverse of the edge length l_ij [1]. Herein, the stiffness of the spring is expressed as a power r of the squared edge length multiplied by a scaling factor q:

k_{ij} = q\left[(x_i - x_j)^2 + (y_i - y_j)^2\right]^{r} \qquad (9)
The resulting quasi-static equation is:

K q = 0 \qquad (10)

q = \bar{q} \quad \text{on } \partial S_b, \partial S_f, \partial S_0

where q is the vector of nodal displacements and \bar{q} contains the prescribed displacements on the boundaries. Equation (10) is solved using the Jacobi iterative scheme.
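As a concrete illustration, the following is a minimal sketch (not the authors' code) of one Jacobi sweep for the spring system K q = 0 on a triangular mesh; the edge-list data layout and the values of the scaling factor q and exponent r are assumptions made for the example.

#include <stdlib.h>
#include <math.h>

typedef struct { int a, b; } Edge;

/* Edge stiffness k_ij = q * (squared edge length)^r, as in equation (9). */
static double stiffness(double xi, double yi, double xj, double yj,
                        double q, double r)
{
    double L2 = (xi - xj) * (xi - xj) + (yi - yj) * (yi - yj);
    return q * pow(L2, r);
}

/* One Jacobi sweep for K q = 0: each free node moves to the
   stiffness-weighted average of its neighbours' displacements, while
   boundary nodes keep their prescribed values. Returns the largest
   change so the caller can iterate to convergence. */
double jacobi_sweep(int nn, int ne, const Edge *e,
                    const double *x, const double *y,
                    const int *fixed,       /* 1 if displacement prescribed */
                    double *dx, double *dy, /* displacement guess, updated */
                    double q, double r)
{
    double *sx = calloc(nn, sizeof *sx), *sy = calloc(nn, sizeof *sy);
    double *sk = calloc(nn, sizeof *sk), change = 0.0;
    for (int m = 0; m < ne; ++m) {          /* accumulate k_ij * d_j per node */
        int i = e[m].a, j = e[m].b;
        double k = stiffness(x[i], y[i], x[j], y[j], q, r);
        sx[i] += k * dx[j]; sy[i] += k * dy[j]; sk[i] += k;
        sx[j] += k * dx[i]; sy[j] += k * dy[i]; sk[j] += k;
    }
    for (int i = 0; i < nn; ++i) {
        if (fixed[i] || sk[i] == 0.0) continue;
        double nx = sx[i] / sk[i], ny = sy[i] / sk[i];
        double d = fabs(nx - dx[i]) + fabs(ny - dy[i]);
        if (d > change) change = d;
        dx[i] = nx; dy[i] = ny;
    }
    free(sx); free(sy); free(sk);
    return change;
}

Each time step, the boundary displacements dictated by the structure are written into dx/dy for the fixed nodes, and the sweep is repeated until the returned change falls below a tolerance.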
2.4 Torsional Spring Analogy to Dynamically Move the Fluid Mesh
Originally proposed by Farhat et al. [3], this method consists of introducing torsional springs at the mesh vertices to prevent neighboring triangles from interpenetrating each other. The stiffness of the torsion spring at vertex i of a triangle ijk is given by:

C_i^{ijk} = \frac{l_{ij}^2\, l_{ik}^2}{4 A_{ijk}^2} \qquad (11)

where the stiffness C is inversely related to the triangle area A, thus allowing the edges to sense a dynamic triangle approaching a negative area or a bad aspect ratio. The resulting equilibrium equation is:

F_{torsion}^{ijk} = \left[R^{ijk}\right]^T C^{ijk} R^{ijk}\, q^{ijk} = K_{torsion}^{ijk}\, q^{ijk} \qquad (12)

where R^{ijk} relates the nodal displacements to the corresponding rotational increments.
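As an illustration of equation (11), the following sketch (an illustrative reading of the formula, not the authors' code) evaluates the torsional stiffness at the three vertices of a triangle; note that the stiffness blows up as the area tends to zero, which is precisely what prevents element inversion.

#include <math.h>

/* Torsional stiffness C_i = l_ij^2 * l_ik^2 / (4 A^2) at each vertex of a
   triangle with vertices (x[0],y[0])..(x[2],y[2]), following equation (11). */
void torsion_stiffness(const double x[3], const double y[3], double C[3])
{
    /* twice the signed area, from the cross product of two edge vectors */
    double twoA = (x[1] - x[0]) * (y[2] - y[0]) - (x[2] - x[0]) * (y[1] - y[0]);
    double fourA2 = twoA * twoA;   /* (2A)^2 = 4 A^2 */
    for (int i = 0; i < 3; ++i) {
        int j = (i + 1) % 3, k = (i + 2) % 3;
        double lij2 = (x[j] - x[i]) * (x[j] - x[i]) + (y[j] - y[i]) * (y[j] - y[i]);
        double lik2 = (x[k] - x[i]) * (x[k] - x[i]) + (y[k] - y[i]) * (y[k] - y[i]);
        C[i] = lij2 * lik2 / fourA2;
    }
}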
3 Performance Metrics
The normalized equiangular skewness used to evaluate mesh quality is described below.

3.1 Normalized Equiangular Skewness
In the normalized angle deviation method, skewness is defined as [4]:

\text{skewness} = \max\left( \frac{\theta_{max} - \theta_e}{180 - \theta_e},\; \frac{\theta_e - \theta_{min}}{\theta_e} \right) \qquad (13)

where θ_max is the largest angle in the face or cell, θ_min is the smallest angle in the face or cell and θ_e is the angle for an equiangular face or cell (60° for a triangle). According to this definition, a value of 0 indicates an equiangular cell (best) and a value of 1 indicates a completely degenerate cell (worst).
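For a single triangle this metric can be sketched as follows (illustrative helper, not from the paper): compute the three interior angles from the vertex coordinates and apply equation (13).

#include <math.h>

/* Normalized equiangular skewness of a triangle, equation (13):
   0 = equiangular (best), 1 = completely degenerate (worst). */
double tri_skewness(const double x[3], const double y[3])
{
    const double PI = acos(-1.0);
    const double THETA_E = 60.0;   /* equiangular angle for a triangle */
    double tmax = 0.0, tmin = 180.0;
    for (int i = 0; i < 3; ++i) {
        int j = (i + 1) % 3, k = (i + 2) % 3;
        /* interior angle at vertex i from the dot product of edge vectors */
        double ax = x[j] - x[i], ay = y[j] - y[i];
        double bx = x[k] - x[i], by = y[k] - y[i];
        double t = acos((ax * bx + ay * by) /
                        (sqrt(ax * ax + ay * ay) * sqrt(bx * bx + by * by)));
        t *= 180.0 / PI;           /* radians -> degrees */
        if (t > tmax) tmax = t;
        if (t < tmin) tmin = t;
    }
    double s1 = (tmax - THETA_E) / (180.0 - THETA_E);
    double s2 = (THETA_E - tmin) / THETA_E;
    return s1 > s2 ? s1 : s2;
}

Averaging this value over all triangles gives the kind of mesh-quality history plotted in Fig 3.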
4 Results and Discussion
Two-dimensional simulations are carried out for channel flow (Fig 1) and for free surface flow with a cantilever plate in the center; the domain for the free surface flow is larger than that of Fig 1. It is seen from Fig 2 that for the coarser mesh (486 elements) the convergence of the displacement is poor compared to the fine mesh (840 elements). Hence, although the torsion spring generates triangles with better skewness (Fig 3), the modified linear spring analogy succeeds in generating finer triangles near the structure, which enables a better capture of the fluid flow there. It is thus expected that a combination of linear and torsion smoothing would perform better, and this was tried with the free-surface problem. In this (free surface) case, the Laplacian smoothing turns out to be better (Fig 4). It is also seen from Table 1 that the modified linear spring analogy is quite fast and effective when compared to the other methods.
Fig 1: Channel flow with plate in center
Fig 2: Displacement of mid-point on top of the plate
Fig 3: Average skewness plot of fluid mesh
Fig 4: Smoothing for free-surface FSI flow
Table 1: CPU real time for total simulation with different smoothing algorithms

Mesh Size                Linear spring analogy    Laplacian smoothing
Coarse (1058 elements)   69.86 s                  40.90 s
Fine (3696 elements)     258.75 s                 285.02 s

References
1. Batina J.T., Unsteady Euler Airfoil Solutions Using Unstructured Dynamic Meshes, AIAA Journal, Vol. 28, No. 8 (1989) pp. 1381-1388.
2. Blom F.J. and Leyland P., Analysis of Fluid-Structure Interaction by Means of Dynamic Unstructured Meshes, 4th International Symposium on Fluid-Structure Interaction, Vol. 1, ASME (1997).
3. Farhat C., Degand C., Koobus B. and Lesoinne M., Torsional Springs for Two-Dimensional Dynamic Unstructured Fluid Meshes, Computer Methods in Applied Mechanics and Engineering, 163 (1998) pp. 231-245.
4. Fluent Technologies Ltd., Handbook, 8.1.1 (2002).
5. Nitikitpaiboon C. and Bathe K.J., An Arbitrary Lagrangian-Eulerian Velocity Potential Formulation for Fluid-Structure Interaction, Computers and Structures, 47 (1993) pp. 871-891.
6. Wu G.X. and Taylor R.E., Time Stepping Solutions for Two-dimensional Nonlinear Wave Radiation Problem, Ocean Engineering, 22, 8 (1995) pp. 785-798.
SOLVING BIOT'S CONSOLIDATION MODEL FOR BRAIN TISSUE USING SPARSE MATRIX TECHNOLOGY

Y. L. LI
Institute of High Performance Computing, #01-01 The Capricorn, 1 Science Park Road, Singapore Science Park II, Singapore 117528
E-mail: [email protected]

K. H. LEE
Mechanical Department, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260

In this paper, Biot's consolidation model is applied to simulate the deformation behavior of human brain tissue in neurosurgery. Brain tissue is a porous material consisting of both solid and fluid phases, and its deformation behavior is governed by coupled bi-phasic partial differential equations. FEM is applied to solve the governing equations and simulate the deformation of brain tissue in neurosurgery. However, FEM results in a large-scale sparse equilibrium system when fine elements are used to mesh the geometrically complex human brain. A row-indexed sparse matrix scheme is therefore adopted to avoid the out-of-core problem and to accelerate convergence with the Biconjugate Gradient Method. This scheme, however, is not convenient for assembling the finite element global system. To overcome this handicap, an array of linked lists is used to assemble the global system. After the global system is assembled, boundary and initial conditions are applied for the iterative solver.
1 Introduction
Biot's infinitesimal consolidation model [1] for soft tissue is governed by the following equations in Einstein notation:

\sigma_{ij,j} + p_{,i} = 0 \qquad (1a)

\frac{\partial (u_{i,i})}{\partial t} - k\, p_{,ii} = 0 \qquad (1b)
where σ_ij are the components of the stress tensor; p is the pore fluid pressure (Pa); u_i are the components of displacement (m); k is the permeability (m³s/kg) and t is time (s). These equations assume that the solid tissue behaves in a linearly elastic fashion and that the pore fluid is incompressible. Equation (1a) relates mechanical equilibrium to the fluid pressure gradient across the medium, while equation (1b) provides the constitutive relationship between volumetric strain and fluid pressure. The brain is considered a saturated medium.

2 Solving the coupled equations using FEM

Due to the complex shape of the human brain, FEM is used for solving the coupled system.

2.1 Finite element equations formulation
To implement FEM, Galerkin's method [2] is applied to obtain the weak form of the governing differential equations. For equation (1a), the weighted residual method and the divergence theorem give
\int_V \sigma_{ij}\, w_{i,j}\, dV - \int_V p_{,i}\, w_i\, dV = \int_S t_i\, w_i\, dS \qquad (2)

Similarly, for equation (1b) of the coupled equations, using finite differencing in time, the weighted residual method and the divergence theorem give

\int_V u_{i,i}^{n+1}\, w\, dV - \int_V u_{i,i}^{n}\, w\, dV + \Delta t\, k \int_V p_{,i}\, w_{,i}\, dV = \Delta t \int_S k\, p_{,i}\, n_i\, w\, dS = \Delta t \int_S h\, w\, dS \qquad (3)

Now, the use of the θ method [2] implies

u_i = (1-\theta)\, u_i^{n} + \theta\, u_i^{n+1} \quad \text{and} \quad p = (1-\theta)\, p^{n} + \theta\, p^{n+1} \qquad (4)

Combining equations (2)-(4) and writing them in matrix form leads to

\begin{bmatrix} \theta K & \theta Q \\ Q^T & \theta\, \Delta t\, H \end{bmatrix} \begin{Bmatrix} u^{n+1} \\ p^{n+1} \end{Bmatrix} = \begin{bmatrix} -(1-\theta)\, K & -(1-\theta)\, Q \\ Q^T & -(1-\theta)\, \Delta t\, H \end{bmatrix} \begin{Bmatrix} u^{n} \\ p^{n} \end{Bmatrix} + \begin{Bmatrix} \int_S N^T t\, dS \\ \Delta t \int_S L^T h\, dS \end{Bmatrix} \qquad (5)

with K = \int_V B^T E B\, dV, \; Q = \int_V N^T [\nabla L]\, dV (and correspondingly \int_V L^T [\nabla N]\, dV), \; H = k \int_V [\nabla L]^T [\nabla L]\, dV.
Calculation of the matrices can be found in Bathe's textbook [2].

2.2 Initial conditions
When time-marching is applied to this time-dependent problem, the initial displacements and pore pressures must be known. Interestingly, with θ = 1.0, the elements which multiply the pore pressure p^n on the right hand side of equation (5) are zero. In this case, only the initial displacements are needed. At t = 0, the external loads are supported by the pore fluid and the effective stress in the solid skeleton is zero, because the pore fluid seepage velocity is limited as t → 0. The total fluid flow out of the solid skeleton is therefore zero and there is no geometric deformation of the solid skeleton, so the displacements are zero at t = 0. Once the initial pore pressure and displacements have been set up, θ can be changed from 1.0 to 0.5, as θ = 0.5 gives the best computational results in dynamic problems.

2.3 Assembly of the global system
This section discusses the assembly of the global system using linked lists. In this work, the Biconjugate Gradient Method [3] is applied to solve the system of equations Kx = b. This method references K only through products of K (and its transpose) with a vector. These operations can be very efficient for a properly stored sparse matrix, and here the row-indexed storage scheme [4] is applied. A sparse matrix in this storage scheme is very convenient to multiply with a vector on its right. The scheme sets up two one-dimensional arrays: the first stores the nonzero K_ij values, while the second stores integer values indicating the locations of these nonzero values in K. It saves memory in sparse matrix storage but costs time when locating the value of an element K_ij.
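As an illustration of why this storage suits iterative solvers, here is a minimal sketch of the row-indexed sparse matrix-vector product in the style of Numerical Recipes [3]; the array layout (diagonal in sa[1..n], off-diagonal entries with 1-based column indices addressed through ija[]) follows that reference.

/* y = K x for a row-indexed sparse matrix: sa[] holds the diagonal in
   sa[1..n] followed by the off-diagonal nonzeros; ija[i]..ija[i+1]-1
   gives the range of off-diagonal entries of row i, and ija[k] is the
   column index of sa[k]. */
void sprs_ax(const double sa[], const long ija[], const double x[],
             double y[], long n)
{
    for (long i = 1; i <= n; i++) {
        y[i] = sa[i] * x[i];                 /* diagonal term */
        for (long k = ija[i]; k <= ija[i + 1] - 1; k++)
            y[i] += sa[k] * x[ija[k]];       /* off-diagonal terms */
    }
}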
2.3.1 List storage of stiffness matrix
The local stiffness matrix of an element can be calculated using equation (5). After the stiffness matrix of an element has been calculated, it is assembled into the global system: its entries are added to the global matrix according to the global node numbers of the nodes of the element. This assembly procedure is simple to perform if the element K_ij can be easily located in the global system. Unfortunately, this is difficult in the row-indexed storage scheme for the sparse matrix [4]. Since it is very hard to locate an element K_ij in the row-indexed storage scheme, linked lists [4] are used to store the sparse global stiffness matrix, because this storage allows the element K_ij to be located conveniently, quickly and economically. The stiffness matrix K has nonzero elements in every row, but only a few in each row. Thus, it can be represented by an array of lists, the array size being n, the dimension of K. In each list, the head of list[i] stores the number of nonzero elements in that row, while the other nodes store the nonzero values and their column numbers in the i-th row of K. The data structure of the linked list is defined as

typedef struct LIST {
    int j;              // The column number of element K[i][j]
    double val;         // The value of element K[i][j]
    struct LIST *next;  // Pointer to the next element in the list
} LIST;

Corresponding to this data structure, other operators are defined for its manipulation: Lookup(j, L) searches for the node with column number j in linked list L; Add(j, val, L) adds val to the value stored at the node with column number j in L; New(x, val, j) generates a new node x holding val and j; Insert(x, L) appends node x to L; and Access(L, j) returns the stored value at the node with column number j. With these operators, an algorithm for the global system assembly can be given as follows:

procedure AssembleGlobal(list k[1..n], real val, integer i, integer j)
    result <- Lookup(j, k[i])
    if (result) then
        Add(j, val, k[i])
    else
        New(x, val, j)
        Insert(x, k[i])
end AssembleGlobal

After the global stiffness matrix has been stored in the lists, the algorithm for row-indexed storage can be modified for the linked-list stored sparse matrix. In the original algorithm, K_ij is obtained by array indexing; here it is found by calling the operator Access(k[i], j). In this way, the global stiffness matrix can finally be converted to row-indexed storage, which is extremely convenient for iterative solution methods.
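A self-contained C version of the assembly step might look as follows; this is a sketch built from the struct and pseudocode above, not the authors' actual code.

#include <stdlib.h>

typedef struct LIST {
    int j;              /* column number of element K[i][j] */
    double val;         /* value of element K[i][j] */
    struct LIST *next;  /* next node in this row's list */
} LIST;

/* Add 'val' to K[i][j]: if a node for column j already exists in row i's
   list, accumulate into it (Lookup + Add); otherwise create a node and
   link it at the head of the row list (New + Insert). k[] is the array
   of row lists, one per row of the global stiffness matrix. */
void assemble_global(LIST **k, int i, int j, double val)
{
    for (LIST *p = k[i]; p != NULL; p = p->next) {
        if (p->j == j) {
            p->val += val;
            return;
        }
    }
    LIST *x = malloc(sizeof *x);
    x->j = j;
    x->val = val;
    x->next = k[i];
    k[i] = x;
}

Each entry of every element stiffness matrix is routed through assemble_global with the element's global node numbers; once assembly is complete, the lists are traversed row by row to build the row-indexed arrays used by the iterative solver.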
2.4 Application of boundary conditions
There are two kinds of boundary conditions: degree-of-freedom constraints and applied loads. At the exposed nodes, the pore pressure is zero; at the nodes close to the skull, the displacements are fixed in the normal direction but free in the tangential directions.
3 Results and conclusions
The simulation model has 6067 hexahedral elements with 6389 nodes and 25556 DOFs. Its Young's modulus is E = 5×10⁴ Pa; Poisson's ratio is ν = 0.49; permeability is k = 1.0×10⁻⁷ m³s/kg. Free drainage is allowed across the open hole, while the other surfaces are impermeable and smooth. The deformation due to loading pressure is simulated using the model in Fig. 1 and the deformation due to pore pressure change is simulated using the model in Fig. 2.
Figure 1. Simulation model for loading pressure
Figure 2. Simulation model for pore pressure
Figure 3. Cut plane
With a pressure loading of P = 1×10² Pa and simulation times of Δt₀ = 50 s and Δtₙ = Δtₙ₋₁ + 50 s, selected deformation steps at the cut plane in Fig. 3 for this case are given in Fig. 4.
Figure 4. Deformation steps for loading pressure case
For the second case, the pore pressure is assumed to be 5% of E, with simulation times Δt₀ = 50 s and Δtₙ = Δtₙ₋₁ + 50 s. The zones within the bold curves in Fig. 2 are the holes where fluid drains out. Selected deformation steps at the cut plane in Fig. 3 for this case are given in Fig. 5.
Figure 5. Deformation steps for pore pressure case (panels show pore pressure before and after CSF drainage)
The results show that pore fluid pressure change is the main cause of brain shift.

References
1. Biot M.A., General theory of three-dimensional consolidation. Journal of Applied Physics, 12 (1941) pp. 155-164.
2. Bathe K.J., Finite element procedures. Englewood Cliffs, N.J.: Prentice-Hall, 1996.
3. Press W.H., Teukolsky S.A., Vetterling W.T. and Flannery B.P., Numerical recipes in C: The art of scientific computing, 2nd ed. Cambridge, New York: Cambridge University Press, 1994.
4. Lewis H.R. and Denenberg L., Data structures and their algorithms. New York: HarperCollins Publishers, 1991.
3-D MULTI-BLOCK ORTHOGONAL GRID GENERATED BY LAPLACE EQUATION WITH SLIDING BOUNDARY CONDITION

Z. K. ZHANG
Temasek Laboratories, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260
E-mail: tslzzk@nus.edu.sg
A three-dimensional multi-level multi-block Laplace equation based grid generation method with a sliding boundary condition is presented to generate a smooth, orthogonal-to-boundary grid. The Laplace equation based method achieves good smoothness across block interfaces when ghost points are used along both sides of the interfaces. In the iterative solution of the Laplace equation, the sliding boundary condition is made possible by using NURBS to represent the body surface. With NURBS, the body surface grid points move freely along the surface and are updated at each iteration step of the Laplace equation solution by projecting the second-layer grid points onto the body surface in its normal direction; consequently, the grid lines starting from the body are always orthogonal to it. Thus the body surface grid is created simultaneously with the interior volume grid.
1 Introduction
Grid orthogonality plays an important role in implementing boundary conditions in the numerical simulation of flow fields. An orthogonal grid simplifies the formulation of boundary conditions and makes them more accurate. The conformal mapping method can indeed create smooth orthogonal grids, but it is difficult to specify or control the grid spacing and the method is only applicable to the two-dimensional case. The algebraic method is fast and can control both orthogonality and grid spacing, but it can neither prevent the grid from crossing over nor guarantee overall smoothness. The elliptic equation based method naturally smooths discontinuities on boundaries and can control orthogonality and spacing on boundaries by iteratively correcting source terms, as in the Sorenson method [1] or the Hilgenstock method [2,3,4]. But both the Sorenson method and the Hilgenstock method are time-consuming, since each involves two levels of loops. The Laplace equation based method (i.e., the elliptic equation without source terms) maintains the smoothing property of the elliptic equation based method and has the tendency to create uniform grids if the boundary points are free to float (Neumann-Dirichlet or sliding condition) during the grid generation process. The method also ensures that the grid does not cross over. The motivation of the present paper is, in the Laplace equation solution process, to add a sliding boundary condition by using a NURBS (Non-Uniform Rational B-Spline) representation of the body surface. At each step of the Laplace equation iteration, the body surface grid points are updated by projecting the second-layer grid points onto the surface in its normal direction. Thus the grid lines starting from the body surface are always orthogonal to it, and the surface grid is generated simultaneously with the interior volume grid.
2 Description of the Method
The 3-D Laplace equations used for grid generation are

\xi_{xx} + \xi_{yy} + \xi_{zz} = 0, \quad \eta_{xx} + \eta_{yy} + \eta_{zz} = 0, \quad \zeta_{xx} + \zeta_{yy} + \zeta_{zz} = 0 \qquad (1)

The equivalent equation in computational space can be written in vector form [2,3,4] as

\sum_{m=1}^{3} \sum_{n=1}^{3} g^{mn}\, \frac{\partial^2 \vec{r}}{\partial \xi_m\, \partial \xi_n} = 0 \qquad (2)
where \vec{r} = x\,\vec{i} + y\,\vec{j} + z\,\vec{k}. Eq. (2) can be solved either with the SOR (Successive Over-Relaxation) method or the SLOR (Successive Line Over-Relaxation) method. To provide flexibility for grid generation, a multi-block, patched grid method is used. Using ghost or halo points along both sides of the interfaces ensures continuity across block interfaces. In solving Eq. (2), all the blocks are computed simultaneously, so boundary points are updated at each iteration step. This ensures that the grid lines are smooth across the interfaces. In order to obtain boundary orthogonality, the sliding condition is incorporated by using NURBS to represent the body surface. This condition allows boundary points to move freely on the boundary. At each iteration step of the Laplace equation solution, the body surface grid points are updated by projecting the second-layer grid points onto the body in its normal direction; therefore the grid lines originating from the body are always orthogonal to it. Finally, when the iteration converges, both the body surface grid and the interior volume grid are created, and orthogonality on the body surface is obtained as well. To generate the grid faster and more efficiently, the grid generation process is performed in a multi-level way. That is to say, a coarsest grid (first level) is generated first by a few iteration steps. This grid is then interpolated and iterated with the Laplace equation solver to obtain a finer (second-level) grid. Each higher-level grid is generated from its previous level grid in this way. The highest-level grid is the final grid.
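To illustrate the iteration structure (not the author's full scheme, which retains the metric terms of Eq. (2)), the sketch below performs one SOR sweep over a single block under the simplifying assumption of an orthonormal computational metric, in which case each interior point relaxes toward the average of its six neighbours.

/* One SOR sweep over the interior of an (ni x nj x nk) block. r[id][c]
   holds coordinate c of grid point (i,j,k); omega in (1,2) is the
   over-relaxation factor. Cross-derivative/metric terms of Eq. (2) are
   omitted in this simplified sketch. */
#define ID(i, j, k) (((i) * nj + (j)) * nk + (k))

void sor_sweep(double (*r)[3], int ni, int nj, int nk, double omega)
{
    for (int i = 1; i < ni - 1; ++i)
        for (int j = 1; j < nj - 1; ++j)
            for (int k = 1; k < nk - 1; ++k)
                for (int c = 0; c < 3; ++c) {
                    double avg = (r[ID(i-1,j,k)][c] + r[ID(i+1,j,k)][c] +
                                  r[ID(i,j-1,k)][c] + r[ID(i,j+1,k)][c] +
                                  r[ID(i,j,k-1)][c] + r[ID(i,j,k+1)][c]) / 6.0;
                    r[ID(i,j,k)][c] += omega * (avg - r[ID(i,j,k)][c]);
                }
}

In the multi-level procedure, a few such sweeps are run on the coarsest block, the result is interpolated to the next level and the sweeps are repeated; after every sweep the body-surface points would additionally be re-projected onto the NURBS surface to enforce the sliding condition.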
3 Applications
An ellipsoid is selected as a test case. The three semi-axes of the ellipsoid are 1.0, 2.0 and 0.4, respectively. The ellipsoid surface (inner boundary) is divided into 6 blocks corresponding, respectively, to the six faces of a cube surface, which is taken as the far-field (outer) boundary. The volume grid is therefore divided into six blocks too, each resembling the frustum of a pyramid. Taking a cube surface as the far-field boundary deliberately tests the ability of the method to minimize the effect of boundary discontinuities on overall grid smoothness. The coarsest (first-level) 6-block grid is generated by 20 iterations of the Laplace equation solver with 5×5×5 grid points in each block (Fig. 1), and the second-level grid (9×9×9 grid points in each block) is generated by 40 iterations (Fig. 2).
Fig. 1 Coarsest grid (5×5×5)
Fig. 2 Second-level grid (9×9×9)
The third-level grid (17×17×17 in each block) and the fourth-level grid (33×33×33 in each block) are generated in the same way with 100 iterations each. Fig. 3 and Fig. 4 present the fourth-level volume grid and body surface grid respectively. Good grid smoothness can be observed in the figures. The grid lines across the block interfaces are very smooth, although there are slope discontinuities along the outer boundary. The good smoothness is attributed to the Laplace equation itself and to the use of the ghost points.
Fig. 3 The fourth-level grid (33×33×33)
Fig. 4 Body surface grid of the fourth-level grid

4 Conclusions
Representing the body surface with NURBS is helpful in making the grid lines starting from the body surface orthogonal to it. At the same time, it makes it possible to generate the volume grid and the body surface grid simultaneously by projecting the second-layer grid points onto the body in its normal direction. How to search for body normal lines faster and how to ensure the fidelity of the NURBS representation are two important factors for the method.

References
1. Sorenson R.L. and McCann K., "A method for iterative specifications of multiple-block topologies", AIAA paper 91-0147, 1991.
2. White J.A., "Elliptic grid generation with orthogonality and spacing control on an arbitrary number of boundaries", AIAA paper 90-1568, 1990.
3. Sonar T., "Grid generation using elliptic partial differential equations", Braunschweig: DFVLR-FB 89-15, 1989.
4. Zhang Z.K., Zhu Z.Q. and Zhuang F.G., "A multi-block grid generation system for multi-component aircraft", Proc. of 7th International Symp. on Comput. Fluid Dynamics, International Academic Publishers, Sept. 1997, pp. 390-395.
MESHING HUMAN BRAIN WITH HEXAHEDRAL MESHES FROM IMAGE SLICES USING MESH MAPPING METHOD

Y. L. LI
Institute of High Performance Computing, #01-01 The Capricorn, 1 Science Park Road, Singapore Science Park II, Singapore 117528
E-mail: liyl@ihpc.a-star.edu.sg

K. H. LEE
Mechanical Department, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260

The finite element method (FEM) is applied to analyze the dynamic behavior of the human brain in biomechanics. But the brain cannot easily be meshed with hexahedral elements for FEM simulation because of its complex shape. Conventionally, it is meshed with hexahedrons taken directly from its computerized tomography (CT) or magnetic resonance imaging (MRI) image voxels. Unfortunately, this method tends to generate poor quality elements at the boundary of the volume. Voxel-based meshing also produces extremely fine meshes, and too fine a mesh results in a large matrix system, which is a serious disadvantage in real-time dynamic simulation. Because of these shortcomings of the voxel-based meshing method, another method for creating a three-dimensional hexahedral mesh of the human brain is proposed herein. In this method, the image slices are used for extracting contours instead of being meshed directly into hexahedral elements from the image voxels. After the contours have been extracted, they are used for mapping the recognition model to the physical model using the method put forward in this paper. Hexahedral meshes generated from image slices of the human brain at different resolutions are given to demonstrate the efficiency of the method. Moreover, this mesh mapping method can also be used for generating hexahedral meshes from CT or MRI image slices of other human organs.
1 Introduction
The mesh mapping method [1] is based on the theory of Basis mesh (Recognition model) + F => Physical model. The method is theoretically very simple; its capacity to handle realistic geometrical situations adequately is strongly linked to the mapping function F. This paper implements the mesh mapping theory to map a regular recognition model of the human brain to its true physical model, which is constructed from the MRI image slices of the human brain.
2 Recognition model

Figure 1. Recognition model
Figure 2. Free surfaces of the recognition model
The regular recognition model of the human brain can be constructed using a commercial FEM software package such as PATRAN®. It is shown in Fig. 1 and is to be mapped to the true physical model by the chosen mapping function. The boundary surfaces of the recognition model are classified as shown in Fig. 2. For simplicity and clarity, only part of the surfaces is presented.
3 Mapping function
The mapping function chosen here is the Laplacian function:

\nabla^2 T = 0 \quad \text{in } \Omega, \qquad T(\xi,\eta,\zeta) = \bar{T} \quad \text{on } \Gamma

where \nabla^2 = \Delta is the Laplacian operator. T can be the coordinate component x, y or z respectively.
4 Physical model
So far, the recognition model is ready and the mapping function has been chosen. The last problem in solving this mapping function is to find the coordinates of the bounding surface nodes on the physical model. These coordinates are needed as boundary conditions when solving the mapping function numerically. The steps involved in obtaining these coordinates are described next.
4.1 Contours of image slices
The contours of the image slices of the human brain are extracted using image processing technologies [2]. They are shown in Fig. 3. For simplicity and clarity, only the contours of one half of the human brain are given; the other half is nearly symmetric.
Figure 3. Slices of contours extracted from image slices

The extracted contours are then broken and classified into groups for constructing the top, left and right side parts of the free surfaces. The contours in Fig. 3 are broken and classified into groups as in Fig. 4. They are used for constructing the surface patches of the physical model.
Figure 4. Broken contours for surface patches of brain
4.2 Surface patches of physical model
After all the contours extracted from the image slices have been divided into groups, they can be used to construct the boundary surface patches of the human brain. Here the patches are represented by many small triangles, as shown in Fig. 5.
Figure 5. Triangulated side surfaces of brain

4.3 Mapping of surface of recognition model to physical model
Once the boundary surfaces of the recognition model and the physical model have been identified and constructed, the surface patches of the former can be mapped to the latter by solving the two-dimensional Laplacian equation. Before the two-dimensional Laplacian equation is used for mapping the recognition surface to the physical surface, the surfaces need to be projected onto a suitable two-dimensional plane in order for the mapping to be successful. The projection direction is chosen by evaluating the average normal of all the triangles, as shown in the sketch below. After the surfaces are projected onto a plane in this direction, the 2D Laplacian equations are solved for the mapping. Here only the left side part is demonstrated; the other parts are treated in the same way. Fig. 6 shows the 2D mapping result of the left side part after the boundary surfaces of the recognition and true models are projected onto the plane transverse to the average normal.
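A minimal sketch of this projection-direction computation (hypothetical helper, not the authors' code) sums the cross products of each triangle's edge vectors, which weights every triangle's normal by its area:

#include <math.h>

/* Area-weighted average normal of a triangulated surface. v[][3] are
   vertex coordinates, t[][3] are the vertex indices of each triangle.
   The edge cross product has magnitude 2*area, so summing it over all
   triangles weights each normal by the triangle's area. */
void average_normal(const double (*v)[3], const int (*t)[3], int ntri,
                    double n[3])
{
    n[0] = n[1] = n[2] = 0.0;
    for (int m = 0; m < ntri; ++m) {
        const double *a = v[t[m][0]], *b = v[t[m][1]], *c = v[t[m][2]];
        double e1[3] = { b[0] - a[0], b[1] - a[1], b[2] - a[2] };
        double e2[3] = { c[0] - a[0], c[1] - a[1], c[2] - a[2] };
        n[0] += e1[1] * e2[2] - e1[2] * e2[1];
        n[1] += e1[2] * e2[0] - e1[0] * e2[2];
        n[2] += e1[0] * e2[1] - e1[1] * e2[0];
    }
    double len = sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
    if (len > 0.0) { n[0] /= len; n[1] /= len; n[2] /= len; }
}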
Figure 6. Result of two-dimensional mapping in a plane
Figure 7. Mapped quadrilateral surface and physical surface (isometric, front and side views)
Yet as shown in Fig. 7, the shapes of the mapped recognition surface and the physical surface are similar only when they are projected onto the plane transverse to the average normal direction; they differ from each other along the average normal direction. The mapped nodes on this plane therefore need a further projection onto the physical surface along the average normal direction. To do this, a line in this direction is drawn from each node so that it intersects one triangle of the triangulated surface [3]. This intersection point is where the mapped node is placed.
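This projection step is a ray-triangle intersection test of the kind found in computer graphics texts [3]. A compact sketch follows (the Moller-Trumbore formulation is used here for brevity; the paper does not state which algorithm was implemented):

#include <math.h>

/* Ray-triangle intersection (Moller-Trumbore). Returns 1 and the ray
   parameter *t if the ray o + t*d intersects triangle (a,b,c), else 0. */
int ray_triangle(const double o[3], const double d[3],
                 const double a[3], const double b[3], const double c[3],
                 double *t)
{
    const double EPS = 1e-12;
    double e1[3] = { b[0] - a[0], b[1] - a[1], b[2] - a[2] };
    double e2[3] = { c[0] - a[0], c[1] - a[1], c[2] - a[2] };
    double p[3] = { d[1] * e2[2] - d[2] * e2[1],    /* p = d x e2 */
                    d[2] * e2[0] - d[0] * e2[2],
                    d[0] * e2[1] - d[1] * e2[0] };
    double det = e1[0] * p[0] + e1[1] * p[1] + e1[2] * p[2];
    if (fabs(det) < EPS) return 0;                  /* ray parallel to plane */
    double inv = 1.0 / det;
    double s[3] = { o[0] - a[0], o[1] - a[1], o[2] - a[2] };
    double u = (s[0] * p[0] + s[1] * p[1] + s[2] * p[2]) * inv;
    if (u < 0.0 || u > 1.0) return 0;
    double q[3] = { s[1] * e1[2] - s[2] * e1[1],    /* q = s x e1 */
                    s[2] * e1[0] - s[0] * e1[2],
                    s[0] * e1[1] - s[1] * e1[0] };
    double v = (d[0] * q[0] + d[1] * q[1] + d[2] * q[2]) * inv;
    if (v < 0.0 || u + v > 1.0) return 0;
    *t = (e2[0] * q[0] + e2[1] * q[1] + e2[2] * q[2]) * inv;
    return 1;                                       /* hit: point is o + t*d */
}

The mapped node is moved to o + t*d for the triangle that yields a valid intersection.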
5 Results and Conclusion
After the surface of the recognition model has been mapped to the physical model, the coordinates of the boundary nodes are used as boundary conditions to solve the three-dimensional Laplacian equation and obtain the final result of the mapping method. Both coarse and fine meshes are generated, as shown in Fig. 8.
"^11 ft
irse mesh A fine mesh A coarse Figure 8. Results of mapping method
The mapping method can be used for meshing the human brain with hexahedral elements. However, the difficulty of extracting contours from the image slices remains. Even after the contours have been extracted, classifying them into different parts requires significant expert experience; the codes developed for this mesh mapping method therefore need to be integrated with some expert system. In order to approximate the surface of the brain from image slices with triangles, the marching cubes method for surface rendering can be used. The difficulty in using this method is that there is no clear demarcation between brain tissue and bone, which makes it hard to distinguish the brain from the skull unless some new technology can be applied to overcome the problem. It may also be worthwhile to investigate the use of image processing technology to extract the contours from the image slices.

References
1. George P.L., Automatic mesh generation: Application to finite element method. Chichester: John Wiley and Sons, 1991.
2. Mignotte M. and Meunier J., A multiscale optimization approach for the dynamic contour-based boundary detection issue. Computerized Medical Imaging and Graphics, 25(3) (2001) pp. 265-275.
3. Foley J.D., van Dam A., Feiner S.K. and Hughes J.F., Computer Graphics: Principles and Practice. Addison-Wesley, Reading, MA, 2nd edition, 1995.
ISBN 1-86094-345-4 (pbk)
Imperial College Press, icpress.co.uk